|
Home > Archive > MySQL ODBC Connector > December 2005 > merge table: speed benefits?
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
merge table: speed benefits?
|
|
| Tom Cunningham 2005-12-21, 3:23 am |
| Hi all.
I'm working on splitting a 5G myisam fact table into separate parts,
and putting a merge table on top.
his will definitely help with table management, but I am hoping that a
merge table might help me with query speed:
(a) The docs say that spreading the underlying tables across different
disks can make queries faster. I don't quite understand how this will
work in a normal query: if I do a SUM(amount) over the entire table,
will it be quicker if the table is spread across different disks? I
don't see how mysql can avoid doing this sequentially.
(b) I also thought that if I do a series of queries, via the merge
table, which all go to a single underlying table, then that would be
quicker than the same queries to the original monster table: because
the index and data caches would just be cacheing for a single table.
<i>is this true?</i>. On reflection I thought that the index and data
caches only cache *parts* of tables, not entire tables, so it will
make no difference.
(c) Finally: in theory the optimiser could use two indexes: use index
A to decide which tables to look at, then use index B within those
tables to find the rows that meet another condition. This would be an
advantage over a non-merged table. But I don't think Mysql can do
this, can it?
So should I expect any speed benefits to splitting my fact table?
Tom.
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql? unsub...sie.nctu.edu.tw
| |
| Gleb Paharenko 2005-12-21, 9:23 am |
| Hello.
> (a) The docs say that spreading the underlying tables across different
> disks can make queries faster. I don't quite understand how this will
> work in a normal query: if I do a SUM(amount) over the entire table,
> will it be quicker if the table is spread across different disks? I
> don't see how mysql can avoid doing this sequentially.
IMHO, you should get speed benefits in concurrent user environment, when
several queries are running at once. In the same time, they might be
reading data from the different disks, and it is much faster than
reading from one disk.
> (b) I also thought that if I do a series of queries, via the merge
> table, which all go to a single underlying table, then that would be
> quicker than the same queries to the original monster table: because
> the index and data caches would just be cacheing for a single table.
> <i>is this true?</i>. On reflection I thought that the index and data
> caches only cache *parts* of tables, not entire tables, so it will
> make no difference.
Manual says that you'll get more speed when you do your searches in
one of the underlying tables directly if you know that the data is
stored there. If you have enough memory so the whole index is stored
in key_buffer you shouldn't get big differences in speed between MERGE
and "monster" tables, but I'm not sure about the case when you have
limited amount of memory, and MySQL often performs key reads, it seems
from the manual that MERGE should be slower.
> (c) Finally: in theory the optimiser could use two indexes: use index
> A to decide which tables to look at, then use index B within those
> tables to find the rows that meet another condition. This would be an
> advantage over a non-merged table. But I don't think Mysql can do
> this, can it?
As far as I know MySQL can't do this. BTW, partitioning feature which
is present in MySQL 5.1 is that you're looking for.
Tom Cunningham wrote:
> Hi all.
>
> I'm working on splitting a 5G myisam fact table into separate parts,
> and putting a merge table on top.
>
> his will definitely help with table management, but I am hoping that a
> merge table might help me with query speed:
>
> (a) The docs say that spreading the underlying tables across different
> disks can make queries faster. I don't quite understand how this will
> work in a normal query: if I do a SUM(amount) over the entire table,
> will it be quicker if the table is spread across different disks? I
> don't see how mysql can avoid doing this sequentially.
>
> (b) I also thought that if I do a series of queries, via the merge
> table, which all go to a single underlying table, then that would be
> quicker than the same queries to the original monster table: because
> the index and data caches would just be cacheing for a single table.
> <i>is this true?</i>. On reflection I thought that the index and data
> caches only cache *parts* of tables, not entire tables, so it will
> make no difference.
>
> (c) Finally: in theory the optimiser could use two indexes: use index
> A to decide which tables to look at, then use index B within those
> tables to find the rows that meet another condition. This would be an
> advantage over a non-merged table. But I don't think Mysql can do
> this, can it?
>
> So should I expect any speed benefits to splitting my fact table?
>
> Tom.
>
--
For technical support contracts, goto https://order.mysql.com/?ref=ensita
This email is sponsored by Ensita.NET http://www.ensita.net/
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Gleb Paharenko
/ /|_/ / // /\ \/ /_/ / /__ Gleb.Paharenko@ensita.net
/_/ /_/\_, /___/\___\_\___/ MySQL AB / Ensita.NET
<___/ www.mysql.com
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql? unsub...sie.nctu.edu.tw
| |
| Tom Cunningham 2005-12-21, 8:24 pm |
| OK thanks for your help, so my summary:
(1) spreading merged tables across disks will only help concurrent
queries (in my data-warehouse application I'm doing things serially).
(2) there's no efficiency in the way a merge table splits the indexes
into smaller files -- if anything, it will be less efficient. *Unless*
you bypass the merge table and go directly to the underlying table,
which would require some client-side logic.
(3) mysql 5.1 has data partitioning, but that's at least 6 months away?
So: my original problem was trying to decide whether to split into
month-chunks or week-chunks. I think I will choose month-chunks, which
will make data management easier, and there are no performance gains
to be expected from going with week-chunks.
Thanks.
Tom.
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql? unsub...sie.nctu.edu.tw
| |
| beacker@misc.com 2005-12-23, 8:23 pm |
| Tom Cunningham writes:
>(a) The docs say that spreading the underlying tables across different
>disks can make queries faster. I don't quite understand how this will
>work in a normal query: if I do a SUM(amount) over the entire table,
>will it be quicker if the table is spread across different disks? I
>don't see how mysql can avoid doing this sequentially.
Tom,
Multiple disks definitely helps, since in my case it allows me
to have N times the available read speed and N times the available
number iops (IOs/sec). In my case I have a 61GB merge table that
is based upon 180 separate myisam tables. This table contains 487M
records. The kinds of queries I'm doing, I would not be able to
handle cleanly without merge tables on 4.1.15.
Since trying to back this up would be a nightmare, I rely upon
a standby master, and 3 query slaves for this data. This way things
work reasonably, and I have some hope of surviving some kind of
hardware failure. This is just one small portion of the substantial
data set I'm responsible for over at Technorati.
Brad Eacker (beacker@misc.com)
beacker@technorati.com
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql? unsub...sie.nctu.edu.tw
| |
| Tom Cunningham 2005-12-25, 8:23 pm |
| Thanks for the reply Brad, two questions about your situtation:
(1) The speedup only applies because of concurrent queries, right?
With sequential queries you would expect no speedup?
(2) If you have 180 underlying tables, then each query to the merge
table has to do 180 separate index lookups, right?
Tom.
On 24 Dec 2005 02:08:12 -0000, beacker@misc.com <beacker@misc.com> wrote:
> Tom Cunningham writes:
>
> Tom,
> Multiple disks definitely helps, since in my case it allows me
> to have N times the available read speed and N times the available
> number iops (IOs/sec). In my case I have a 61GB merge table that
> is based upon 180 separate myisam tables. This table contains 487M
> records. The kinds of queries I'm doing, I would not be able to
> handle cleanly without merge tables on 4.1.15.
>
> Since trying to back this up would be a nightmare, I rely upon
> a standby master, and 3 query slaves for this data. This way things
> work reasonably, and I have some hope of surviving some kind of
> hardware failure. This is just one small portion of the substantial
> data set I'm responsible for over at Technorati.
> Brad Eacker (beacker@misc.com)
> beacker@technorati.com
>
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql? unsub...sie.nctu.edu.tw
|
|
|
|
|