|
Home > Archive > Slony1 PostgreSQL Replication > April 2006 > primary key for slony
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
primary key for slony
|
|
| Miguel 2006-04-04, 11:33 am |
| hi, this a design database question , i have a detail table that only
have 4 columns, i want ot replicate it with slony:
CREATE TABLE horarios_general
(
idplan int4 NOT NULL,
idhorario int4 NOT NULL,
dia_inicio int4 NOT NULL,
dia_fin int4 NOT NULL,
hora_inicio time NOT NULL,
hora_fin time NOT NULL
)
it doesnt have a primary key, would it be a problem if the key is
confirmed of all the table's columns?, is that recommended?
---
miguel
| |
| Scott Marlowe 2006-04-04, 11:33 am |
| On Tue, 2006-04-04 at 11:08, Miguel wrote:
> hi, this a design database question , i have a detail table that only
> have 4 columns, i want ot replicate it with slony:
>
> CREATE TABLE horarios_general
> (
> idplan int4 NOT NULL,
> idhorario int4 NOT NULL,
> dia_inicio int4 NOT NULL,
> dia_fin int4 NOT NULL,
> hora_inicio time NOT NULL,
> hora_fin time NOT NULL
> )
>
> it doesnt have a primary key, would it be a problem if the key is
> confirmed of all the table's columns?, is that recommended?
You can either create a column with a serial type and make it the
primary key (i.e. an "artificial" primary key) or use all the columns.
The problem with using all the fields as a primary key is performance.
The single column index for an int / int8 is gonna be smaller and
cheaper to update. OTOH, if you really should have a unique
multi-column index on this table for the sake of your data, then you
might as well make it and use it.
Either one will make slony happy.
| |
| Christopher Browne 2006-04-04, 1:32 pm |
| Miguel <mmiranda- nEhLOg9Nx0turzwRxvvF
aw@public.gmane.org> writes:
> hi, this a design database question , i have a detail table that only
> have 4 columns, i want ot replicate it with slony:
>
> CREATE TABLE horarios_general
> (
> idplan int4 NOT NULL,
> idhorario int4 NOT NULL,
> dia_inicio int4 NOT NULL,
> dia_fin int4 NOT NULL,
> hora_inicio time NOT NULL,
> hora_fin time NOT NULL
> )
>
> it doesnt have a primary key, would it be a problem if the key is
> confirmed of all the table's columns?, is that recommended?
You MUST have a candidate for primary key.
It tends to be most efficient if the PK has fewer columns rather than
more, but really, your decision should be based on what the best sort
of primary key should be for the table.
--
output = ("cbbrowne" "@" "ca.afilias.info")
<http://dba2.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)
| |
| Miguel 2006-04-04, 1:32 pm |
| Christopher Browne wrote:
>Miguel <mmiranda- nEhLOg9Nx0turzwRxvvF
aw@public.gmane.org> writes:
>
>
>
>
>You MUST have a candidate for primary key.
>
>It tends to be most efficient if the PK has fewer columns rather than
>more, but really, your decision should be based on what the best sort
>of primary key should be for the table.
>
>
Ok, the easiest way is create a secuence, but i really need that all the
columns have unique values, so i will go with the multiple column key,
only 6 columns should not be a problem to slony , should it? :-)
thanks
| |
| Andrew Sullivan 2006-04-04, 8:29 pm |
| On Tue, Apr 04, 2006 at 11:56:29AM -0600, Miguel wrote:
> Ok, the easiest way is create a secuence, but i really need that all the
> columns have unique values, so i will go with the multiple column key,
> only 6 columns should not be a problem to slony , should it? :-)
It oughta work. You might find that performance will improve,
however, if you _also_ add an artificial key.
A
--
Andrew Sullivan | ajs-oaT0K0jot5/q2IAV+ODieA@public.gmane.org
This work was visionary and imaginative, and goes to show that visionary
and imaginative work need not end up well.
--Dennis Ritchie
| |
| Miguel 2006-04-04, 8:29 pm |
| Andrew Sullivan wrote:
>On Tue, Apr 04, 2006 at 11:56:29AM -0600, Miguel wrote:
>
>
>
>It oughta work. You might find that performance will improve,
>however, if you _also_ add an artificial key.
>
>A
>
>
>
what do you mean by artificial?, the secuence or specify a candidate key
with
SET ADD TABLE
| |
| Jan Wieck 2006-04-04, 8:29 pm |
| On 4/4/2006 4:06 PM, Miguel wrote:
> Andrew Sullivan wrote:
>
> what do you mean by artificial?, the secuence or specify a candidate key
> with
He means the extra serial column.
The difference it makes for slony is that the commands for replicating
UPDATE and DELETE operations will read
... WHERE "col1" = 'val1' AND "col2" = 'val2' ... "col6" = 'val6'
causing an index lookup with a 6-column scankey instead of a single
int4/int8 column one. Sure, that scan will be more expensive. But I
think Andrew forgot that using an artificial serial column means that
your origin and subscribers must maintain an additional index on all
insert/update operations (and vacuum must clean it up after deletes).
Andrew, do you have actual experience that it will perform better, or
was that an assumption?
Jan
--
#===================
====================
====================
===========#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#===================
====================
=========== JanWieck- bwPqjjyvM7QAvxtiuMwx
3w@public.gmane.org #
| |
| Andrew Sullivan 2006-04-04, 8:29 pm |
| On Tue, Apr 04, 2006 at 04:16:12PM -0400, Jan Wieck wrote:
> int4/int8 column one. Sure, that scan will be more expensive. But I
> think Andrew forgot that using an artificial serial column means that
> your origin and subscribers must maintain an additional index on all
> insert/update operations (and vacuum must clean it up after deletes).
>
> Andrew, do you have actual experience that it will perform better, or
> was that an assumption?
No, I didn't forget; but I probably should have made this clearer.
The real question is what the use patterns are.
For cases where you're likely to change _all_ the columns at once
(think INSERT only, for instance), the overhead for the additional
index is likely not worth it.
For cases where you insert once, but then some (but not all) of the
values change over time, then depending on your hardware, use
patterns, and network speeds, the single column might be worth it. A
six-column key where 5 the items are longish pieces of text and one
is a status value, for instance, would be not only expensive to look
up: it will also be expensive to ship on the network. So if the only
thing that normally changes on the table is the status value, the
single-column serial number (== artificial key) might make sense,
provided that you have the processor and disk bandwidth to keep up
with the additional index maintenance and vacuuming.
You have to evaluate these cases one at a time.
A
--
Andrew Sullivan | ajs-oaT0K0jot5/q2IAV+ODieA@public.gmane.org
The fact that technology doesn't work is no bar to success in the marketplace.
--Philip Greenspun
| |
| Scott Marlowe 2006-04-04, 8:29 pm |
| On Tue, 2006-04-04 at 15:32, Andrew Sullivan wrote:
> On Tue, Apr 04, 2006 at 04:16:12PM -0400, Jan Wieck wrote:
>
> No, I didn't forget; but I probably should have made this clearer.
> The real question is what the use patterns are.
>
> For cases where you're likely to change _all_ the columns at once
> (think INSERT only, for instance), the overhead for the additional
> index is likely not worth it.
>
> For cases where you insert once, but then some (but not all) of the
> values change over time, then depending on your hardware, use
> patterns, and network speeds, the single column might be worth it. A
> six-column key where 5 the items are longish pieces of text and one
> is a status value, for instance, would be not only expensive to look
> up: it will also be expensive to ship on the network.
Also, if the text values are large enough, they'll overrun the 1/3 of a
block size limit and your insert / update will fail.
| |
| Christopher Browne 2006-04-04, 8:29 pm |
| Scott Marlowe <smarlowe- yvqjf0D4ItFqO0yjgLDq
5wC/G2K4zDHf@public.gmane.org> writes:
> On Tue, 2006-04-04 at 15:32, Andrew Sullivan wrote:
>
> Also, if the text values are large enough, they'll overrun the 1/3 of a
> block size limit and your insert / update will fail.
We're getting very excited about the issues surrounding how this works
in the most general, pathological cases.
Of course, Miguel showed us an example which involved a table
consisting of 4 int4 values and two time values. Which sounds to me
like something where tuples will consist of somewhere around 25-30
bytes of data, so that the index entries oughtn't cause any massive
problems.
If horarios_general is a table of millions of entries, where thousands
get updated each hour, there is sense in agonizing over whether or not
to add a serial column as a surrogate primary key.
If, on the other hand, it's going to have a few rows modified per day,
it isn't worth thinking too hard about it, and the difference between
a single serial PK and a composite of his six columns will be
immaterial.
--
let name="cbbrowne" and tld="ca.afilias.info" in String.concat "@" [name;tld];;
<http://dba2.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)
| |
| Andrew Sullivan 2006-04-04, 8:29 pm |
| On Tue, Apr 04, 2006 at 05:07:55PM -0400, Christopher Browne wrote:
>
> If horarios_general is a table of millions of entries, where thousands
> get updated each hour, there is sense in agonizing over whether or not
> to add a serial column as a surrogate primary key.
>
> If, on the other hand, it's going to have a few rows modified per day,
> it isn't worth thinking too hard about it, and the difference between
> a single serial PK and a composite of his six columns will be
> immaterial.
Right. Which is what I guess I said, tersely, the first time: you
might find it'll be faster. It's something worth testing, under some
circumstances, if you understand what the consequences are.
The important thing to realise here is that this is a function of
several variables:
1. The cost of (additional) indexes and fields.
2. The cost of a lookup in a multi-column index (particularly if
the early-listed columns have poor selectivity, in my
experience).
3. The cost of shipping the index criterion on the network.
Many people may not realise the import of (3): when you update or
delete a row to be replicated by Slony, what Slony actually sends on
the wire is a query altering the relevant columns (or deleting the
row) and selecting by the primary key. If you have widely
distributed nodes with lots of fast disk, but very slow networks
between them, this could be a factor that comes into play. It's not
likely, I expect, but it's worth considering in your thinking.
In most cases, the multi-column key is an excellent choice.
A
--
Andrew Sullivan | ajs-oaT0K0jot5/q2IAV+ODieA@public.gmane.org
I remember when computers were frustrating because they *did* exactly what
you told them to. That actually seems sort of quaint now.
--J.D. Baldwin
| |
| Miguel 2006-04-04, 8:29 pm |
| Christopher Browne wrote:
>If horarios_general is a table of millions of entries, where thousands
>get updated each hour,
>
i dont think so...
>there is sense in agonizing over whether or not
>to add a serial column as a surrogate primary key.
>If, on the other hand, it's going to have a few rows modified per day,
>it isn't worth thinking too hard about it, and the difference between
>a single serial PK and a composite of his six columns will be
>immaterial.
>
>
Im pretty sure this will be the case, in fact, i will be inserting few
rows per week,
its good to know the details though.
thanks
---
miguel
| |
| Jan Wieck 2006-04-05, 3:30 am |
| On 4/4/2006 5:19 PM, Andrew Sullivan wrote:
> In most cases, the multi-column key is an excellent choice.
I so much agree with all of it. Usually, the existing primary key or any
business process implied unique-not-null column set will be an excellent
choice.
Hint: If your business rules assume a set of columns to be unique, and
they are not defined as not null and unique, your database schema is not
mapping your business rules. You are not only missing a candidate key
usable by Slony here, you are missing a database side data consistency
check! Whenever you find a table without a primary key, first ask if
this really represents the business process requirements. The only thing
that tends to escape that question is usually some audit, logging or
history table, where adding a bigserial is no big deal anyway.
Jan
--
#===================
====================
====================
===========#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#===================
====================
=========== JanWieck- bwPqjjyvM7QAvxtiuMwx
3w@public.gmane.org #
|
|
|
|
|