|
Home > Archive > PostgreSQL Performance > March 2006 > update == delete + insert?
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
update == delete + insert?
|
|
| Craig A. James 2006-03-21, 3:34 am |
| I've seen it said here several times that "update == delete + insert". On the other hand, I've noticed that "alter table [add|drop] column ..." is remarkably fast, even for very large tables, which leads me to wonder whether each column's contents are in
a file specifically for that column.
My question: Suppose I have a very "wide" set of data, say 100 columns, and one of those columns will be updated often, but the others are fairly static. I have two choices:
Design 1:
create table a (
id integer,
frequently_updated integer);
create table b(
id integer,
infrequently_updated
_1 integer,
infrequently_updated
_2 integer,
infrequently_updated
_3 integer,
... etc.
infrequently_updated
_99 integer);
Design 2:
create table c(
id integer,
frequently_updated integer,
infrequently_updated
_1 integer,
infrequently_updated
_2 integer,
infrequently_updated
_3 integer,
... etc.
infrequently_updated
_99 integer);
If "update == delete + insert" is strictly true, then "Design 2" would be poor since 99 columns would be moved around with each update. But if columns are actually stored in separate files, the Designs 1 and 2 would be essentially equivalent when it come
s to vacuuming.
Thanks,
Craig
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
| |
| Jim Buttafuoco 2006-03-21, 3:34 am |
| go with design 1, update does = delete + insert.
---------- Original Message -----------
From: "Craig A. James" <cjames@modgraph-usa.com>
To: pgsql- performance@postgres
ql.org
Sent: Mon, 20 Mar 2006 14:49:43 -0800
Subject: [PERFORM] update == delete + insert?
> I've seen it said here several times that "update == delete + insert". On the other hand, I've noticed that
> "alter table [add|drop] column ..." is remarkably fast, even for very large tables, which leads me to wonder
> whether each column's contents are in a file specifically for that column.
>
> My question: Suppose I have a very "wide" set of data, say 100 columns, and one of those columns will be
> updated often, but the others are fairly static. I have two choices:
>
> Design 1:
> create table a (
> id integer,
> frequently_updated integer);
>
> create table b(
> id integer,
> infrequently_updated
_1 integer,
> infrequently_updated
_2 integer,
> infrequently_updated
_3 integer,
> ... etc.
> infrequently_updated
_99 integer);
>
> Design 2:
> create table c(
> id integer,
> frequently_updated integer,
> infrequently_updated
_1 integer,
> infrequently_updated
_2 integer,
> infrequently_updated
_3 integer,
> ... etc.
> infrequently_updated
_99 integer);
>
> If "update == delete + insert" is strictly true, then "Design 2" would be poor since 99 columns would be moved
> around with each update. But if columns are actually stored in separate files, the Designs 1 and 2 would be
> essentially equivalent when it comes to vacuuming.
>
> Thanks,
> Craig
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
------- End of Original Message -------
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
| |
| Tom Lane 2006-03-21, 3:34 am |
| "Craig A. James" <cjames@modgraph-usa.com> writes:
> I've seen it said here several times that "update == delete + insert". On the other hand, I've noticed that "alter table [add|drop] column ..." is remarkably fast, even for very large tables, which leads me to wonder whether each column's contents are
in a file specifically for that column.
No. The reason "drop column" is fast is that we make no attempt to
remove the data from existing rows; we only mark the column's entry in
the system catalogs as deleted. "add column" is only fast if you are
adding a column with no default (a/k/a default NULL). In that case
likewise we don't have to modify existing rows; the desired behavior
falls out from the fact that the tuple access routines return NULL if
asked to fetch a column beyond those existing in a particular tuple.
You can read about the storage layout in
http://developer.postgresql.org/doc...es/storage.html
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
| |
| Jaime Casanova 2006-03-21, 3:34 am |
| On 3/20/06, Craig A. James <cjames@modgraph-usa.com> wrote:
> I've seen it said here several times that "update == delete + insert". On the other hand, I've noticed that "alter table [add|drop] column ..." is remarkably fast, even for very large tables, which leads me to wonder whether each column's contents are
in a file specifically for that column.
>
> My question: Suppose I have a very "wide" set of data, say 100 columns, and one of those columns will be updated often, but the others are fairly static. I have two choices:
>
> Design 1:
> create table a (
> id integer,
> frequently_updated integer);
>
> create table b(
> id integer,
> infrequently_updated
_1 integer,
> infrequently_updated
_2 integer,
> infrequently_updated
_3 integer,
> ... etc.
> infrequently_updated
_99 integer);
>
> Design 2:
> create table c(
> id integer,
> frequently_updated integer,
> infrequently_updated
_1 integer,
> infrequently_updated
_2 integer,
> infrequently_updated
_3 integer,
> ... etc.
> infrequently_updated
_99 integer);
>
> If "update == delete + insert" is strictly true, then "Design 2" would be poor since 99 columns would be moved around with each update. But if columns are actually stored in separate files, the Designs 1 and 2 would be essentially equivalent when it co
mes to vacuuming.
>
> Thanks,
> Craig
>
design 1 is normalized and better
design 2 is denormalized and a bad approach no matter the RDBMS
update does delete + insert, and vacuum is the way to recover the space
--
Atentamente,
Jaime Casanova
"What they (MySQL) lose in usability, they gain back in benchmarks, and that's
all that matters: getting the wrong answer really fast."
Randal L. Schwartz
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql
.org so that your
message can get through to the mailing list cleanly
| |
| Jim C. Nasby 2006-03-21, 7:28 am |
| On Mon, Mar 20, 2006 at 08:38:15PM -0500, Jaime Casanova wrote:
> On 3/20/06, Craig A. James <cjames@modgraph-usa.com> wrote:
> design 1 is normalized and better
> design 2 is denormalized and a bad approach no matter the RDBMS
How is design 1 denormalized?
> "What they (MySQL) lose in usability, they gain back in benchmarks, and that's
> all that matters: getting the wrong answer really fast."
> Randal L. Schwartz
Where's that quote from?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
| |
| Merlin Moncure 2006-03-21, 11:33 am |
| > > design 1 is normalized and better
>
> How is design 1 denormalized?
It isn't :)...he said it is normalized. Design 2 may or may not be
de-normalized (IMO there is not enough information to make that
determination) but as stated it's a good idea to split the table on
practical grounds.
merlin
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
| |
| Jim C. Nasby 2006-03-21, 1:32 pm |
| On Tue, Mar 21, 2006 at 09:12:08AM -0500, Merlin Moncure wrote:
>
> It isn't :)...he said it is normalized. Design 2 may or may not be
> de-normalized (IMO there is not enough information to make that
> determination) but as stated it's a good idea to split the table on
> practical grounds.
Err, sorry, got the number backwards. My point is that 2 isn't
denormalized afaik, at least not based just on the example. But yes, in
a case like this, vertical partitioning can make a lot of sense.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql
.org so that your
message can get through to the mailing list cleanly
|
|
|
|
|