|
Home > Archive > PostgreSQL Discussion > September 2005 > Finding (and deleting) dupes in relation table
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Finding (and deleting) dupes in relation table
|
|
|
| I have a table that relates id's of two other tables:
table1id, table2id
Dupes have found their way into it (create unique
index across both fields fails). Is there a quick and
easy way to find and delete the dupes (there are tens
of thousands of records)?
Thanks,
CSN
____________________
______________
Yahoo! Mail - PC Magazine Editors' Choice 2005
http://mail.yahoo.com
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
| |
|
|
Nevermind, figured it out:
select distinct on (table1id, table2id) * into temp
from table3;
delete from table3;
insert into table3 select * from temp;
--- CSN < cool_screen_name9000
1@yahoo.com> wrote:
> I have a table that relates id's of two other
> tables:
>
> table1id, table2id
>
> Dupes have found their way into it (create unique
> index across both fields fails). Is there a quick
> and
> easy way to find and delete the dupes (there are
> tens
> of thousands of records)?
>
> Thanks,
> CSN
>
>
>
> ____________________
______________
> Yahoo! Mail - PC Magazine Editors' Choice 2005
> http://mail.yahoo.com
>
____________________
______________
Yahoo! Mail - PC Magazine Editors' Choice 2005
http://mail.yahoo.com
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
| |
| Markus Wollny 2005-09-23, 7:23 am |
| Hello!
CSN & #91;cool_screen_name
90001@yahoo.com] wrote:
> I have a table that relates id's of two other tables:
>
> table1id, table2id
>
> Dupes have found their way into it (create unique index
> across both fields fails). Is there a quick and easy way to
> find and delete the dupes (there are tens of thousands of records)?
>
> Thanks,
> CSN
If your table was created WITH OIDS you could identify the duplicates
thus:
select a.table1id
, a.table12d
, max(a.oid) as maxoid
, count(a.oid) as coid
from schema.mytable a,
schema.mytable b
where a.table1id = b.table1id
and a.table2id=b.table2id
and a.oid <> b.oid
group by a.table1id, a.table2id
order by a.table1id;
If you wish to delete surplus rows, you might do the following:
delete from schema.mytable where oid in (
select maxoid from (
select a.table1id, a.table12d, max(a.oid) as
maxoid, count(a.oid) as coid
from schema.mytable a,
schema.mytable b
where a.table1id = b.table1id
and a.table2id=b.table2id
and a.oid <> b.oid
group by a.table1id, a.table2id
order by a.table1id ) as foo
where coid >1 );
This will delete the oldest tuple of a duplicate set of rows; if there
are more than two tuples in a set, you'll want to execute this a couple
of times until there's no duplicate left, as the delete will only reduce
a set by one tuple at a time. I'd also recommend to apply a PRIMARY KEY
constraint afterwards instead of just a unique index - this will prevent
NULL-entries as well as creating the desired unique index - and I think
it's good practice to have a primary key on about every table there is,
except when it's just a junk data table like a logging table where
content is regularly evaluated and discarded.
Kind regards
Markus
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
|
|
|
|
|