Home > Archive > MySQL ODBC Connector > December 2005 > Are primary keys essential?









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Are primary keys essential?
James Harvard

2005-12-21, 8:24 pm

The PK thread has reminded me of a question I had but never resolved when designing the table structure of the big data warehouse app I was droning on about just now in the aforementioned thread. As need to import some hundreds of millions of rows in the
next week, I think now would be a good idea to get a definite answer!

The core of the app is a mass of data, broken into many tables that I normally only need to query individually. Because I felt uneasy not including a primary key and need to get a proof-of-concept db running I ended up putting an auto_increment int column
in the data tables. (Yes, I know, an extra 4 bytes per row when I was talking about saving every byte possible in my last post. <blush> ) But the PK column is never used either as a foreign key or in app code for the table itself. But I couldn't put a PK
on a combination of other columns, because I don't think I can be sure of uniqueness. Can I just drop the PK column?

BTW I'm sure this is addressed in all those good books on database design and theory I should have, but never have, read. But I'm a bit short of time, and it's quicker just to pick the brains of you folks! Quicker for me, that is - sorry!

TIA,
James Harvard

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql? unsub...sie.nctu.edu.tw

Kenneth Wagner

2005-12-21, 8:24 pm

Hi James,

An internal, unique, auto-increment PK is a good idea on dynamic, large
tables. Smaller, mostly static, tables can often do without a PK. In a
backend DB it's a godsend. What would happen if I depended on the
programming in the client-side code to handle the PK?? It would be
impossible.

Example: Customers need a PK. An internal, auto-increment, unique, integer
works best on DBs. They are optimized for it, especially SQL tables.

Example: Counties. These names hardly ever change and are rather small. So,
you could do without the internal, integer PK. BUT-- suppose every invoice
must show the county? Then I would need a county (e.g., varchar 25) field
for the county name of Yoknapatawphahootche
e. If the customer has thousands
of invoices with us then county starts to take up space. And a fair amount
of it. Millions of invoices would take up 21 millions of bytes of storage.
Just multiply number of customers by numbers of invoices. If the invoice
uses a 1 byte field for the county with an FK in the county file we
automatically save 40 millions of bytes of storage. Since no state has more
than 255 counties, a tiny int will work.

Example: States. Hmmm. Let's see, abbreviation for Minnesota is MN. Only two
bytes needed there. OK, That will work for a PK. We'll do without the
integer, auto-increment, key here. And we will violate consistency in the
DB. No big deal in this case. I'll go either way here.

Another good reason is YOU control the PK. Social Security numbers and
ZipCodes are already being prepared for changes. Why go bonkers 7 years from
now? And who wants to back-code and back-fill all that code & data?

Truncated, damaged file? What invoice does the row point to? What is the
date? If it's the last row in the PK then that's where the fixing begins by
date and by PK#.

Since a PK guarantees uniqueness, I can breathe easily if the file gets
clobbered. Example: Someone with BIG access rights accidentally deletes
invoices from 1996. But they should have done it for 1995. (We keep 10 years
history on hand in an archive file.) Easy to fix. What's the last# in 1995?
The first # in 1997? Restore only those rows with those numbers, inclusive.
Suppose now the Customer file is by name, ZIP and first 4 digits in the
address? (Or something like that. Many of my junk mail addresses show
"WAGNEKO64054A". If another Wagner, Kenneth O. arrives in zip code 64054,
presumably, he becomes "WAGNEKO64054B." No middle initial? The it's
"WAGNEK_64054.") How could I know which customers to restore without going
thru a lot of queries?

What if I have to break up a table because it's getting too large? Has too
many columns or rows added to it in the last year? The integer PK works
really well. And it is small and simple to eyeball. And I can still have any
other unique index I want on columns in the new related 2nd table.

Mostly, I like the int, auto-increment, unique, unsigned PK because it's
used everywhere, always looks the same and is very, very fast because the
indexes are small, optimized for SQL usage and very parsimonious about RAM.
Intuitively, the numbers also tell me about size and activity levels.

Best of all, they are inviolate. I can trust them. Well, OK, maybe once a
decade a gamma ray hits the oxide layer on the disk drive and changes
3,212,434,334 to 3,712,434,334. Still easy to fix. Drop the key and
re-create it. It will automatically show up. Either as a gap or a duplicate.
But it WILL show up as something.

Bottom line, uniqueness, stability and order are the "sine qua non"* of good
data organization. (*Means without which nothing.)

HTH,

Ken






----- Original Message -----
From: "James Harvard" <james.lists.tech@harvard-digital.co.uk>
To: <mysql@lists.mysql.com>
Sent: Wednesday, December 21, 2005 6:01 PM
Subject: Are primary keys essential?


> The PK thread has reminded me of a question I had but never resolved when
> designing the table structure of the big data warehouse app I was droning
> on about just now in the aforementioned thread. As need to import some
> hundreds of millions of rows in the next week, I think now would be a good
> idea to get a definite answer!
>
> The core of the app is a mass of data, broken into many tables that I
> normally only need to query individually. Because I felt uneasy not
> including a primary key and need to get a proof-of-concept db running I
> ended up putting an auto_increment int column in the data tables. (Yes, I
> know, an extra 4 bytes per row when I was talking about saving every byte
> possible in my last post. <blush> ) But the PK column is never used either
> as a foreign key or in app code for the table itself. But I couldn't put a
> PK on a combination of other columns, because I don't think I can be sure
> of uniqueness. Can I just drop the PK column?
>
> BTW I'm sure this is addressed in all those good books on database design
> and theory I should have, but never have, read. But I'm a bit short of
> time, and it's quicker just to pick the brains of you folks! Quicker for
> me, that is - sorry!
>
> TIA,
> James Harvard
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:
> http://lists.mysql.com/mysql? unsub...> @peoplepc.com
>
>




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql? unsub...sie.nctu.edu.tw

James Harvard

2005-12-21, 8:24 pm

Thanks the on & off-list replies, but I obviously didn't explain my situation very well!

My app is essentially creating summary reports from large amounts of data. It is _not_ doing the actual data warehousing. It's international trade data.

The data tables contain foreign keys for stuff like destination country, trade commodity category etc., but they are _not_ themselves referenced by any other table. Therefore I have not yet found, nor do I envisage finding, any use for an arbitrary auto_i
ncrement primary key. So why would/might I need a PK at all?

TIA,
James Harvard

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql? unsub...sie.nctu.edu.tw

James Harvard

2005-12-21, 8:24 pm

In hindsight my thread title was misleading - sorry. Should have been "are primary keys _always_ essential?".
JH

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql? unsub...sie.nctu.edu.tw

Logan, David

2005-12-21, 8:24 pm

Hi James,

Not AFAIK, one can create tables without specifying a PK and there is no
objection. Data is stored quite happily and you should be able to use
your FK's to access other data. It must be voluntary because the ALTER
TABLE DML statement has

DROP PRIMARY KEY drops the primary index. Note: In older versions of
MySQL, if no primary index existed, then DROP PRIMARY KEY would drop the
first UNIQUE index in the table. This is not the case in MySQL 5.0,
where trying to use DROP PRIMARY KEY on a table with no primary key will
give rise to an error.=20

Regards

David Logan=20
Database Administrator=20
HP Managed Services=20
148 Frome Street,=20
Adelaide 5000=20
Australia=20

+61 8 8408 4273 - Work=20
+61 417 268 665 - Mobile=20
+61 8 8408 4259 - Fax=20


-----Original Message-----
From: James Harvard [mailto:james.lists.tech@harvard-digital.co.uk]=20
Sent: Thursday, 22 December 2005 12:45 PM
To: mysql@lists.mysql.com
Subject: Re: Are primary keys essential?

In hindsight my thread title was misleading - sorry. Should have been
"are primary keys _always_ essential?".
JH

--=20
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: =
http://lists.mysql.com/mysql?unsub=3Ddavid.logan@hp.com


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql? unsub...sie.nctu.edu.tw

Kenneth Wagner

2005-12-22, 3:23 am

Hi James,

If the tables you use have primary keys that _YOU_ don't need or use then,
for you, they are not essential.

I do have a question, though. You are working with SQL tables, aren't you?
And from wherever they came from or exist, I presume from your reply that
you don't need the PKs.

The only help I can envisage for you is if you want to look at the tables in
PK order for some reason.

Does your app create tables from the "large amouns of data?" If you report
from the tables your app creates then you have no need at all for the PKs.
Depends how you want the report data to appear.

On the other hand, the "large amounts of data" db may need or want those
PKs.

Hope this makes sense,

Ken


----- Original Message -----
From: "James Harvard" <james.lists.tech@harvard-digital.co.uk>
To: <mysql@lists.mysql.com>
Sent: Wednesday, December 21, 2005 8:08 PM
Subject: Re: Are primary keys essential?


> Thanks the on & off-list replies, but I obviously didn't explain my
> situation very well!
>
> My app is essentially creating summary reports from large amounts of data.
> It is _not_ doing the actual data warehousing. It's international trade
> data.
>
> The data tables contain foreign keys for stuff like destination country,
> trade commodity category etc., but they are _not_ themselves referenced by
> any other table. Therefore I have not yet found, nor do I envisage
> finding, any use for an arbitrary auto_increment primary key. So why
> would/might I need a PK at all?
>
> TIA,
> James Harvard
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:
> http://lists.mysql.com/mysql? unsub...> @peoplepc.com
>
>




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql? unsub...sie.nctu.edu.tw

Rhino

2005-12-22, 3:23 am


----- Original Message -----
From: "James Harvard" <james.lists.tech@harvard-digital.co.uk>
To: <mysql@lists.mysql.com>
Sent: Wednesday, December 21, 2005 9:08 PM
Subject: Re: Are primary keys essential?


> Thanks the on & off-list replies, but I obviously didn't explain my
> situation very well!
>
> My app is essentially creating summary reports from large amounts of data.
> It is _not_ doing the actual data warehousing. It's international trade
> data.
>
> The data tables contain foreign keys for stuff like destination country,
> trade commodity category etc., but they are _not_ themselves referenced by
> any other table. Therefore I have not yet found, nor do I envisage
> finding, any use for an arbitrary auto_increment primary key. So why
> would/might I need a PK at all?


Well, if this is _just_ summary data, I suppose it's okay to get along
without PKs. But I've never had much to do with the theory and practice of
data warehouses beyond a few very basic concepts so I wouldn't take that to
the bank....

Rhino





--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.371 / Virus Database: 267.14.2/208 - Release Date: 20/12/2005


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql? unsub...sie.nctu.edu.tw

Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com