Home > Archive > PostgreSQL Bugs > November 2005 > strange disappearence of postgres file









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author strange disappearence of postgres file
Harald Armin Massa

2005-11-21, 9:24 am

I ran into a very strange disappearance of a postgresql data file.

The environment:

Windows XP professional

select version();
PostgreSQL 8.0.1 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC)
3.4.2(mingw-special)

All datafiles are within c:\ghum\data

that part of the harddrive is only accessable to user account postgres (=
the one running the service) and Administrator.

the logentries:

postgresql-2005-11-21_080758.log
2005-11-21 08:08:05 LOG: startup process (PID 1220) exited with unexpected
status 128
2005-11-21 08:08:05 LOG: aborting startup due to startup process failure
2005-11-21 08:08:06 LOG: logger shutting down

<file://///Svs00004/K__Anw_Allg/massa/haase/pglog/postgresql-2005-11-21_084022.log>
postgresql-2005-11-21_084022.log
2005-11-21 08:40:24 LOG: database system was shut down at 2005-11-20
16:46:29 Westeuropäische Normalzeit
2005-11-21 08:40:24 LOG: checkpoint record is at 4/2C880BC0
2005-11-21 08:40:24 LOG: redo record is at 4/2C880BC0; undo record is at
0/0; shutdown TRUE
2005-11-21 08:40:24 LOG: next transaction ID: 421189; next OID: 381050
2005-11-21 08:40:24 LOG: database system is ready
2005-11-21 09:45:50 ERROR: could not open relation 1663/17253/43471: No such
file or directory
2005-11-21 09:45:52 ERROR: could not open relation 1663/17253/43471: No such
file or directory
2005-11-21 09:45:59 ERROR: could not open relation 1663/17253/43471: No such
file or directory
2005-11-21 09:46:09 ERROR: could not open relation 1663/17253/43471: No such
file or directory
2005-11-21 09:46:30 ERROR: could not open relation 1663/17253/43471: No such
file or directory

.... and on and on and on...

My checking showed that... within directory data/base/17253 there is indeed
no file named 43471; but ones named 43470,43472,43473 .....

Additional Information: that is a laptop-computer, and it is quite possible
that it was restarted between 8:08 and 8:40 this morning.

But ... I see no line mentioning a "killing of a file". Can I do something
more to find out WHY this file disappeared? (and esp. make sure that no more
files disappear :( )

Harald


--
GHUM Harald Massa
persuasion python postgresql
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607

Qingqing Zhou

2005-11-22, 8:24 pm


"Harald Armin Massa" < haraldarminmassa@gma
il.com> wrote

>
> 2005-11-21 09:45:50 ERROR: could not open relation 1663/17253/43471: No
> such
> file or directory
>
> My checking showed that... within directory data/base/17253 there is
> indeed
> no file named 43471; but ones named 43470,43472,43473 .....
>


This may be not a direct step to solve the problem, but let's get some
feelings first:

Is 43471 a temp table? To find out this: "select relname, nspname from
pg_class, pg_namespace where pg_namespace.oid=pg_class.relnamespace and
relfilenode = 43471;"

Regards,
Qingqing


Harald Armin Massa

2005-11-23, 7:24 am

Quingqing,

no, it is definitely NOT a temp table. I know which table it is, because:

that "could not open relation ..." error message was exactly provokable with
"select * from repofeld", which is one of my tables in that database.

(And that was the way it was passed on to me: my application put that error
on screen)


Harald

--
GHUM Harald Massa
persuasion python postgresql
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607

Qingqing Zhou

2005-11-23, 7:24 am


On Wed, 23 Nov 2005, Harald Armin Massa wrote:
>
> no, it is definitely NOT a temp table. I know which table it is,
> because: that "could not open relation ..." error message was exactly
> provokable with "select * from repofeld", which is one of my tables in
> that database.
>


What is your file system, NTFS or FAT32? Is that table newly created?

So there is a valid record in pg_class but the representing data file is
lost ... a possible theory of what's happened could be:

1) create the table;
2) a checkpoint happens;
3) lost power;
[ restar the machine and database ]
4) file system recovery - unable to recovery your data file;
5) database recovery - don't play WAL and recreate your data file because
of the checkpoint;

Is that possible?

Regards,
Qingqing

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql
.org so that your
message can get through to the mailing list cleanly

Harald Armin Massa

2005-11-23, 7:24 am

Hello QuingQing,

>
> What is your file system, NTFS or FAT32? Is that table newly created?



File System is NTFS. That table was created with database installation,
which was short after after release of PostgreSQL 8.0 - so that database was
in use for > 9 months. That table is a central table of the application, and
is used very often ... the application was running for the whole 9 months :)

So there is a valid record in pg_class but the representing data file is
> lost ... a possible theory of what's happened could be:
>
> 1) create the table;
> 2) a checkpoint happens;
> 3) lost power;
> [ restar the machine and database ]
> 4) file system recovery - unable to recovery your data file;
> 5) database recovery - don't play WAL and recreate your data file because
> of the checkpoint;



Is that possible?
>

Rather not ... that table was there from the beginning and for some months.
(it gets created with initial database install)

The only strange thing is that short 3 lines log before the big error. As to
power loss: that is a laptop; so "loosing power" would need to take out the
accumulator. Which is quite possible, but rather unlikely given the
technical level of the user.

postgresql-2005-11-21_080758.log
2005-11-21 08:08:05 LOG: startup process (PID 1220) exited with unexpected
status 128
2005-11-21 08:08:05 LOG: aborting startup due to startup process failure
2005-11-21 08:08:06 LOG: logger shutting down

postgresql-2005-11-21_084022.log
2005-11-21 08:40:24 LOG: database system was shut down at 2005-11-20
16:46:29 Westeuropäische Normalzeit

--> can that exiting process with 128 be a sign of "system gets killed while
postgres is starting up"? (and taking a file with it????)

As much as I learned, PostgreSQL only APPENDS to the data files; and only a
vacuum full can truncate them ... and on that machine there is no vacuum
full happening.

It still keeps being a mystery.

Harald

--
GHUM Harald Massa
persuasion python postgresql
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607

Alvaro Herrera

2005-11-23, 7:24 am

Harald Armin Massa wrote:
> Hello QuingQing,
>
>
> File System is NTFS. That table was created with database installation,
> which was short after after release of PostgreSQL 8.0 - so that database was
> in use for > 9 months. That table is a central table of the application, and
> is used very often ... the application was running for the whole 9 months :)


How many database-wide vacuums did you run during these 9 months? I'm
smelling transaction Id wraparound in pg_class or some other system
catalog.

This has been known to happen. Please see the archives.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Harald Armin Massa

2005-11-23, 7:24 am

Alvaro,

How many database-wide vacuums did you run during these 9 months? I'm
> smelling transaction Id wraparound in pg_class or some other system
> catalog.
>
> This has been known to happen. Please see the archives.
>


"database wide" vacuums have been very very seldom; I fear the only one was
at initial db bulk load. at the moment I am on a way to upgrade to 8.1, with
autovacuum enabled ... so in the long run THAT possible reason should
vanish.

And I understand correctly that your suspicion is: file gets killed because
of transaction id wraparound in system table?

I will scan teh archives (as soon as postgresql.org provices access to them
again :)

Harald

--
GHUM Harald Massa
persuasion python postgresql
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607

Alvaro Herrera

2005-11-23, 8:24 pm

Harald Armin Massa wrote:
> Alvaro,
>
>
> "database wide" vacuums have been very very seldom; I fear the only one was
> at initial db bulk load. at the moment I am on a way to upgrade to 8.1, with
> autovacuum enabled ... so in the long run THAT possible reason should
> vanish.
>
> And I understand correctly that your suspicion is: file gets killed because
> of transaction id wraparound in system table?


No, the file doesn't get killed, but an old tuple on the pg_class system
catalog magically appears because it was deleted and marked with a very
old Xmax, that after the wraparound is seen as future; and thus the
tuple that was deleted is suddenly visible. For this to happen you need
to have failed to vacuum pg_class in a very long time.

There may be other explanations, but we'd need more data, like how much
tuples for that table you got on pg_class and what do they look like.
Try something like

set enable_indexscan to off;
select xmin, xmax, relname from pg_class where relname = 'the table';

You can search the archives using http://www.pgsql.ru

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com