| Len Walter 2005-07-26, 3:24 am |
| G'day,
I've submitted a bug at
http://gborg.postgresql.org/project...pdate.php?1383.
The environment is RHEL4ES, Postgres 7.4.6 from distro (rpm name
postgresql-7.4.6-1.RHEL4.1), installed on one master, four slaves,
slony1.1.0 compiled from src->rpm.
Our four slave slony daemons aren't releasing the db connections to
the master, and keep opening new ones, rapidly leading to "too many
clients" errors on the master, which makes replication and all other
db access fail.
It seems to run fine for about twelve hours, then connections start
building up. After about ten hours, we end up with many connections
all in "async_notify waiting" (from ps -efa on the master), holding
locks as follows:
freeside=3D# select mode,pid,relname,use
name,current_query,q
uery_start
from pg_locks, pg_class,pg_stat_act
ivity where pg_class.oid =3D relation
and pid=3Dprocpid and mode like 'Exclusive%' order by query_start;
mode | pid | relname | usename | current_query | query_start
---------------+-------+-------------+----------+---------------------+----=
---------------------------
ExclusiveLock | 23639 | pg_listener | postgres | <IDLE> | 2005-07-26
01:49:10.760922+10
ExclusiveLock | 3997 | pg_listener | postgres | <IDLE> | 2005-07-26
01:53:09.634105+10
ExclusiveLock | 23978 | pg_listener | postgres | <IDLE> | 2005-07-26
01:54:10.626429+10
ExclusiveLock | 22851 | pg_listener | postgres | <IDLE> | 2005-07-26
01:57:41.232085+10
ExclusiveLock | 24546 | pg_listener | postgres | <IDLE> | 2005-07-26
01:58:33.698555+10
ExclusiveLock | 3961 | pg_listener | postgres | <IDLE> | 2005-07-26
01:58:48.898225+10
ExclusiveLock | 24463 | pg_listener | postgres | <IDLE> | 2005-07-26
01:59:11.731+10
ExclusiveLock | 24574 | pg_listener | postgres | <IDLE> | 2005-07-26
01:59:40.865348+10
ExclusiveLock | 24632 | pg_listener | postgres | <IDLE> | 2005-07-26
02:03:11.090778+10
ExclusiveLock | 24651 | pg_listener | postgres | <IDLE> | 2005-07-26
02:04:06.607669+10
ExclusiveLock | 24670 | pg_listener | postgres | <IDLE> | 2005-07-26
02:04:45.562575+10
ExclusiveLock | 24680 | pg_listener | postgres | <IDLE> | 2005-07-26
02:05:17.364743+10
....
ExclusiveLock | 4044 | pg_listener | postgres | commit transaction; |
2005-07-26 10:37:47.625651+10
(93 rows)
I have DEBUG2 logging from all slony daemons. Here are some of the
connections made by one of the slaves - nb there are 24 conns since
starting and 24 conns * 4 slaves =3D~ 100 connections, the postgres
master's limit.
2005-07-25 11:01:50 EST DEBUG1 remoteListenThread_1
: connected to
'host=3Dvispcentral0
1-syd dbname=3Dfreeside user=3Dpostgres port=3D8477
password=3Dpassword'
2005-07-25 19:50:42 EST DEBUG1 remoteListenThread_1
: connected to
'host=3Dvispcentral0
1-syd dbname=3Dfreeside user=3Dpostgres port=3D8477
password=3Dpassword'
2005-07-26 00:26:23 EST DEBUG1 remoteListenThread_1
: connected to
'host=3Dvispcentral0
1-syd dbname=3Dfreeside user=3Dpostgres port=3D8477
password=3Dpassword'
2005-07-26 01:06:44 EST DEBUG1 remoteListenThread_1
: connected to
'host=3Dvispcentral0
1-syd dbname=3Dfreeside user=3Dpostgres port=3D8477
password=3Dpassword'
2005-07-26 01:52:05 EST DEBUG1 remoteListenThread_1
: connected to
'host=3Dvispcentral0
1-syd dbname=3Dfreeside user=3Dpostgres port=3D8477
password=3Dpassword'
2005-07-26 02:02:15 EST DEBUG1 remoteListenThread_1
: connected to
'host=3Dvispcentral0
1-syd dbname=3Dfreeside user=3Dpostgres port=3D8477
password=3Dpassword'
2005-07-26 02:08:00 EST DEBUG1 remoteListenThread_1
: connected to
'host=3Dvispcentral0
1-syd dbname=3Dfreeside user=3Dpostgres port=3D8477
password=3Dpassword'
...until it reaches 24 conns and starts failing because, with the other
slaves doing the same, the master server is full.
The master postgres server log, starting at about the same time as the
conns started building up. The conns below are from slave dbs.
2005-07-25 19:27:42 LOG: 00000: connection received:
host=3Dvispaaa02-syd port=3D52005
LOCATION: BackendFork, postmaster.c:2395
2005-07-25 19:27:42 LOG: 00000: connection authorized: user=3Dpostgres
database=3Dfreeside
LOCATION: BackendFork, postmaster.c:2457
2005-07-25 19:42:28 LOG: 00000: recycled transaction log file "000000030000=
008D"
LOCATION: MoveOfflineLogs, xlog.c:1656
2005-07-25 19:52:55 LOG: 08006: could not send data to client:
Connection reset by peer
LOCATION: internal_flush, pqcomm.c:1000
2005-07-25 19:52:58 LOG: 08006: could not send data to client: Broken pipe
LOCATION: internal_flush, pqcomm.c:1000
2005-07-25 19:53:01 LOG: 00000: connection received:
host=3Dvispaaa01-syd port=3D47697
It looks like the slave daemons should be releasing the connections to
the master, but aren't. Is that right?
thanks in advance,
Len
--=20
Len Walter len.walter- Re5JQEeQqe8AvxtiuMwx
3w@public.gmane.org http://crookedtimbre.net
|