Home > Archive > Slony1 PostgreSQL Replication > December 2005 > Pending transaction check for copy_set









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Pending transaction check for copy_set
Casey Duncan

2005-12-01, 11:24 am

I've run up against the fairly well documented constraint in the
copy_set() function in remote_worker.c. The constraint delays copy_set
if there is a pending transaction that was opened prior to the one that
installed the slony triggers.

This entry in the faq describes the issue:
http://developer.postgresql.org/~wi...nguide-1.1.rc1/
faq.html#AEN2675

My question has to do with the last paragraph of that entry:

"By the way, if there is more than one database on the PostgreSQL
cluster, and activity is taking place on the OTHER database, that will
lead to there being "transactions earlier than XID whatever" being
found to be still in progress. The fact that it's a separate database
on the cluster is irrelevant; Slony-I will wait until those old
transactions terminate."

(I'm glad that was in there otherwise I'd have torn my hair out by now
btw ;^)

Looking at the code for copy_set(), it makes a query that calls the
getMinXid() stored func to check for pending transactions. getMinXid(),
as I understand it returns the transaction id of the oldest running
transaction in the entire postgres cluster, regardless of what database
it is in.

I wonder why this blanket limitation is necessary. could we not, at
least in theory, limit this check to the subscribing node db (and maybe
the provider node, I'm not sure)? If so, to my naive eyes, it would
seem appropriate to query pg_locks or pg_stat_activity to perform this
check, limiting it to only the databases that were absolutely necessary
to check. Is this possible or is there some limitation or edge case
that makes this unsafe? Will long-running read transactions also block
copy_set?

I'd like to develop a patch that loosens this restriction if possible.
I'm finding it quite troublesome especially on my dev and qa systems.

-Casey
Jan Wieck

2005-12-01, 8:25 pm

On 11/30/2005 7:04 PM, Casey Duncan wrote:

> I've run up against the fairly well documented constraint in the
> copy_set() function in remote_worker.c. The constraint delays copy_set
> if there is a pending transaction that was opened prior to the one that
> installed the slony triggers.
>
> This entry in the faq describes the issue:
> http://developer.postgresql.org/~wi...nguide-1.1.rc1/
> faq.html#AEN2675
>
> My question has to do with the last paragraph of that entry:
>
> "By the way, if there is more than one database on the PostgreSQL
> cluster, and activity is taking place on the OTHER database, that will
> lead to there being "transactions earlier than XID whatever" being
> found to be still in progress. The fact that it's a separate database
> on the cluster is irrelevant; Slony-I will wait until those old
> transactions terminate."
>
> (I'm glad that was in there otherwise I'd have torn my hair out by now
> btw ;^)
>
> Looking at the code for copy_set(), it makes a query that calls the
> getMinXid() stored func to check for pending transactions. getMinXid(),
> as I understand it returns the transaction id of the oldest running
> transaction in the entire postgres cluster, regardless of what database
> it is in.
>
> I wonder why this blanket limitation is necessary. could we not, at
> least in theory, limit this check to the subscribing node db (and maybe
> the provider node, I'm not sure)? If so, to my naive eyes, it would
> seem appropriate to query pg_locks or pg_stat_activity to perform this
> check, limiting it to only the databases that were absolutely necessary
> to check. Is this possible or is there some limitation or edge case
> that makes this unsafe? Will long-running read transactions also block
> copy_set?


This "blanket limitation" is used because this information is readily
available in the transactions snapshot information. To the the fine
grained information you're looking for, one would have to traverse the
PGPROC array that's sitting in shared memory.


>
> I'd like to develop a patch that loosens this restriction if possible.
> I'm finding it quite troublesome especially on my dev and qa systems.


If you have broken clients that leave open transactions around for
infinite times (I don't say you have, but I have seen that before), then
you have much bigger problems to worry about. One of them would be that
vacuum cannot remove rows, which in the Slony case will also cause that
pg_listener and other relations will clog up with dead tuples.


Jan

--
#===================
====================
====================
===========#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#===================
====================
=========== JanWieck- bwPqjjyvM7QAvxtiuMwx
3w@public.gmane.org #
Christopher Browne

2005-12-01, 8:25 pm

Jan Wieck wrote:

>
>
> If you have broken clients that leave open transactions around for
> infinite times (I don't say you have, but I have seen that before),
> then you have much bigger problems to worry about.


That being said, if there is an easy way for us to figure out that
"seemingly offending old transactions" are attached to other databases,
and thus irrelevant, then this cuts down on the "unexpected *it won't
subscribe! $&$!*(!#!!!*" complaints.

> One of them would be that vacuum cannot remove rows, which in the
> Slony case will also cause that pg_listener and other relations will
> clog up with dead tuples.


On that latter note, I have at least one idea vis-a-vis the "clog" in
pg_listener, and possibly two.

Right now, confirmations are also propagated via pg_listener, which
means that for every event, you have (I think) FOUR pg_listener entries
that go thru their life cycle. But really, the confirmation isn't all
that interesting; we don't take action on those; we only take action on
"real events."

My thought is to essentially drop the confirmation events out of what
gets stowed as a NOTIFY in pg_listener.

Acceptance of the confirmation may wait until the next event; I don't
*think* that hurts us.

A while back, you indicated:

Don't forget that WAIT FOR EVENT relies on CONFIRM's to come in. If your
make their delivery random on "something happens anyway", you might end
up waiting forever.

If the slon for the node is alive, it'll be generating at least its own
events once every minute or so, and in a live cluster, *some* "real"
events should be flowing (e.g. - aside from confirmations). If we have
to be dependent on sort of artificially generated events, to this
degree, it seems to me we're borrowing some trouble.

In any case, I did some walkthru of WAIT FOR EVENT, and didn't see
reliance on CONFIRMs *being notifications.* If the event loop in
remote_listen.c performs both the pulling of new events as well as
checking confirmations in its main loop, I *think* we'll be OK. It
should be easy enough to test.

That's "idea #1", which ought to help us to the tune of ~50% of the
pg_listener clog. (The more nodes, the less diversely connected, the
more confirmations there normally are, and so the MORE helpful this
gets.) I can't find the patch that I thought I had for this; it
shouldn't be hard to recreate it, and it should be quite easy to test
it. I set up a cluster, have NO traffic, along with pretty long timeout
parameters on the slons, thereby minimizing confirmations, and then
submit a slonik script that does WAIT FOR EVENT to see if it blocks
forever. I'm optimistic :-)

Idea #2 is then to restructure the material in remote_listen.c so that
if we see, for a given slon+node, that communications are going well, we
turn off the LISTEN (which will eliminate pg_listener tuple generation)
and instead poll... Every listener that you eliminate eliminates a
barrel of dead pg_listener tuples... If each node is working well, then
this would mean that there are NO listeners, therefore NO pg_listener
entries, and therefore NO bloat there at all. That's the tougher change.
Jan Wieck

2005-12-01, 8:25 pm

On 12/1/2005 4:57 PM, Christopher Browne wrote:

> Idea #2 is then to restructure the material in remote_listen.c so that
> if we see, for a given slon+node, that communications are going well, we
> turn off the LISTEN (which will eliminate pg_listener tuple generation)
> and instead poll... Every listener that you eliminate eliminates a
> barrel of dead pg_listener tuples... If each node is working well, then
> this would mean that there are NO listeners, therefore NO pg_listener
> entries, and therefore NO bloat there at all. That's the tougher change.


How could this have escaped me?

You are right, if there are no listeners, then a notify is just a NOP.
So if anything that today relies on notifications just unlistens after
getting one, then polls for let's say 60 seconds, extending the polling
period if something is found ... and only if it didn't find anything for
60 seconds it will LISTEN again and wait.

We then have to significantly increase the timeout interval, in which we
generate SYNC no matter if we detected any change or not. Forcing a SYNC
and thereby a NOTIFY every 60 seconds, followed by 60 seconds of polling
.... we could as well just poll and forget everything else.

But this is indeed a much better solution that what I had in mind so
far. Thank you.


Jan

--
#===================
====================
====================
===========#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#===================
====================
=========== JanWieck- bwPqjjyvM7QAvxtiuMwx
3w@public.gmane.org #
Jan Wieck

2005-12-01, 8:25 pm

On 12/1/2005 5:27 PM, Casey Duncan wrote:

> We do have regular transactions that take several minutes to complete
> (some reading only, others updating) on databases in the same postgres
> instance as the would-be provider node. Restricting this check to the
> provider db would make creating nodes much smoother for us, and I think
> rid slony of a rather unintuitive failure mode.


I hope you do ... pg_dump should be a regular task on your production
systems ;-)

Me thunk a little about the whole issue ... and it appears that I might
have been over conservative at the time. Fact is that adding the table
to the set does already add the slony log trigger. The subscription
process happens later and requires another event created on the master,
the ENABLE_SUBSCRIPTION event, which is a slon response for having added
the new subscriber (meaning among other things that no more tables can
be added to the set and that it will not delete any log data, not
confirmed by the new subscriber). Creating the trigger acquires an
access exclusive lock on the table, so that will already wait if any
concurrent transaction has any kind of lock (even for having selected
from it). A transaction that tries to access the table in any way after
setAddTable() created the trigger will block until the latter commits
and will for sure fire triggers then. That means that after the
setAddTable() is done and committed, there cannot be any other
transaction around that can modify the table but not fire the trigger.

Whow, still with me on this? So ... if I didn't make any mistake in the
above ... the whole thing should be obsolete anyway. Removing that check
isn't something I would lightly do, and it for sure won't happen in any
stable branch.

But I think we should consider commenting out the check in HEAD.


Jan

--
#===================
====================
====================
===========#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#===================
====================
=========== JanWieck- bwPqjjyvM7QAvxtiuMwx
3w@public.gmane.org #
Casey Duncan

2005-12-01, 8:25 pm

On Dec 1, 2005, at 1:20 PM, Jan Wieck wrote:

> On 11/30/2005 7:04 PM, Casey Duncan wrote:
> [..]
>
> This "blanket limitation" is used because this information is readily
> available in the transactions snapshot information. To the the fine
> grained information you're looking for, one would have to traverse the
> PGPROC array that's sitting in shared memory.


I'll have a look at that. If nothing else it gives me an excuse to poke
around more in the internals ;^)

>
> If you have broken clients that leave open transactions around for
> infinite times (I don't say you have, but I have seen that before),
> then you have much bigger problems to worry about. One of them would
> be that vacuum cannot remove rows, which in the Slony case will also
> cause that pg_listener and other relations will clog up with dead
> tuples.


Understood. This did flush out at least one case where starting an
application and leaving it dormant would leave open transactions
indefinitely. Since in production they aren't really ever dormant, I
never noticed this, but it caused slony to complain in our development
environment.

We do have regular transactions that take several minutes to complete
(some reading only, others updating) on databases in the same postgres
instance as the would-be provider node. Restricting this check to the
provider db would make creating nodes much smoother for us, and I think
rid slony of a rather unintuitive failure mode.

-Casey
Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com