Slony1 PostgreSQL Replication archive, January 2006
Slony and Network Delays
Aaron Randall, 2006-01-05, 3:24 am
Hi,
I was wondering whether anybody had any information they could give me
about how Slony is affected by network delays.
I am implementing Slony in two sites, which are in two separate cities.
I have used Slony in a test environment, which works fine, but the two
servers are side by side, so this gives no indication of the effects of
network delay.
For example, does Slony use two-phase commit/have a set Master buffer
size/a recommended minimum delay time for Slony to work successfully?
Many thanks,
Aaron

Rod Taylor, 2006-01-05, 9:24 am
On Thu, 2006-01-05 at 08:25 +0000, Aaron Randall wrote:
> Hi,
>
> I was wondering whether anybody had any information they could give me
> about how Slony is affected by network delays.
Badly and network outages don't help anything. I suggest you use the log
based method for inter-site replication instead of allowing direct
connections. Slony's standard mode was not designed to work around
network limitations.

Andrew Sullivan, 2006-01-05, 9:24 am
On Thu, Jan 05, 2006 at 08:36:40AM -0500, Rod Taylor wrote:
> On Thu, 2006-01-05 at 08:25 +0000, Aaron Randall wrote:
>
> Badly and network outages don't help anything. I suggest you use the log
> based method for inter-site replication instead of allowing direct
> connections. Slony's standard mode was not designed to work around
> network limitations.
Well, yes and no. If you have a reasonably reliable WAN, it's
_certainly_ designed for that sort of environment. If you have an
unstable network that comes and goes with the weather, what Rod says
is true. But Afilias uses Slony to replicate things from (for
example) Missouri to Toronto all the time, without any adverse
effects.
A
--
Andrew Sullivan | ajs-oaT0K0jot5/q2IAV+ODieA@public.gmane.org
I remember when computers were frustrating because they *did* exactly what
you told them to. That actually seems sort of quaint now.
--J.D. Baldwin

David Boreham, 2006-01-05, 9:24 am
Rod Taylor wrote:
>On Thu, 2006-01-05 at 08:25 +0000, Aaron Randall wrote:
>
>Badly and network outages don't help anything. I suggest you use the log
>based method for inter-site replication instead of allowing direct
>connections. Slony's standard mode was not designed to work around
>network limitations.
I'm wondering if the OP was asking about network latency? (hence 'delay').
This can be a problem for replication mechanisms if the propagation of
update records between nodes is not pipelined. In that case there will be
a stall of 1 x RTT duration between each update. With a typical WAN
latency of 50ms or so (about 100ms round trip), this will limit
replication performance to only 10 updates per second.
I have no idea if Slony suffers from this problem, but seeing the post made
me think of this issue, which I have seen in other replication products.
Network partitions or transient outages would present problems for
replication too, but I'm not sure if the OP was asking about that set of
problems.
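To make David's arithmetic concrete, here is a minimal Python sketch of the throughput ceiling he describes. It assumes a stop-and-wait sender (one update in flight, each waiting a full round trip for its acknowledgement) versus a hypothetical pipelined sender that allows `window` unacknowledged updates; bandwidth is ignored, and it models no particular product's protocol.

```python
def stop_and_wait_rate(one_way_latency_s):
    """Max updates/second when each update waits one full round trip."""
    rtt = 2 * one_way_latency_s
    return 1 / rtt


def pipelined_rate(one_way_latency_s, window):
    """Max updates/second when up to `window` updates may be in flight.

    Only latency is modelled; link bandwidth is assumed unlimited.
    """
    rtt = 2 * one_way_latency_s
    return window / rtt


if __name__ == "__main__":
    latency = 0.050  # 50 ms one way, i.e. about 100 ms round trip
    print(f"stop-and-wait: {stop_and_wait_rate(latency):.0f} updates/s")
    print(f"pipelined, window=100: {pipelined_rate(latency, 100):.0f} updates/s")
```

With a window of 1 the two functions agree, which is the stall David warns about; allowing more updates in flight raises the ceiling proportionally until bandwidth becomes the limit.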

Aaron Randall, 2006-01-05, 9:24 am
Rod Taylor wrote:
>On Thu, 2006-01-05 at 08:25 +0000, Aaron Randall wrote:
>
>Badly and network outages don't help anything. I suggest you use the log
>based method for inter-site replication instead of allowing direct
>connections. Slony's standard mode was not designed to work around
>network limitations.
I googled it, but couldn't find anything. Could you explain more about
what the "log based method" is, please?
Thank you :)
--
Aaron Randall
VisionOSS Ltd.
Providing excellence in OSS architecture and solutions consultancy
Email: aaron.randall-LJ+G6BnwIY2akBO8gow8eQ@public.gmane.org
Mobile: +44 (0)7742 432165

Rod Taylor, 2006-01-05, 9:24 am
> >Badly and network outages don't help anything. I suggest you use the log
> I googled it, but couldn't find anything, could you explain more about
> what "log based method" please?
Not really. I've never used it.
Check the 1.1 documentation for Log Shipping.
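In broad terms, log shipping means the replicated changes are written out as SQL archive files, copied to the far site by whatever means is convenient, and applied there strictly in order, so that site needs no live database connection to the cluster. The Python sketch below illustrates only the "apply in order, stop at a gap" logic. The file-name pattern `slony1_log_<origin>_<counter>.sql` and the function `pending_archives` are hypothetical; the 1.1 documentation describes the real file naming and how archives are generated and applied.

```python
import re

# Hypothetical archive-file naming, for illustration only:
#   slony1_log_<origin>_<counter>.sql
ARCHIVE_NAME = re.compile(r"^slony1_log_(\d+)_(\d+)\.sql$")


def pending_archives(names, origin, last_applied):
    """Return the archive files that can be applied next, in order.

    Only files for `origin` are considered. Files are returned strictly in
    counter order starting at last_applied + 1, and the run stops at the
    first missing counter, because applying a later file before an earlier
    one would leave the remote copy inconsistent.
    """
    by_counter = {}
    for name in names:
        match = ARCHIVE_NAME.match(name)
        if match is None:
            continue  # ignore unrelated files in the drop directory
        if int(match.group(1)) != origin:
            continue  # belongs to a different origin node
        by_counter[int(match.group(2))] = name

    ready = []
    next_counter = last_applied + 1
    while next_counter in by_counter:
        ready.append(by_counter[next_counter])
        next_counter += 1
    return ready


if __name__ == "__main__":
    shipped = [
        "slony1_log_1_00000000000000000003.sql",
        "slony1_log_1_00000000000000000002.sql",
        "README.txt",
        "slony1_log_1_00000000000000000005.sql",
    ]
    # Counter 4 has not arrived yet, so only 2 and 3 are safe to apply.
    print(pending_archives(shipped, origin=1, last_applied=1))
```

Because nothing is applied past a gap, a late or lost file simply pauses the remote copy instead of corrupting it, which is what makes this approach tolerant of an unreliable link.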

Andrew Sullivan, 2006-01-05, 9:24 am
On Thu, Jan 05, 2006 at 07:30:40AM -0700, David Boreham wrote:
> This can be a problem for replication mechanisms if the propagation of
> update records between nodes is not pipelined. In that case there will be
> a stall with 1 x RTT duration between each update. With a typical WAN
> latency of 50ms or so this will limit replication performance to only 10
> updates per second.
> I have no idea if slony suffers from this problem, but seeing the post
> made me think of this issue, which I have seen in other replication
> products.
That's not a problem. Slony is _async_ replication precisely because
we needed wide-area replication as one of the things it could do.
The trade-off is that the replicas are not always perfect copies of
the origin (that is, there is some lag, and the lag is unpredictable
within some upper bound for a given network throughput and workload).
That "not always perfect copies" issue is what makes automatic
failover mostly dangerous.
It seems that what Rod is talking about is either heavy workload or
an unreliable network or both; and there will indeed be a measurable
effect on the origin node if the write sets are very large (or the
network is slow, or both) -- i.e. large enough to cause you to need
more transfer speed than you have. We have experienced nodes being
as much as a day behind without any serious effects, but our write
traffic wasn't that heavy. If the network were very unstable,
though, you'd be in a different situation.
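The point about write sets outgrowing the link can be illustrated with a toy queue model. The sketch below assumes a constant link capacity and a hypothetical per-second list of bytes written at the origin; it tracks how many bytes are still waiting to be shipped at the end of each second and is not a model of slon's internals.

```python
def backlog_over_time(writes_per_second, link_bytes_per_second):
    """Return the unsent backlog (in bytes) at the end of each second.

    Each second the origin adds that second's writes to the queue, and the
    link then drains at most `link_bytes_per_second` from it.
    """
    backlog = 0
    history = []
    for written in writes_per_second:
        backlog += written
        backlog -= min(backlog, link_bytes_per_second)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Hypothetical burst: the link moves 100 bytes/s, the origin writes more.
    print(backlog_over_time([50, 150, 300, 0, 0], 100))
```

A burst larger than the link can carry leaves the subscriber behind, and it catches up once writes drop below link capacity; a write rate that stays above capacity makes the backlog, and hence the lag, grow without bound.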
You also need to have pretty significant security around all this.
Remember that Slony _requires_ superuser access on all active nodes in
a cluster. So you'd better be using a VPN of some sort.
A
--
Andrew Sullivan | ajs-oaT0K0jot5/q2IAV+ODieA@public.gmane.org
When my information changes, I alter my conclusions. What do you do sir?
--attr. John Maynard Keynes

Christopher Browne, 2006-01-05, 11:24 am
Aaron Randall wrote:
>For example, does Slony use two-phase commit/have a set Master buffer
>size/a recommended minimum delay time for Slony to work successfully?
There is no two-phase commit involved...
Slony-I is an *asynchronous* replication system, meaning that changes
are recorded at the origin, and then applied to the subscribers some
time later. "Some time later" can be fairly arbitrarily later.
We're running Slony-I across some cross-continent links, which
doesn't require any kind of special Slony-I configuration. Slony-I
doesn't use any special network protocol; it merely connects as Yet
Another PostgreSQL client via libpq and the concomitant network
protocol used by psql and other "ordinary" PostgreSQL clients.
High bandwidth across that distance is pretty expensive, so you are
likely to have to settle for limited bandwidth. That will make it take
a long time to get a new subscriber set up across the WAN link. But
once it is subscribed, we commonly see that the "very distant" node is
only a second or two behind.
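A back-of-the-envelope estimate of that initial copy is just data size divided by usable bandwidth. The sketch below uses hypothetical figures (20 GB of data over a 10 Mbit/s link) and an assumed 0.8 efficiency factor for protocol overhead and competing traffic; it covers raw transfer time only.

```python
def initial_copy_hours(data_bytes, link_bits_per_second, efficiency=0.8):
    """Rough hours needed to push `data_bytes` across the link.

    `efficiency` is an assumed fraction of nominal bandwidth that is
    actually usable; everything except raw transfer time is ignored.
    """
    usable_bytes_per_second = link_bits_per_second * efficiency / 8
    return data_bytes / usable_bytes_per_second / 3600


if __name__ == "__main__":
    # Hypothetical figures: 20 GB of table data over a 10 Mbit/s WAN link.
    hours = initial_copy_hours(20e9, 10e6)
    print(f"about {hours:.1f} hours")
```

Once the subscription is in place, only the ongoing changes cross the link, so steady-state lag depends on the write rate rather than on the total size of the data set.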
There is an indirect notion of "buffer size" in terms of the "sync
grouping" (the -g parameter to slon); in version 1.2, there will be a
buffer size used to manage MEMORY consumption. But that's all more
about memory management than network management.
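As a rough illustration of what "sync grouping" means: instead of applying each SYNC event from the origin in its own transaction, a subscriber can apply several consecutive SYNCs together, up to some maximum group size. The Python sketch below shows only that batching; `group_syncs` is a hypothetical name, and the fixed `max_group` merely stands in for the -g value. It makes no claim about how slon itself chooses group sizes at run time.

```python
def group_syncs(pending_sync_ids, max_group):
    """Split consecutive SYNC event ids into batches of at most max_group.

    Each batch would be applied on the subscriber as one unit of work.
    """
    if max_group < 1:
        raise ValueError("max_group must be at least 1")
    return [
        pending_sync_ids[i:i + max_group]
        for i in range(0, len(pending_sync_ids), max_group)
    ]


if __name__ == "__main__":
    print(group_syncs(list(range(101, 108)), 3))
```

Larger groups mean fewer, bigger units of work on the subscriber, which is why the setting ends up being about memory consumption rather than anything network-specific.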

Rod Taylor, 2006-01-05, 11:24 am
On Thu, 2006-01-05 at 08:49 -0500, Andrew Sullivan wrote:
> On Thu, Jan 05, 2006 at 08:36:40AM -0500, Rod Taylor wrote:
>
> Well, yes and no. If you have a reasonably reliable WAN, it's
> _certainly_ designed for that sort of environment. If you have an
> unstable network that comes and goes with the weather, what Rod says
> is true. But Afilias uses Slony to replicate things from (for
> example) Missouri to Toronto all the time, without any adverse
> effects.
Call me a pessimist. When I pushed Slony in directions slightly off
"normal", I regularly ended up in trouble.