Author "Blueprints for High Availability"
Brian A. Seklecki

2006-01-19, 8:24 pm

Wiley Press, ISBN 0-471-43026-9, Evan Marcus & Hal Stern

Whatever you do, don't read this book when planning your enterprise-class
PostgreSQL cluster using Slony1. The authors give a scathing opinion of
network-based asynchronous database replication, especially for redundant
configurations within the same facility. They concede that the method has
some applicable uses (facility-to-facility replication), but they go so
far as to recommend long-distance SAN before software+network.

The entire text has a highly anti-Microsoft undercurrent, which makes it a
real page-turner. Unfortunately, most of the advice regarding HA
application clusters has a commercial-UNIX-oriented slant (they all but
endorse VERITAS).

The book only serves to further emphasize that there is no definitive FMS
(Failover Management Software) solution for Open Source UNIX-like OSs:
no truly platform-independent project (well, Linux-HA[.org]) that
integrates with monitoring, databases, web servers, load balancers, RAID
/ SAN controllers, etc.

The projects are there (PostgreSQL, Slony, PGPool, Nagios, Net-SNMP,
FreeVRRPd, FreeBSD, GNU/Linux, Linux-HA, etc.); there is just no
integration yet.

~BAS

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Bradley Kieser

2006-01-19, 8:24 pm

Bas, I am working on an integration technique that solves these problems
and is already showing tremendous promise. It's in production use in a
very high-pressure environment, but suffice it to say it's an integration
of what's out there (I am 100% Linux based, so don't expect anything but
Linux).

Once it is completed and I feel that I can present it to a wider
audience such as this one, I will formally announce it and seek a peer
review process.

But let's just say that it is possible, RIGHT NOW, to get full
redundancy and failover that works, is open source, and runs on cheap
hardware.


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Jim C. Nasby

2006-01-20, 1:24 pm

<dons Nomex undies>
Well, I would generally have to agree on not using Slony 1 for HA. I
don't see how it could be considered acceptable to potentially lose
committed transactions when the master fails. Unless maybe my
understanding of Slony is flawed...
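
To make the concern above concrete, here is a minimal toy model of asynchronous replication. It is only a sketch under stated assumptions: `Node`, `commit_async`, `replicate` and `lost_on_failover` are hypothetical names invented for illustration and are not part of Slony-I or PostgreSQL. It shows that any transaction acknowledged to the client but not yet shipped to the replica is lost if the master dies at that moment.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy database node that stores committed transaction ids in order."""
    committed: list = field(default_factory=list)


def commit_async(master: Node, txid: int) -> None:
    """Commit on the master only; the client is told 'committed' right away."""
    master.committed.append(txid)


def replicate(master: Node, replica: Node, batch: int) -> None:
    """Ship up to `batch` not-yet-replicated transactions to the replica."""
    start = len(replica.committed)
    replica.committed.extend(master.committed[start:start + batch])


def lost_on_failover(master: Node, replica: Node) -> list:
    """Transactions acknowledged to clients that the replica never received."""
    return master.committed[len(replica.committed):]


master, replica = Node(), Node()
for txid in range(1, 11):             # ten commits acknowledged to clients
    commit_async(master, txid)
replicate(master, replica, batch=7)   # the replica has caught up to txid 7 only
print(lost_on_failover(master, replica))  # master fails now: [8, 9, 10]
```

How large that window is in practice depends on replication lag, which this toy does not model.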


--
Jim C. Nasby, Sr. Engineering Consultant jnasby-D/iDPWeZeLdl57MIdRCFDg@public.gmane.org
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

Christopher Browne

2006-01-20, 8:24 pm

Jim C. Nasby wrote:

><dons Nomex undies>
>Well, I would generally have to agree on not using Slony 1 for HA. I
>don't see how it could be considered acceptable to potentially lose
>committed transactions when the master fails. Unless maybe my
>understanding of Slony is flawed...
>
>

Well, that presumably depends on perspective.

A bank generally cannot ever afford to lose ANY transactions, which
would tend to mean that only synchronous replication would be any kind
of answer.

That kind of application points to really forcibly needing 2PC...

Maximizing availability, which is what HA is forcibly and unambiguously
about, may not be exactly the same thing as providing guarantees that
committed transactions can never be lost.

Andrew Sullivan

2006-01-20, 8:24 pm

On Fri, Jan 20, 2006 at 04:21:15PM -0500, Christopher Browne wrote:
> Maximizing availability, which is what HA is forcibly and unambiguously
> about, may not be exactly the same thing as providing guarantees that
> committed transactions can never be lost.


Right. And even banks are forced to make some compromises here. For
instance, nobody can do 2PC or any synchronous transaction
replication across WANs. So a perfect, up to the millisecond version
of the bank can't be online somewhere else. In a system I'm familiar
with, the transaction log is 2PCd somewhere else at transaction time,
but not live data. If the remote site had to come into use, you'd
have a few minutes of recovery time while you replayed and caught up.

And remember, this is assuming total destruction of the primary
system -- all the disks and everything. If it matters slightly less
what order exactly transactions happen in, then you're ok. So the
mitigation trick here is to hold transactions above a certain dollar
value under certain very unlikely circumstances. Banks have all
sorts of provisions for this kind of thing; it's also why they hire
scores of risk-mitigation people.

But would I use Slony as the _only_ wheel in my HA machine? Not on a
bet.

A

--
Andrew Sullivan | ajs-oaT0K0jot5/q2IAV+ODieA@public.gmane.org
The whole tendency of modern prose is away from concreteness.
--George Orwell

Jim C. Nasby

2006-01-20, 8:24 pm

On Fri, Jan 20, 2006 at 04:47:01PM -0500, Andrew Sullivan wrote:
> On Fri, Jan 20, 2006 at 04:21:15PM -0500, Christopher Browne wrote:
>
> Right. And even banks are forced to make some compromises here. For
> instance, nobody can do 2PC or any synchronous transaction
> replication across WANs. So a perfect, up to the millisecond version
> of the bank can't be online somewhere else. In a system I'm familiar
> with, the transaction log is 2PCd somewhere else at transaction time,
> but not live data. If the remote site had to come into use, you'd
> have a few minutes of recovery time while you replayed and caught up.


Sounds perfectly reasonable. Not being able to do credit-card auth for 5
minutes will piss a bunch of people off, but losing actual data would be
*really* bad.

It would be very, very cool if something like this was available for
PostgreSQL. I suspect it's probably doable with 8.1, but unfortunately
I'm not well versed enough in this stuff to know. But being able to show
folks how they could set up HA that was guaranteed not to lose committed
data... that would be a huge boost for the community. I'm pretty sure
that every single sales call I've been on has brought this kind of thing
up.

> And remember, this is assuming total destruction of the primary
> system -- all the disks and everything. If it matters slightly less
> what order exactly transactions happen in, then you're ok. So the
> mitigation trick here is to hold transactions above a certain dollar
> value under certain very unlikely circumstances. Banks have all
> sorts of provisions for this kind of thing; it's also why they hire
> scores of risk-mitigation people.
>
> But would I use Slony as the _only_ wheel in my HA machine? Not on a
> bet.


Yeah, it would be damn nice if there was a stronger alternative. From
what I've read I think Slony-II might fit the bill (though I can't
remember if there's a guarantee that a changeset will exist at least
somewhere else before COMMIT returns), but I suspect it wouldn't perform
well over a WAN.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby-D/iDPWeZeLdl57MIdRCFDg@public.gmane.org
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

Jan Wieck

2006-01-20, 8:24 pm

On 1/20/2006 6:26 PM, Jim C. Nasby wrote:
> On Fri, Jan 20, 2006 at 04:47:01PM -0500, Andrew Sullivan wrote:
>
> Sounds perfectly reasonable. Not being able to do credit-card auth for 5
> minutes will piss a bunch of people off, but losing actual data would be
> *really* bad.


You might be mistaken on this point. Banks are like insurance companies.
The volume of financial transactions they process allows them to evaluate
the cost to "secure" something vs. the possible damage if something is lost.

Even 20 years ago the bank I was working for in Germany didn't bother to
compare the signatures on checks cashed at the counter if the amount was
under 1000 DM (about $400 US at that time). You could literally sign
your check with "Mickey Mouse" and go to any of the 250 locations and
you'd get cash for that check. The few cases where the bank had to
refund were far cheaper than checking every single signature.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck-bwPqjjyvM7QAvxtiuMwx3w@public.gmane.org #

Hannu Krosing

2006-01-21, 3:23 am

On a fine day, Fri, 2006-01-20 at 16:47, Andrew Sullivan wrote:

> But would I use Slony as the _only_ wheel in my HA machine? Not on a
> bet.

Wise choice, as currently Slony *IS* broken by design (see my last mail
on the subject). But you only notice this brokenness if you do really a
lot of transactions, so that indexes using xxid_ops start failing, which
generally can possibly happen after 2G transactions, and even then
happens quite infrequently.

--------------
Hannu

Andrew Sullivan

2006-01-23, 7:24 am

On Sat, Jan 21, 2006 at 04:32:12AM +0200, Hannu Krosing wrote:
> On a fine day, Fri, 2006-01-20 at 16:47, Andrew Sullivan wrote:
>
> Wise choice, as currently Slony *IS* broken by design (see my last mail
> on the subject). But you only notice this brokenness if you do really a
> lot of transactions, so that indexes using xxid_ops start failing, which


To be strict, that's not broken by design, that's a bug. Nobody
intended that there be such a limit on transactions.

A

--
Andrew Sullivan | ajs-oaT0K0jot5/q2IAV+ODieA@public.gmane.org
"The year's penultimate month" is not in truth a good way of saying
November.
--H.W. Fowler

Jim C. Nasby

2006-01-23, 11:25 am

On Fri, Jan 20, 2006 at 07:20:00PM -0500, Jan Wieck wrote:
> On 1/20/2006 6:26 PM, Jim C. Nasby wrote:
>
> You might be mistaken on this point. Banks are like insurance companies.
> The volume of financial transactions they process allows them to evaluate
> the cost to "secure" something vs. the possible damage if something is lost.
>
> Even 20 years ago the bank I was working for in Germany didn't bother to
> compare the signatures on checks cashed at the counter if the amount was
> under 1000 DM (about $400 US at that time). You could literally sign
> your check with "Mickey Mouse" and go to any of the 250 locations and
> you'd get cash for that check. The few cases where the bank had to
> refund were far cheaper than checking every single signature.


There's a difference between someone getting spoofed when their
checkbook got stolen and suddenly losing a bunch of CC transactions.
Though, I wonder if CC machines ever double-check that transactions
they've already submitted actually made it into the system...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby-D/iDPWeZeLdl57MIdRCFDg@public.gmane.org
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

Andrew Sullivan

2006-01-23, 1:24 pm

On Fri, Jan 20, 2006 at 05:26:56PM -0600, Jim C. Nasby wrote:
> I'm not well versed enough in this stuff to know. But being able to show
> folks how they could set up HA that was guaranteed not to lose committed
> data... that would be a huge boost for the community. I'm pretty sure


There's no system commercially available, AFAIK, that can offer that
guarantee. Here's what it would have to guarantee to make it
possible, in the usual N (so this is N+1) notation:

1. The complete destruction of N machines cannot cause the loss
of any committed data.

2. The complete destruction of N's data centre cannot cause the
loss of any committed data.

3. The malicious compromise of N cannot cause the loss of any
committed data.

4. The accidental compromise of N cannot cause the loss of any
committed data.

5. Undetected bugs in code cannot cause the loss of any
committed data.

(2) is effectively impossible, because even light has latency. For
most transactions, users will not wait for the latency of wide-area
COMMIT messages. (Banking isn't even an exception any more: online
trading systems would be incapable of the speeds they achieve -- and
the resulting occasional meltdowns they create -- if they had to do
2PC across the country, which is to say across power and
network-provider points.) And there's simply nothing you can do to
guarantee that 3-5 is impossible.
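
As a rough back-of-the-envelope check of the latency point, the sketch below computes the physical lower bound on a wide-area round trip. The assumptions are mine, not from this thread: light in optical fibre travels at roughly two thirds of its vacuum speed, the 4000 km distance is a hypothetical "across the country" figure, and a 2PC commit is taken to need at least two round trips (prepare, then commit). Real networks add routing and queuing delay on top of this.

```python
C_VACUUM_KM_PER_S = 299_792.458  # speed of light in vacuum
FIBRE_FACTOR = 2 / 3             # assumption: fibre carries light at about 2/3 c


def min_round_trip_ms(distance_km: float, factor: float = FIBRE_FACTOR) -> float:
    """Lower bound on one round trip over `distance_km` of fibre, in ms."""
    return 2 * distance_km / (C_VACUUM_KM_PER_S * factor) * 1000


def min_commit_latency_ms(distance_km: float, round_trips: int = 2) -> float:
    """Lower bound on a commit that needs `round_trips` sequential round trips."""
    return round_trips * min_round_trip_ms(distance_km)


distance = 4000  # hypothetical site-to-site distance in km
print(f"round trip: {min_round_trip_ms(distance):.1f} ms")
print(f"2PC commit: {min_commit_latency_ms(distance):.1f} ms")
# A single session committing serially cannot exceed this rate:
print(f"max serial commits/s: {1000 / min_commit_latency_ms(distance):.1f}")
```

With these assumptions one round trip is about 40 ms before any processing happens, which is already large next to a local commit.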

This is all about risk management. What you need to do is evaluate
how much your data is worth in the aggregate, how much any particular
transaction may possibly be worth, and then make sure that you don't
spend any more than that for the provision of the data. If you do
spend more, you're going to be bankrupt; the question isn't whether,
it's just how long it will take. Companies will happily _tell_ you
that their system offers these "guarantees", but it turns out that
when you do the real analysis, there simply isn't a way to provide
guarantees in the way people usually mean the word. What you get is
assurance and a greater or smaller assurance level.

Even people who claim to provide "five nines" usually can't really.
That's because, for the small probability that the 99.99% uptime
happens when 99.999% doesn't, it's likely to be cheaper to pay the
uptime penalty than it is to provision for the extra "9". The same
thing is true with these "guarantees".
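
For a sense of what the extra "9" buys, here is the arithmetic as a small sketch. The only assumption is a 365-day year; the function name is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Downtime per year permitted by a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for nines in (99.9, 99.99, 99.999):
    print(f"{nines}% -> {allowed_downtime_minutes(nines):.2f} minutes/year")
```

Going from four nines to five shrinks the yearly downtime budget from roughly 53 minutes to roughly 5, so the extra provisioning only has to be weighed against about 47 minutes of penalty per year.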

> Yeah, it would be damn nice if there was a stronger alternative. From
> what I've read I think Slony-II might fit the bill (though I can't
> remember if there's a guarantee that a changeset will exist at least
> somewhere else before COMMIT returns), but I suspect it wouldn't perform
> well over a WAN.


Well, the idea of slony-2 is that when a COMMIT returns, you are
guaranteed that all then-participating nodes have the data. The big
question is whether slony-2 is even possible, alas. And no, it
certainly won't work over a WAN.


a

--
Andrew Sullivan | ajs-oaT0K0jot5/q2IAV+ODieA@public.gmane.org
I remember when computers were frustrating because they *did* exactly what
you told them to. That actually seems sort of quaint now.
--J.D. Baldwin

Jan Wieck

2006-01-24, 11:24 am

On 1/23/2006 6:52 AM, Andrew Sullivan wrote:

> On Sat, Jan 21, 2006 at 04:32:12AM +0200, Hannu Krosing wrote:
>
> To be strict, that's not broken by design, that's a bug. Nobody
> intended that there be such a limit on transactions.


It's a bug caused by an oversight in the technical implementation paired
with a still not yet implemented feature (namely log switching, I am
working on it for 1.2 now).


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck-bwPqjjyvM7QAvxtiuMwx3w@public.gmane.org #

Alan Garrison

2006-01-26, 5:00 pm

Andrew Sullivan wrote:
> On Fri, Jan 20, 2006 at 05:26:56PM -0600, Jim C. Nasby wrote:


>
> Well, the idea of slony-2 is that when a COMMIT returns, you are
> guaranteed that all then-participating nodes have the data. The big
> question is whether slony-2 is even possible, alas. And no, it
> certainly won't work over a WAN.


Would the new "Two Phase Commit" stuff mentioned in the 8.1 release
notes obsolete slony-2? I haven't had a chance to play with it yet, but
isn't that what's already there?


--
Alan Garrison
Cronosys, LLC <http://www.cronosys.com>
Phone: 216-221-4600 ext 308

Jan Wieck

2006-01-26, 5:00 pm

On 1/26/2006 11:18 AM, Alan Garrison wrote:
> Andrew Sullivan wrote:
>
>
> Would the new "Two Phase Commit" stuff mentioned in the 8.1 release
> notes obsolete slony-2? I haven't had a chance to play with it yet, but
> isn't that what's already there?


Two Phase Commit is NOT a replication solution. It is another vehicle to
design a synchronous one, but someone still has to do that.

However, using 2PC to create a sync replication solution has some well
known problems. The biggest one being that if the origin of a
transaction becomes unavailable after you prepared for commit, you have
to keep the locks of that transaction indefinitely because there is no
way to figure out if the transaction was reported committed to the
application or not. This means that every row that got locked
exclusively in that transaction is indefinitely unavailable. One can say
"hey, I have 1,000,000 rows, so if I can't access 100 of them I still
have 99.99% availability", but I think that's not exactly what HA is
about ;-)
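
The in-doubt problem can be sketched with a toy participant. This is an illustration only, not PostgreSQL's implementation: the `Participant` class and its methods are hypothetical, and the only thing it models is that rows locked at prepare time stay locked until the origin announces the outcome. (In PostgreSQL 8.1 the real commands are PREPARE TRANSACTION followed by COMMIT PREPARED or ROLLBACK PREPARED.)

```python
class Participant:
    """Toy 2PC participant: rows stay locked from prepare until resolution."""

    def __init__(self, total_rows: int) -> None:
        self.total_rows = total_rows
        self.prepared = {}  # txid -> set of row ids locked by that transaction

    def locked_rows(self) -> set:
        return set().union(*self.prepared.values())

    def prepare(self, txid: str, rows) -> None:
        """Phase 1: promise to commit, which means holding the locks."""
        rows = set(rows)
        if rows & self.locked_rows():
            raise RuntimeError("row is locked by a prepared transaction")
        self.prepared[txid] = rows

    def resolve(self, txid: str) -> None:
        """Phase 2: the origin said COMMIT or ROLLBACK; locks are released."""
        self.prepared.pop(txid)

    def available_fraction(self) -> float:
        return 1 - len(self.locked_rows()) / self.total_rows


node = Participant(total_rows=1_000_000)
node.prepare("tx-42", range(100))  # phase 1 done, rows 0..99 are locked
# The origin disappears here. No COMMIT or ROLLBACK ever arrives, and this
# node cannot safely guess the outcome, so resolve("tx-42") is never called.
print(f"{node.available_fraction():.2%} of rows still accessible")
```

The percentage looks harmless, but any other transaction that touches one of those 100 rows fails or blocks until someone resolves "tx-42" by hand.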


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck-bwPqjjyvM7QAvxtiuMwx3w@public.gmane.org #

Jim C. Nasby

2006-01-26, 8:23 pm

On Mon, Jan 23, 2006 at 12:59:55PM -0500, Andrew Sullivan wrote:
> On Fri, Jan 20, 2006 at 05:26:56PM -0600, Jim C. Nasby wrote:
>
> There's no system commercially available, AFAIK, that can offer that
> guarantee. Here's what it would have to guarantee to make it
> possible, in the usual N (so this is N+1) notation:


What I was replying to in
http://gborg.postgresql.org/piperma...ary/003678.html
seems to indicate otherwise, but maybe I was reading it wrong...

What I do know is that on every sales call I've been on, someone at some
point has asked about PostgreSQL 'clustering' for HA, so there's
definitely a lot of interest in HA out there (though granted, most of
these folks wouldn't spring for WAN HA...)
--
Jim C. Nasby, Sr. Engineering Consultant jnasby-D/iDPWeZeLdl57MIdRCFDg@public.gmane.org
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

Andrew Sullivan

2006-01-29, 8:24 pm

On Thu, Jan 26, 2006 at 05:53:41PM -0600, Jim C. Nasby wrote:
> On Mon, Jan 23, 2006 at 12:59:55PM -0500, Andrew Sullivan wrote:
>
> What I was replying to in
> http://gborg.postgresql.org/piperma...ary/003678.html
> seems to indicate otherwise, but maybe I was reading it wrong...


Some systems can replicate _parts_ of the data in real time. For
instance, they can ship logs of transactions; and in the case of the
bank, you're willing to wait anyway, if you're getting your cash. In
the case of, for instance, online trading, there's actually a
settlement period involved, so if the transaction were lost by
someone because of a data systems disaster, they'd just invoke the
_force majeure_ clause and walk away. There'd be some damages, but
that's what risk management actuaries are employed by banks to
mitigate. It's really a matter of trade-off.

> What I do know is that on every sales call I've been on, someone at some
> point has asked about PostgreSQL 'clustering' for HA, so there's
> definitely a lot of interest in HA out there (though granted, most of
> these folks wouldn't spring for WAN HA...)


Sell them hardware failover. That's what IBM would sell you if you
were using DB2: two machines running (in an example I'm familiar
with) HACMP, and one can take over from the other in the case of
failure. You can do this with Linux and others too. If it's only HA
and not workload scale you want from a cluster, you can have it
today. IBM won't guarantee 5-nines on these arrangements, but I
understand that they _will_ offer such guarantees on Linux-on-390, so
you still have an (admittedly expensive) option if you want it badly
enough.

A

--
Andrew Sullivan | ajs-oaT0K0jot5/q2IAV+ODieA@public.gmane.org
A certain description of men are for getting out of debt, yet are
against all taxes for raising money to pay it off.
--Alexander Hamilton