How to: Disaster Recovery (2)
Pavel Karady

2006-03-21, 3:26 am

Hello gurus,

sorry for giving this thread another shot... this discussion is extremely
important to me.

In a replication environment, I want a copy of the consolidated DB to be
maintained on a separate machine, so that when the consolidated DB goes down
for any reason, the DB on the separate machine takes over immediately (within
the very same second). Replication should not stop; it should just be
redirected to the new location.

Stephen has provided an answer, for which I am thankful:

"Stephen Rice" <srice_nospam@ianywhere.com> wrote in message
news:44171ca0$1@forums-1-dub...
> By combining dbremote -u and moving your backup to your standby system
> you can achieve what you want. You will need to control when dbremote
> runs in the sequence to ensure there are no holes.


1. As far as I am aware, dbremote with the -u switch processes only
transaction logs that have been backed up, so I don't see how this fits into
the desired scenario. Could anyone (including Stephen) please explain this in
more detail?

2. It has been suggested to me that I maintain an exact copy of the
consolidated DB using the dbbackup utility, so that identical .db files exist
in separate locations at every second. Is this even possible? If so, is it
achievable in any way other than replication? How does "dbbackup -l" fit into
this?

3. Any suggestions on how to change the replication directory (it's the FILE
message type) automagically after the consolidated DB crashes? Should there be
an event, triggered (no less automagically), that issues a SET REMOTE FILE
OPTION PUBLIC.DIRECTORY statement?

Many, many thanks in advance
Pavel


Stephen Rice

2006-03-21, 3:26 am

Hi Pavel

You're crossing the line into consulting :)

I've made some comments in-line, but you probably need to set up some
experiments to see what works for you.

/steve
Pavel Karady wrote:
> Hello gurus,
>
> sorry for giving this thread another shot... this discussion is extremely
> important to me.
>
> In a replication environment, I want a copy of the consolidated DB to be
> maintained on a separate machine, so that when the consolidated DB goes down
> for any reason, the DB on the separate machine takes over immediately (within
> the very same second). Replication should not stop; it should just be
> redirected to the new location.


Ahh, high availability: addressed in Jasper (the next SQL Anywhere release).

Anything prior to that will not achieve what you want (same-second
control), and I'm not sure Jasper will either, mostly because I've not
reviewed the high availability stuff very closely yet.

By combining clustering technologies with some thoughtful coding, you
should be able to achieve automatic failover in most cases (not 100%:
e.g. if an asteroid hits the data center). I would expect this failover
to take on the order of a few minutes under normal circumstances.

What kind of business requirement led you to this level of availability
combined with SQL Remote (which has extreme latency built in and is hardly a
real-time technology)?

>
> Stephen has provided an answer, for which I am thankful:
>
> "Stephen Rice" <srice_nospam@ianywhere.com> wrote in message
> news:44171ca0$1@forums-1-dub...
>
>
>
> 1. As far as I am aware, dbremote with the -u switch processes only
> transaction logs that have been backed up, so I don't see how this fits into
> the desired scenario. Could anyone (including Stephen) please explain this in
> more detail?


Consider the following case. I have just run dbremote (without the -u)
and some of the messages generated have been processed by the remotes.
I then experience media failure and lose both my transaction log and my
mirror transaction log. I switch to my failover system. What happens?
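To make the "holes" in this scenario concrete, here is a toy model in Python. It is a simplification under stated assumptions, not how SQL Anywhere tracks the log internally: the transaction log is treated as a run of numbered operations, the failover copy holds everything up to the last backed-up offset, and the remotes hold everything dbremote has already sent. The function names are hypothetical.

```python
# Toy model (an assumption, not SQL Anywhere internals): log operations are
# numbered 0, 1, 2, ...; all ranges below are half-open [start, end).


def send_limit(log_end: int, backed_up_upto: int, only_backed_up: bool) -> int:
    """Offset up to which dbremote sends: the whole log, or only the backed-up part (-u)."""
    return min(log_end, backed_up_upto) if only_backed_up else log_end


def replication_hole(sent_upto: int, backed_up_upto: int) -> range:
    """Operations the remotes already received but the failover copy cannot recover."""
    return range(backed_up_upto, max(sent_upto, backed_up_upto))


# Stephen's case: 100 operations logged, 80 backed up, then both logs are lost.
log_end, backed_up = 100, 80
without_u = replication_hole(send_limit(log_end, backed_up, False), backed_up)
with_u = replication_hole(send_limit(log_end, backed_up, True), backed_up)
print(f"without -u: {len(without_u)} operations exist only at the remotes")
print(f"with -u: {len(with_u)} operations exist only at the remotes")
```

With -u the send limit never passes the backed-up offset, so the hole is always empty; the price is that remotes only see changes once they are backed up, which is why this pairs with moving the backups to the standby system.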

>
> 2. It has been suggested to me that I maintain an exact copy of the
> consolidated DB using the dbbackup utility, so that identical .db files exist
> in separate locations at every second. Is this even possible? If so, is it
> achievable in any way other than replication? How does "dbbackup -l" fit into
> this?
>


DBBackup -l gets you close because you have a copy of the transaction
log. There are a number of issues with this approach, including that not all
transactions may have been copied when the primary fails, and network
issues when attaching to the disk drive.


> 3. Any suggestions on how to change the replication directory (it's the FILE
> message type) automagically after the consolidated DB crashes? Should there be
> an event, triggered (no less automagically), that issues a SET REMOTE FILE
> OPTION PUBLIC.DIRECTORY statement?


I'd keep a separate machine for my file share. That way I can either
continue to use it or start a new machine with that name, so no
schema changes are required.

>
> Many, many thanks in advance
> Pavel
>
>


--

Stephen Rice
Technical Services Manager
iAnywhere (a Sybase Company)
email: srice@ianywhere.com
Pavel Karady

2006-03-21, 7:24 am

"Stephen Rice" <srice_nospam@ianywhere.com> wrote in message
news:441ee3a8$1@forums-2-dub...
> Hi Pavel


Hi Stephen, thanks for your excellent comments so far. My comments are
inline.

> You're crossing the line into consulting :)


As long as I'm not crossing the line into *insulting*, I think it's okay :))

> By combining clustering technologies with some thoughtful coding, you
> should be able to achieve automatic failover in most cases (not 100%:
> e.g. if an asteroid hits the data center). I would expect this failover
> to take on the order of a few minutes under normal circumstances.


With the approach I've created (however, some parts of it still need to be
polished...), the failover will be performed and won't take more than a
minute or two even in the asteroid case :) I'll describe the plan later in
the text.

> What kind of business requirement led you to this level of availability
> combined with SQL Remote (which has extreme latency built in and is hardly
> a real-time technology)?


It's the client's request: should the consolidated DB fail, no messages are
delivered and the kids start getting hungry. They've requested a failover
system for the consolidated DB, which I assume is a good thing.

> Consider the following case. I have just run dbremote (without the -u)
> and some of the messages generated have been processed by the remotes. I
> then experience media failure and lose both my transaction log and my mirror
> transaction log. I switch to my failover system. What happens?


Now I get it! Thanks. Here's the question: can dbremote -u work safely
alongside dbbackup -l on the same transaction log? That would ensure almost
immediate transaction processing by dbremote, since I assume dbbackup tries
to send each change *immediately*, marking the change as backed up (which
allows dbremote to process that change).

> DBBackup -l gets you close because you have a copy of the transaction
> log. There are a number of issues with this approach, including that not all
> transactions may have been copied when the primary fails, and network
> issues when attaching to the disk drive.


dbbackup -l is a very fine thing that I am currently researching... very,
very fine. One more comment of yours and we're on the plan.

> I'd keep a separate machine for my file share. That way I can either
> continue to use it or start a new machine with that name, so no
> schema changes are required.


I'd rather keep the replication directory change: if the separate machine
went down, that would be a disaster even with the consolidated DB running.

Here's my plan:

PREREQUISITES:
1. We will call the server where the consolidated DB is running (c).
2. We will call the server where the backup is kept (the disaster recovery
server) (b).
3. On (b), a backup of the consolidated DB from (c) is present, with
dbbackup -l running permanently (the actual dbbackup command has been issued
from the (b) machine).
4. On (b), a database server is already running with a single, small
database, without even a transaction log: the coordinator database.
5. The coordinator database has an event scheduled every 20 seconds to
check whether dbbackup -l is still running (somehow: using xp_cmdshell it
gathers a list of running processes into a file, then reads the file and
searches for "dbbackup"...).
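Point 5 leaves the process check itself open. Assuming xp_cmdshell has already dumped a process list into a text file with the executable name in the first column (as a default Windows `tasklist` listing does), the remaining check is a small parse. A sketch in Python; the function names and the sample listing are hypothetical:

```python
from pathlib import Path


def dbbackup_running(listing: str, image_prefix: str = "dbbackup") -> bool:
    """True if any line's first column (the executable name) starts with image_prefix.

    Matching only the first column avoids false hits from other columns
    that happen to contain the text.
    """
    prefix = image_prefix.lower()
    for line in listing.splitlines():
        fields = line.split()
        if fields and fields[0].lower().startswith(prefix):
            return True
    return False


def check_listing_file(path: str) -> bool:
    """Read a process dump written earlier (hypothetically via xp_cmdshell)."""
    text = Path(path).read_text(errors="replace")
    return dbbackup_running(text)


# Hypothetical sample of a saved listing.
sample = """Image Name                     PID Session Name
========================= ======== ================
dbsrv9.exe                    1234 Services
dbbackup.exe                  5678 Console
"""
print(dbbackup_running(sample))
```

The 20-second event would treat a False result as the trigger for the Hurricane steps; in practice you might want more than one consecutive miss before failing over, since dbbackup -l can also stop for reasons other than the loss of (c), such as a network problem between the machines.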

THE HURRICANE:
1. Suddenly, a hurricane comes across server (c) and smites it off the
landscape.
2a. The coordinator database notices that dbbackup -l is no longer running.
3a. It automatically applies the live-copy log to the backup of the database
and starts the DB under the same database server.
4a. It performs some setup actions on the freshly started DB (including
setting the new replication path, etc.).
5a. An e-mail is sent with a warning description, many !s, hurricane
pictures, etc.

2b. In the meantime, the remote databases slowly start to recognize that the
consolidated DB does not respond to them.
3b. They stop their agents, switch their replication paths to the new
location, and restart their agents.

Please keep in mind that this is a *very rough* plan. Point 5 in
Prerequisites and nearly every point in The Hurricane (except point 1)
need to be specified in more detail, with 3a being the most important (and
the haziest).

If you have any suggestions, please post the first two or three of them (out
of the 1000 you probably have).

Pavel
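The (b) side of the Hurricane plan is an ordered sequence in which each step depends on the one before it. A minimal sketch of a runner for it in Python, assuming each step is wrapped in a callable that returns True on success; the step stubs below are hypothetical placeholders, not SQL Anywhere calls:

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_failover(steps: List[Step], log: Callable[[str], None] = print) -> bool:
    """Run failover steps in order; stop at the first one that fails.

    Stopping early keeps later steps (e.g. redirecting replication) from
    running against a database that never came up.
    """
    for name, action in steps:
        log(f"running: {name}")
        try:
            ok = bool(action())
        except Exception as exc:  # a crashing step counts as a failed step
            log(f"error in {name}: {exc}")
            ok = False
        if not ok:
            log(f"FAILED at {name}; manual intervention required")
            return False
    log("failover complete")
    return True


done: List[str] = []


def stub(tag: str) -> Callable[[], bool]:
    """Hypothetical placeholder for a real step; records that it ran."""
    def action() -> bool:
        done.append(tag)
        return True
    return action


result = run_failover([
    ("3a: apply live log copy and start the DB", stub("3a")),
    ("4a: set the new replication path", stub("4a")),
    ("5a: send the alert e-mail", stub("5a")),
])
```

In a real setup the `log` callback is also a natural place to send the alert when a step fails, since a half-finished failover is exactly when someone needs to be paged.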


snelbert

2006-03-21, 7:24 am

A very interesting topic that I have pondered for years now; keep it
going!

Snelbert



Stephen Rice

2006-03-21, 8:24 pm

Hi Pavel

One more pass in-line, then we will probably need to find a different way
to edit what is getting to be a very large message.


Pavel Karady wrote:
> "Stephen Rice" <srice_nospam@ianywhere.com> wrote in message
> news:441ee3a8$1@forums-2-dub...
>
>
>
> Hi Stephen, thanks for your excellent comments so far. My comments are
> inline.
>
>
>
>
> As long as I'm not crossing the line into *insulting*, I think it's okay :))
>

Ahh, but you don't hear the Consulting Manager that sits in the office
next to me :)

>
>
>
> With the approach I've created (however, some parts of it still need to be
> polished...), the failover will be performed and won't take more than a
> minute or two even in the asteroid case :) I'll describe the plan later in
> the text.
>
>
>
>
> It's the client's request: should the consolidated DB fail, no messages are
> delivered and the kids start getting hungry. They've requested a failover
> system for the consolidated DB, which I assume is a good thing.
>

Failover is definitely a good idea; all I'm concerned about is the
timing expectation. You can implement something that is 100%
recoverable under all circumstances, but not with sub-second recovery.
>
>
>
> Now I get it! Thanks. Here's the question: can dbremote -u work safely
> alongside dbbackup -l on the same transaction log? That would ensure almost
> immediate transaction processing by dbremote, since I assume dbbackup tries
> to send each change *immediately*, marking the change as backed up (which
> allows dbremote to process that change).
>

Yes they will work fine together.
>
>
>
> dbbackup -l is a very fine thing that I am currently researching... very,
> very fine. One more comment of yours and we're on the plan.


Just remember that we don't guarantee that all writes have been copied.
>
> I'd rather keep the replication directory change: if the separate machine
> went down, that would be a disaster even with the consolidated DB running.


While changing the addresses will work, it's bound to be ugly to
implement automatically on the remotes.

If your message directories are on a clustered disk, then they would stay
up. I'm not up on the latest MS clustering, but back in the dark ages
we used our DEC VAXcluster to accomplish a similar thing. I'm assuming
MS can do what DEC used to do, given Cutler's hand in things.

It's not clear to me whether the backup is in the same data center. If
it is not, then clustering may not be such a good solution :(

>
> Here's my plan:
>
> PREREQUISITES:
> 1. We will call the server where the consolidated DB is running (c).
> 2. We will call the server where the backup is kept (the disaster recovery
> server) (b).
> 3. On (b), a backup of the consolidated DB from (c) is present, with
> dbbackup -l running permanently (the actual dbbackup command has been
> issued from the (b) machine).
> 4. On (b), a database server is already running with a single, small
> database, without even a transaction log: the coordinator database.
> 5. The coordinator database has an event scheduled every 20 seconds to
> check whether dbbackup -l is still running (somehow: using xp_cmdshell it
> gathers a list of running processes into a file, then reads the file and
> searches for "dbbackup"...).


Or use a proxy table connected to (c). If you can't read from the table,
assume it has crashed.
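The proxy-table suggestion reduces to a simple rule: try to read from the proxy table on each scheduled run, and declare (c) crashed only when the reads keep failing. A sketch of that decision logic in Python; `probe` stands in for the proxy-table SELECT and is hypothetical, and the threshold of 3 consecutive failures is an assumed value meant to ride out a single network blip:

```python
from typing import Callable


class HeartbeatMonitor:
    """Declare the primary crashed only after `threshold` consecutive failed probes."""

    def __init__(self, probe: Callable[[], bool], threshold: int = 3) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.probe = probe
        self.threshold = threshold
        self.consecutive_failures = 0

    def tick(self) -> bool:
        """Run one probe (one scheduled event); True means: start failover."""
        try:
            ok = bool(self.probe())
        except Exception:
            # "If you can't read from the table, assume it has crashed."
            ok = False
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1
        return self.consecutive_failures >= self.threshold


# Simulated probe results: one blip, then (c) really goes away.
answers = iter([True, False, False, True, False, False, False])
monitor = HeartbeatMonitor(lambda: next(answers), threshold=3)
verdicts = [monitor.tick() for _ in range(7)]
print(verdicts)
```

With a 20-second event interval and a threshold of 3, a crash would be declared roughly 40 to 60 seconds after (c) stops answering, plus however long each failed read takes to time out; a lower threshold reacts faster at the cost of more false alarms.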

>
> THE HURRICANE:
> 1. Suddenly, a hurricane comes across server (c) and smites it off the
> landscape.
> 2a. The coordinator database notices that dbbackup -l is no longer running.
> 3a. It automatically applies the live-copy log to the backup of the database
> and starts the DB under the same database server.
> 4a. It performs some setup actions on the freshly started DB (including
> setting the new replication path, etc.).
> 5a. An e-mail is sent with a warning description, many !s, hurricane
> pictures, etc.
>
> 2b. In the meantime, the remote databases slowly start to recognize that the
> consolidated DB does not respond to them.
> 3b. They stop their agents, switch their replication paths to the new
> location, and restart their agents.
>
> Please keep in mind that this is a *very rough* plan. Point 5 in
> Prerequisites and nearly every point in The Hurricane (except point 1)
> need to be specified in more detail, with 3a being the most important (and
> the haziest).
>
> If you have any suggestions, please post the first two or three of them (out
> of the 1000 you probably have).
>

Actually, I can't think of much. I think you understand the key issues:
1) Make sure that nothing gets confirmed as replicated unless a good
backup of the tlog exists (dbremote -u).
2) Make sure your recovery server has a copy of the DB and all of the
tlogs required for dbremote.
3) Make sure the recovery server has the most current tlog (dbbackup -l).
4) Make sure you can still access the dbremote message in/out boxes.
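Points 2 and 3 amount to checking that the logs on the recovery server form an unbroken chain, from the oldest change dbremote may still need up to the end of the live log copy. A sketch of that check in Python, assuming each log file on (b) can be described by the half-open range of offsets it covers; how those offsets are obtained, and the function name, are hypothetical here:

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open [start, end) range of log offsets


def coverage_gaps(logs: List[Span], need_from: int, need_to: int) -> List[Span]:
    """Return the parts of [need_from, need_to) not covered by any log file."""
    if need_from >= need_to:
        return []
    gaps: List[Span] = []
    cursor = need_from  # first offset not yet known to be covered
    for start, end in sorted(logs):
        if end <= cursor:
            continue  # this file ends before the part we still need
        if start > cursor:
            gaps.append((cursor, min(start, need_to)))
        cursor = max(cursor, end)
        if cursor >= need_to:
            break
    if cursor < need_to:
        gaps.append((cursor, need_to))
    return gaps


# Hypothetical offsets: renamed backup logs plus the live copy on (b).
logs_on_b = [(0, 100), (100, 250), (300, 400)]
print(coverage_gaps(logs_on_b, need_from=50, need_to=400))
```

Here `need_from` would be the oldest offset the remotes have not yet confirmed and `need_to` the end of the live log copy (both assumptions). An empty list means the chain is complete; any gap is a range of changes that the recovery server cannot reproduce.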



> Pavel
>
>




--
/steve

Stephen Rice
Technical Services Manager
iAnywhere (a Sybase Company)
email: srice@ianywhere.com
Pavel Karady

2006-04-06, 7:28 am

"Stephen Rice" <srice_nospam@ianywhere.com> wrote in message
news:442078ac$1@forums-2-dub...
> Hi Pavel
>
> One more pass in-line, then we will probably need to find a different way
> to edit what is getting to be a very large message.


Hi Stephen, absolutely agreed, so I've done some polishing.

> If your message directories are on a clustered disk, then they would stay
> up. I'm not up on the latest MS clustering, but back in the dark ages we
> used our DEC VAXcluster to accomplish a similar thing. I'm assuming MS
> can do what DEC used to do, given Cutler's hand in things.
> It's not clear to me whether the backup is in the same data center. If it
> is not, then clustering may not be such a good solution :(


Unfortunately, the backup is in a city far away from the city where the
consolidated DB is. Fortunately (but apparently not fortunately enough),
they're in the same country.

> Or use a proxy table connected to (c). If you can't read from the table,
> assume it has crashed.


This is a fine suggestion.

Anyway, the client has decided against hot failover; they say manual
intervention at disaster time is sufficient for them. So from my point of
view, this thread is finished. I've still learned fine new things from it,
and for that I am thankful :)

Besides, failover is addressed in Jasper, as you mention; once it becomes a
common box on store shelves, this will definitely be a solved case.

Thanks again for the excellent approach.

Pavel



Copyright 2008 droptable.com