Problems catching up?
| Ujwal S. Setlur 2006-01-16, 8:24 pm |
Hi all,
So I have this master/slave setup with replication
going. One particular table is huge, ~50 million rows.
It took about 24 hours for the slave to subscribe and
catch up. But it did: st_num_lag_events came to 0.
Then the master got rebooted for some reason. Slony1
was not in the startup scripts so it did not get
launched for a day or so. I started it manually and
expected the slave to catch up eventually. It hasn't.
It has been about 4 days. st_num_lag_events is about
64000 and growing.
Any ideas?
Thanks,
Ujwal
| |
| Andrew Sullivan 2006-01-17, 7:24 am |
On Mon, Jan 16, 2006 at 05:58:32PM -0800, Ujwal S. Setlur wrote:
> Then the master got rebooted for some reason. Slony1
> was not in the startup scripts so it did not get
> launched for a day or so. I started it manually and
> expected the slave to catch up eventually. It hasn't.
> It has been about 4 days. st_num_lag_events is about
> 64000 and growing.
Is it running? What do your logs say?
A
--
Andrew Sullivan | ajs-oaT0K0jot5/q2IAV+ODieA@public.gmane.org
It is above all style through which power defers to reason.
--J. Robert Oppenheimer
| |
| Ujwal S. Setlur 2006-01-17, 8:24 pm |
Hi,
Does anyone have an idea about what might be going
wrong and how I might be able to fix it short of
dropping the slave node and re-subscribing?
Thanks,
Ujwal
| |
| Vivek Khera 2006-01-17, 8:24 pm |
On Jan 16, 2006, at 8:58 PM, Ujwal S. Setlur wrote:
> launched for a day or so. I started it manually and
> expected the slave to catch up eventually. It hasn't.
> It has been about 4 days. st_num_lag_events is about
> 64000 and growing.
It might just be worth your while to punt and restart the replication
from scratch. I had to do that once when a replica died and wasn't
repaired over the weekend, causing an impossible-to-catch-up situation.
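Whether a lagging subscriber can ever catch up comes down to rates. The backlog only shrinks if the slave applies events faster than the master generates new ones. Below is a minimal Python sketch of that arithmetic. It assumes steady rates, and the rates used are hypothetical, not figures measured in this thread.

```python
def hours_to_catch_up(backlog_events, applied_per_hour, generated_per_hour):
    """Estimate hours to drain a replication backlog, assuming constant rates.

    Returns None when the slave applies events no faster than the master
    generates them: the backlog never shrinks, so it can never catch up.
    """
    net_drain = applied_per_hour - generated_per_hour
    if net_drain <= 0:
        return None
    return backlog_events / net_drain


# Hypothetical rates: a 64000-event backlog, slave 1000 events/hour ahead.
print(hours_to_catch_up(64000, applied_per_hour=3000, generated_per_hour=2000))
# Hypothetical rates where the master outpaces the slave.
print(hours_to_catch_up(64000, applied_per_hour=1500, generated_per_hour=2000))
```

A lag count that keeps growing, like the 64000-and-rising figure reported above, corresponds to the second case.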
| |
| Rod Taylor 2006-01-17, 8:24 pm |
If it is still accomplishing something (no errors and st_last_received
is advancing) then you can pump the -g value way up (say 10000) and that
should start to make headway even on a very large sl_log_1 by doing a
ton of work in a single pass (very long and large transactions).
Best of luck!
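A toy cost model shows why a large -g can help. Suppose every pass on the slave carries a fixed overhead on top of the per-event work. Grouping more events into each pass then spreads that overhead over more events. This sketch is an illustrative assumption, not a description of slon's internals, and the per-pass and per-event costs are made-up numbers.

```python
import math


def drain_seconds(backlog, group_size, per_pass_overhead_s, per_event_s):
    """Toy model: total time to apply `backlog` events in groups of `group_size`.

    Each pass (one transaction on the slave) pays a fixed overhead, and each
    event pays a fixed cost. Larger groups mean fewer passes, so less overhead.
    """
    passes = math.ceil(backlog / group_size)
    return passes * per_pass_overhead_s + backlog * per_event_s


# Hypothetical costs: 2 s fixed overhead per pass, 0.5 s of work per event.
for g in (1, 100, 10000):
    print(g, drain_seconds(64000, g, per_pass_overhead_s=2.0, per_event_s=0.5))
```

In this model the per-event work never changes; only the number of passes shrinks. The price is the very long, large transactions mentioned above.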
--
| |
| Ujwal S. Setlur 2006-01-17, 8:24 pm |
Thank you. Where do I change this "-g" value?
Ujwal
| |
| Rod Taylor 2006-01-17, 8:24 pm |
On Tue, 2006-01-17 at 14:41 -0800, Ujwal S. Setlur wrote:
> Thank you. Where do I change this "-g" value?
Check the command line arguments.
--
| |
| Christopher Browne 2006-01-17, 8:24 pm |
"Ujwal S. Setlur" <uvsetlur-/E1597aS9LQAvxtiuMwx3w@public.gmane.org> writes:
> Thank you. Where do I change this "-g" value?
You can change it on the command line.
We changed it in 1.1 to support big values (like 10000) specifically
for Rod's Peculiar Needs ;-).
--
(reverse (concatenate 'string "ofni.sailifa.ac" "@" "enworbbc"))
<http://dba2.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)
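Because -g is a slon command-line option, changing it means restarting slon with the extra flag. Below is a minimal Python sketch that assembles such a command line. It assumes the usual `slon [options] clustername conninfo` form, and the cluster name and conninfo string are hypothetical placeholders. The sketch only builds the argument list; it does not launch anything.

```python
import shlex


def slon_argv(cluster, conninfo, group_size=None):
    """Build a slon argument list, optionally raising the SYNC group size (-g)."""
    argv = ["slon"]
    if group_size is not None:
        argv += ["-g", str(group_size)]
    # Positional arguments come last: the cluster name, then the conninfo.
    argv += [cluster, conninfo]
    return argv


# Hypothetical cluster name and connection string.
argv = slon_argv("mycluster", "dbname=mydb host=slave1 user=slony", group_size=10000)
print(shlex.join(argv))
```

The same command line belongs in whatever startup script launches slon, so that it comes back with the larger group size after a reboot.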
| |
| Rod Taylor 2006-01-17, 8:24 pm |
On Tue, 2006-01-17 at 18:25 -0500, Christopher Browne wrote:
> "Ujwal S. Setlur" <uvsetlur-/E1597aS9LQAvxtiuMwx3w@public.gmane.org> writes:
>
> You can change it on the command line.
>
> We changed it in 1.1 to support big values (like 10000) specifically
> for Rod's Peculiar Needs ;-).
Not so much peculiar as the first to realize such a need existed.
One small step for a man, one giant leap for slon-kind?
--
| |
| Michael Crozier 2006-01-17, 8:24 pm |
> Not so much peculiar as the first to realize such a need existed.
>
> One small step for a man, one giant leap for slon-kind?
Not so sure... I still find I need to recompile slon to keep the sync size at a
constant 100 during the initial sync; otherwise it keeps dropping back to
zero and falling behind.
| |
| Rod Taylor 2006-01-18, 3:24 am |
On Tue, 2006-01-17 at 16:26 -0800, Michael Crozier wrote:
>
> Not so sure... I still find I need to recompile slon to keep the sync size at a
> constant 100 during the initial sync; otherwise it keeps dropping back to
> zero and falling behind.
Have you tried this with newer versions? I recall discussions taking
place about automatically extending the grouping until it reaches a
second transaction.
The main reason for increasing group size during the initial sync was to
get out of that first large copy transaction caused by Slony itself.
It is normally the size of the transaction being replicated (measured by
time, not necessarily by real work) that causes the problems, although there
are cases where having several sets can also make things difficult.
--
| |
| Michael Crozier 2006-01-18, 3:24 am |
> Have you tried this with newer versions? I recall discussions taking
> place about automatically extending the grouping until it reaches a
> second transaction.
>
> The main reason for increasing group size during the initial sync was to
> get out of that first large copy transaction caused by Slony itself.
>
>
> It is normally the size of the transaction being replicated (measured by
> time, not necessarily by real work) that causes the problems, although there
> are cases where having several sets can also make things difficult.
As of RC2 I still had problems and resorted to rebuilding slon with
"group_size = 100" explicitly set somewhere in the loop. I must say,
however, that I did not spend as much time with the options as I had
previously.
I'm initiating another initial sync tomorrow with a large database and
I pledge to spend a little more time on this issue.
regards,
michael
| |
| Christopher Browne 2006-01-18, 11:24 am |
Michael Crozier <crozierm-19LDBNnCZmbFzinHIz5S+QC/G2K4zDHf@public.gmane.org> writes:
>
> As of RC2 I still had problems and resorted to rebuilding slon with
> "group_size = 100" explicitly set somewhere in the loop. I must say,
> however, that I did not spend as much time with the options as I had
> previously.
>
> I'm initiating another initial sync tomorrow with a large database and
> I pledge to spend a little more time on this issue.
The "problem" you're running into is that the 'adaptive grouping'
starts with 1 SYNC at a time and progresses upwards.
You can shut off adaptive grouping by adding "-o 0" to your slon
options. That's probably easier than hacking on the code :-).
--
(format nil "~S@~S" "cbbrowne" "ca.afilias.info")
<http://dba2.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)
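Adaptive grouping, as described above, starts at one SYNC per group and works upwards. That would explain Michael's symptom: whenever a pass runs long, the group shrinks again. The toy model below illustrates the behaviour, but several parts of it are simplifications for illustration, not slon's actual algorithm:

- the doubling and halving rules;
- the timings;
- `desired_ms`, which stands in for whatever target -o controls;
- the assumption that `-o 0` means "always use the full -g group size".

```python
def group_sizes(passes, max_group, desired_ms, overhead_ms, per_sync_ms):
    """Toy model of adaptive SYNC grouping; NOT slon's real algorithm.

    desired_ms == 0 models adaptive grouping being switched off (as with
    slon's -o 0): every pass is assumed to use the full max_group size.
    """
    if desired_ms == 0:
        return [max_group] * passes
    sizes = []
    size = 1  # adaptive grouping starts with a single SYNC per group
    for _ in range(passes):
        sizes.append(size)
        elapsed_ms = overhead_ms + size * per_sync_ms
        if elapsed_ms < desired_ms:
            size = min(max_group, size * 2)  # pass was fast: grow the group
        else:
            size = max(1, size // 2)  # pass was slow: shrink the group
    return sizes


# Fast SYNCs: the group ramps up to the cap.
print(group_sizes(8, max_group=100, desired_ms=60000, overhead_ms=10, per_sync_ms=1))
# Slow SYNCs (e.g. behind a huge initial copy): the group keeps falling back.
print(group_sizes(4, max_group=100, desired_ms=1000, overhead_ms=10, per_sync_ms=600))
# Adaptive grouping switched off.
print(group_sizes(3, max_group=100, desired_ms=0, overhead_ms=10, per_sync_ms=600))
```

The second call never gets past a group of two, which is the "dropping back" pattern. Turning adaptation off, as in the third call, avoids it at the cost of always running the largest transactions.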