[torqueusers] Re: mom segfault in new diag code
csamuel at vpac.org
Sun Oct 31 16:04:41 MST 2004
On Mon, 1 Nov 2004 09:47 am, Garrick Staples wrote:
> And you can reliably not break jobs? If I went through the entire cluster
> and restarted every mom, I know I'll lose half the jobs.
I forgot to say that when we've had to do that, because we lost a mom
somewhere and needed to restart everything to get it back in sync (before
the fix in p3), it hasn't been a problem for us.
Of course, it may be that we are running jobs that are not so susceptible to
this. It's hard to know because our userbase is so diverse.
Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia