[torqueusers] Understanding & dealing with torque error codes
csamuel at vpac.org
Sun Oct 31 15:50:02 MST 2004
On Sat, 30 Oct 2004 02:07 am, D.J.Baker at soton.ac.uk wrote:
> Finally would an upgrade help to eliminate errors 15001 and 15004? What is
> the theory and the experience of the community, please.
We used to have awful problems with parallel jobs whenever we restarted a mom on
a node: we'd have to restart the pbs_server, run pbsnodes, and then wait some
time for everything to resynchronise before it would be happy again.
Since p3 this problem has gone away for us (<python>and there was much
rejoicing</python>), and the p4 snapshots have made a lot of progress in
tracking down the bugs that caused moms to occasionally die (which is why we
were restarting them).
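For anyone still on an older release and stuck with the "restart, run pbsnodes,
wait" routine, the waiting part could be scripted roughly as below. This is only
a sketch, not something we run in production. It assumes `pbsnodes -a` prints
each node name unindented, followed by indented `key = value` lines with a
comma-separated `state` value; the function names, timeout and polling interval
are made up for illustration.

```python
import subprocess
import time


def down_nodes(pbsnodes_output):
    """Return the names of nodes whose state list contains 'down'.

    Assumes the layout described above: an unindented node name,
    then indented 'key = value' attribute lines.
    """
    down = []
    current = None
    for line in pbsnodes_output.splitlines():
        if not line.strip():
            continue  # blank line between node records
        if not line[0].isspace():
            current = line.strip()  # start of a new node record
        elif current is not None:
            key, sep, value = line.strip().partition("=")
            if sep and key.strip() == "state":
                states = [s.strip() for s in value.split(",")]
                if "down" in states:
                    down.append(current)
    return down


def query_pbsnodes():
    """Run 'pbsnodes -a'; return its output, or None if the command failed."""
    result = subprocess.run(["pbsnodes", "-a"], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


def wait_for_resync(query, timeout=600, interval=15, sleep=time.sleep):
    """Poll until no node is reported down, or give up after 'timeout' seconds.

    'query' is any callable returning pbsnodes-style text (or None when the
    server is not answering yet). Returns True once every node is up.
    """
    waited = 0
    while True:
        output = query()
        if output is not None and not down_nodes(output):
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

After restarting the pbs_server you would call wait_for_resync(query_pbsnodes)
and only release jobs once it returns True. Passing the query in as a callable
keeps the parsing testable on a machine without TORQUE installed.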
Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia