[torqueusers] Understanding & dealing with torque error codes

Chris Samuel csamuel at vpac.org
Sun Oct 31 15:50:02 MST 2004


On Sat, 30 Oct 2004 02:07 am, D.J.Baker at soton.ac.uk wrote:

> Finally would an upgrade help to eliminate errors 15001 and 15004? What is
> the theory and the experience of the community, please.

We used to have awful problems with parallel jobs after restarting a mom on 
a node: we'd have to restart pbs_server, run pbsnodes, and then wait some 
time for everything to resynchronise before it would be happy again.
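
As a rough illustration of the "wait for it to resynchronise" step, a small
script can scan node-status text and report which nodes still look unhappy.
This is a sketch only: it assumes input in the layout that `pbsnodes -a`
prints (node name at the left margin, indented "key = value" lines below it),
and the function name, the set of "bad" states, and the sample data are all
made up for the example.

```python
# Sketch only: parse text shaped like `pbsnodes -a` output and list the
# nodes that are not yet usable. SAMPLE is invented data, not a real log.

def nodes_not_ready(pbsnodes_text, bad_states=("down", "offline", "state-unknown")):
    """Return {node_name: state_string} for nodes reporting a bad state."""
    not_ready = {}
    current = None  # name of the node whose attributes we are reading
    for line in pbsnodes_text.splitlines():
        if not line.strip():
            current = None  # a blank line ends a node record
            continue
        if not line[0].isspace():
            current = line.strip()  # unindented line: start of a node record
            continue
        key, sep, value = line.strip().partition(" = ")
        if current and sep and key == "state":
            # A node can report several comma-separated states at once.
            states = [s.strip() for s in value.split(",")]
            if any(s in bad_states for s in states):
                not_ready[current] = value.strip()
    return not_ready


SAMPLE = """\
node01
     state = free
     np = 2

node02
     state = down,state-unknown
     np = 2

node03
     state = job-exclusive
     np = 2
"""

if __name__ == "__main__":
    print(nodes_not_ready(SAMPLE))
```

On a live system the text would come from running `pbsnodes -a` on the server
host, and the check would be repeated until the returned dictionary is empty.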

Since p3 this problem has gone away for us (<python>and there was much 
rejoicing</python>), and the p4 snapshots have made a lot of progress in 
tracking down the bugs that caused moms to die occasionally (which is why we 
were restarting them in the first place).

cheers!
Chris
-- 
 Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
