[torqueusers] send_sister sister not ok

Stijn De Weirdt stijn.deweirdt at ugent.be
Wed Dec 19 06:47:05 MST 2012


hi all,

we are seeing the following errors when launching a 512-node job.

does anyone have an idea how to debug this? is this a timeout issue? the
nodes seem fine and have been running jobs before.

stijn

12/19/2012 13:55:16;0001; pbs_mom.3471;Job;194.master-moab;send_sisters:  sister #178 (node1181.muk.os) is not ok (15001)
12/19/2012 13:55:16;0001; pbs_mom.3471;Job;194.master-moab;send_sisters:  sister #222 (node1225.muk.os) is not ok (15001)
12/19/2012 13:55:16;0001; pbs_mom.3471;Job;194.master-moab;send_sisters:  sister #287 (node1291.muk.os) is not ok (15001)
12/19/2012 13:55:16;0001; pbs_mom.3471;Job;194.master-moab;send_sisters:  sister #455 (node1468.muk.os) is not ok (15001)
12/19/2012 13:55:16;0001; pbs_mom.3471;Job;194.master-moab;send_sisters:  sister #496 (node1509.muk.os) is not ok (15001)
12/19/2012 13:55:19;0001; pbs_mom.3471;Svr;pbs_mom;LOG_ERROR::im_request, Response recieved from client 10.141.129.181:471 (15003) jobid 194.master-moab
12/19/2012 13:55:19;0001; pbs_mom.3471;Svr;pbs_mom;LOG_ERROR::im_request, Response recieved from client 10.141.129.225:575 (15003) jobid 194.master-moab
12/19/2012 13:55:25;0001; pbs_mom.3471;Svr;pbs_mom;LOG_ERROR::im_request, Response recieved from client 10.141.130.35:603 (15003) jobid 194.master-moab
12/19/2012 13:55:25;0001; pbs_mom.3471;Svr;pbs_mom;LOG_ERROR::im_request, Response recieved from client 10.141.130.212:743 (15003) jobid 194.master-moab
12/19/2012 13:55:37;0001; pbs_mom.3471;Svr;pbs_mom;LOG_ERROR::im_request, Response recieved from client 10.141.130.253:414 (15003) jobid 194.master-moab
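
As a first debugging step, the "not ok" entries above could be tallied per node and per code to check whether the same sisters fail on every launch. The following is only a minimal sketch: the helper name failed_sisters and the regular expression are hypothetical (not part of TORQUE), and the pattern is inferred solely from the log lines quoted in this message. It tolerates entries that the mail client wrapped across lines.

```python
import re
from collections import Counter

# Hypothetical helper, not part of TORQUE. The pattern is inferred from the
# pbs_mom lines quoted in this message; \s+ lets it match wrapped entries too.
SISTER_RE = re.compile(
    r"send_sisters:\s+sister #(\d+)\s+\(([^)]+)\)\s+is\s+not\s+ok\s+\((\d+)\)"
)

def failed_sisters(log_text):
    """Return (sister_index, hostname, code) for every 'not ok' sister entry."""
    return [(int(idx), host, int(code))
            for idx, host, code in SISTER_RE.findall(log_text)]

# Two entries copied from the log: one wrapped over three lines, one unwrapped.
sample = (
    "12/19/2012 13:55:16;0001;\n"
    "pbs_mom.3471;Job;194.master-moab;send_sisters:  sister #178\n"
    "(node1181.muk.os) is not ok (15001)\n"
    "12/19/2012 13:55:16;0001; pbs_mom.3471;Job;194.master-moab;"
    "send_sisters:  sister #222 (node1225.muk.os) is not ok (15001)\n"
)

failures = failed_sisters(sample)
codes = Counter(code for _, _, code in failures)

for idx, host, code in failures:
    print(f"sister #{idx} {host} code={code}")
print(dict(codes))
```

Running the same function over the mom log of several failed launches would show whether the failing set of nodes is stable or changes from run to run.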

