[torqueusers] Job stuck with exec_host set but queued

Chris Samuel csamuel at vpac.org
Tue Sep 13 20:24:48 MDT 2005

Hi folks,

I've got a job that's queued and somehow managed to get itself into the state 
where exec_host is set to a list of nodes even though it's still waiting to 
run.  It appears that it has attempted to start and failed and now has this 
vestige left and I can't figure out how to remove it!
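
For anyone wanting to check whether other jobs are in the same state, here is a minimal sketch that scans the text of `qstat -f` for jobs whose job_state is Q but which still carry an exec_host.  The function name, the job IDs and the node names are hypothetical, and it assumes the usual `qstat -f` layout ("Job Id:" header lines, indented "name = value" attributes, long values wrapped onto tab-indented continuation lines).  It only reports such jobs; it does not try to clear the attribute.

```python
import re


def queued_jobs_with_exec_host(qstat_full_output):
    """Return {job_id: exec_host} for jobs in state Q that still have exec_host set.

    Assumes text in the usual `qstat -f` layout; this is a sketch, not a
    complete parser for every attribute TORQUE can print.
    """
    jobs = {}
    current = None   # job ID of the record being read
    last_key = None  # last attribute seen, for wrapped continuation lines
    for line in qstat_full_output.splitlines():
        if line.startswith("Job Id:"):
            current = line.split(":", 1)[1].strip()
            jobs[current] = {}
            last_key = None
        elif current is None or not line.strip():
            continue
        elif line.startswith("\t") and last_key:
            # Tab-indented line: continuation of the previous attribute's value.
            jobs[current][last_key] += line.strip()
        else:
            match = re.match(r"\s+(\S+) = (.*)$", line)
            if match:
                last_key = match.group(1)
                jobs[current][last_key] = match.group(2).strip()
    return {
        job_id: attrs["exec_host"]
        for job_id, attrs in jobs.items()
        if attrs.get("job_state") == "Q" and "exec_host" in attrs
    }
```

Passing it the captured output of `qstat -f` should give a dictionary mapping each affected job ID to its leftover exec_host string.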

Checkjob says:

RM failure, rc: 15057, msg: 'Cannot execute at specified host because of 
checkpoint or stagein files MSG=cannot assign hosts'

I'm not quite sure what that's trying to tell me.  I suspect the last part of the 
message is more accurate than the first, as neither of the two conditions it 
mentions (checkpoint or stagein files) applies to this job.

Any clues ?

 Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
