[torqueusers] Job stuck with exec_host set but queued
csamuel at vpac.org
Tue Sep 13 20:24:48 MDT 2005
I've got a job that's queued and somehow managed to get itself into the state
where exec_host is set to a list of nodes even though it's still waiting to
run. It appears that it has attempted to start and failed and now has this
vestige left and I can't figure out how to remove it!
RM failure, rc: 15057, msg: 'Cannot execute at specified host because of
checkpoint or stagein files MSG=cannot assign hosts'
Not sure quite what that's trying to tell me. I suspect the last part of the
message is more accurate than the former, as neither of the two statements is
true.
Any clues?
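For anyone wanting to check whether other jobs are in the same state, here is a rough sketch (not a TORQUE tool, just an illustration) that scans text in the style of `qstat -f` output for jobs whose job_state is Q but which still carry an exec_host attribute. The function names, the sample job IDs and the node names are made up for the example, and the parser assumes the usual "Job Id:" header followed by indented "name = value" lines, with wrapped values continued on tab-indented lines.

```python
def parse_qstat_full(text):
    """Parse `qstat -f`-style text into {job_id: {attribute: value}}.

    Assumes each job starts with a "Job Id:" line followed by
    "name = value" attribute lines; tab-indented lines are treated
    as continuations of the previous (wrapped) value.
    """
    jobs = {}
    attrs = None      # attribute dict of the job currently being read
    last_key = None   # most recent attribute name, for wrapped values
    for line in text.splitlines():
        if line.startswith("Job Id:"):
            job_id = line.split(":", 1)[1].strip()
            attrs = jobs.setdefault(job_id, {})
            last_key = None
        elif attrs is None or not line.strip():
            # Skip anything before the first job and blank separator lines.
            continue
        elif line.startswith("\t") and last_key is not None:
            # Continuation of a wrapped attribute value.
            attrs[last_key] += line.strip()
        elif " = " in line:
            key, value = line.strip().split(" = ", 1)
            attrs[key] = value
            last_key = key
    return jobs


def queued_with_exec_host(jobs):
    """Return {job_id: exec_host} for queued jobs that have exec_host set."""
    return {
        job_id: attrs["exec_host"]
        for job_id, attrs in jobs.items()
        if attrs.get("job_state") == "Q" and attrs.get("exec_host")
    }


# Hypothetical sample data, invented for illustration only.
SAMPLE = """\
Job Id: 101.example
    Job_Name = stuck
    job_state = Q
    exec_host = node01/0+node02/0

Job Id: 102.example
    Job_Name = waiting
    job_state = Q

Job Id: 103.example
    Job_Name = running
    job_state = R
    exec_host = node03/0
"""

if __name__ == "__main__":
    print(queued_with_exec_host(parse_qstat_full(SAMPLE)))
```

In practice the text would come from saving the output of `qstat -f` to a file and reading it in. The sketch only reports the affected jobs; it does not attempt to clear the attribute.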
Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia