[torqueusers] From deferred to idle and back

Charles Johnson charles.johnson at accre.vanderbilt.edu
Fri Jul 10 12:38:33 MDT 2009


We have several multi-processor jobs that will not start. They cycle
between states: showq -b shows them as deferred, later showq -i shows
them as idle, and then they are deferred again. checkjob -v shows
messages similar to these:

Message[0] 9 nodes unavailable to start reserved job after 63 seconds  
(reserved node vmp089 is in state 'Running' - check node)
Message[1] 9 nodes unavailable to start reserved job after 63 seconds  
(reserved node vmp090 is in state 'Running' - check node)
Message[2] 9 nodes unavailable to start reserved job after 63 seconds  
(reserved node vmp066 is in state 'Running' - check node)
Message[3] 10 nodes unavailable to start reserved job after 63 seconds  
(reserved node vmp069 is in state 'Running' - check node)
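
A rough sketch, in Python, of how the node names could be pulled out of
messages like these so each one can be checked in turn. The regular
expression is an assumption based only on the four messages quoted
above, and the names SAMPLE, PATTERN, and reserved_nodes are
hypothetical, not part of Moab or TORQUE.

```python
import re

# Sample text copied from the checkjob -v messages quoted above,
# including the line wrap in the middle of each message.
SAMPLE = """\
Message[0] 9 nodes unavailable to start reserved job after 63 seconds
(reserved node vmp089 is in state 'Running' - check node)
Message[1] 9 nodes unavailable to start reserved job after 63 seconds
(reserved node vmp090 is in state 'Running' - check node)
Message[2] 9 nodes unavailable to start reserved job after 63 seconds
(reserved node vmp066 is in state 'Running' - check node)
Message[3] 10 nodes unavailable to start reserved job after 63 seconds
(reserved node vmp069 is in state 'Running' - check node)
"""

# Assumed message shape, inferred from the sample only. DOTALL lets the
# non-greedy ".*?" span the line wrap between the two halves of a message.
PATTERN = re.compile(
    r"Message\[(\d+)\]\s+(\d+) nodes unavailable.*?"
    r"reserved node (\S+) is in state '([^']+)'",
    re.DOTALL,
)


def reserved_nodes(text):
    """Return (node, state) pairs in the order the messages appear."""
    return [(m.group(3), m.group(4)) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    for node, state in reserved_nodes(SAMPLE):
        print(node, state)
```

Each name returned this way could then be inspected by hand, for
example with checknode, which is what the "check node" hint in the
messages points to.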

I haven't found anything revealing in the log files, but I am not
exactly sure what to look for. The nodes named in the messages have
jobs running on them, but they still have free processors.

We use TORQUE 2.3.6 and Moab 5.3.2 (revision 12709).

I would appreciate any suggestions.

Cheers--

Charles
---
Charles Johnson
Advanced Computing Center for Research and Education
Vanderbilt University
Office: 615-343-2776
Cell: 615-478-8799





