[torqueusers] Newly online nodes / queued jobs
Garrick Staples
garrick at clusterresources.com
Thu Jan 4 14:39:11 MST 2007
On Thu, Jan 04, 2007 at 11:42:32AM -0600, Chris Evert alleged:
> Torque Users,
>
> I fixed a node and put it online. There were some 200 jobs in the Q
> state, but none are going onto the node.
>
> The situation didn't change after 5 minutes, which I believe is the
> sleep time for my queue server and scheduler.
>
> Thinking that jostling the job mix would relieve the logjam, I submitted
> new jobs and they went right onto the newly available node. When those
> jobs finished, nothing old went on the node. One of the running jobs
> finished and one of the queued jobs took its place, but the now online
> node remains idle.
>
> qrun successfully started a couple of jobs on that node.
>
> I am using torque-2.1.6 and maui-3.2.6p14
>
> Why aren't jobs that are already chomping at the bit to run jumping onto
> newly onlined nodes? More importantly, how can I avoid this behavior
> (aside from not having jobs in the Q state :-)?
Does maui have those jobs held? What does 'checkjob' say about those
jobs?
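
To check all of the queued jobs in one go, something along these lines might work. It is only a rough sketch: it assumes the default qstat column layout (job state in the 5th column) and that checkjob accepts the numeric part of the TORQUE job id. The helper name queued_job_ids is made up for illustration.

```shell
# Print the ids of jobs in state Q from `qstat` output read on stdin.
# Assumption: default qstat layout, where column 5 is the one-letter state.
queued_job_ids() {
    awk '$5 == "Q" { print $1 }'
}

# Ask Maui about each queued job, but only if both client commands exist.
if command -v qstat >/dev/null 2>&1 && command -v checkjob >/dev/null 2>&1; then
    qstat | queued_job_ids | while read -r id; do
        # Assumption: strip the ".server" suffix before handing the id to checkjob.
        checkjob "${id%%.*}"
    done
fi
```

The checkjob output for a job that stays queued should say whether Maui is holding or deferring it and why.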