[torqueusers] Re: Kernel upgrade breaks torque's idea of how many cpus a node has

John Hanks griznog at gmail.com
Mon Sep 1 07:33:36 MDT 2008


On Mon, Sep 1, 2008 at 8:42 AM, John Hanks <griznog at gmail.com> wrote:

> Any suggestions as to why the newer kernel causes this and how to fix
> it would be greatly appreciated.

It turns out not to be torque, or at least I don't think it is. This message
showed up in repeated runs of checkjob for jobs waiting to start:

NOTE:  job cannot run in partition Moab (idle procs do not meet requirements : 0 of 8 procs found)

It turns out that something about the new kernel causes a process started
from cron to hang, so every node running the new kernel carries a load of 1
because one cpu is busy servicing the hung process. I have enough cpus, but
not enough idle cpus. Killing the hung process solves the issue.
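One way to find a process like this on a node is to list everything Linux is counting toward the load average. The following is a minimal Python sketch, not something from the original thread: it assumes a Linux /proc filesystem and reports processes in state R (runnable) or D (uninterruptible sleep), the two states the Linux load average counts. The function names parse_stat and load_contributors are hypothetical. The ppid column may help trace a stuck process back to crond.

```python
import os

# States that the Linux load average counts:
# R = running/runnable, D = uninterruptible sleep (often stuck I/O).
LOAD_STATES = {"R", "D"}


def parse_stat(line):
    """Return (pid, comm, state, ppid) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the LAST ')' rather than on whitespace.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    rest = line[close_paren + 2:].split()
    state = rest[0]
    ppid = int(rest[1])
    return pid, comm, state, ppid


def load_contributors(proc_root="/proc"):
    """List (pid, comm, state, ppid) for every process in R or D state."""
    found = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                entry = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited (or is unreadable) while scanning
        if entry[2] in LOAD_STATES:
            found.append(entry)
    return found


if __name__ == "__main__":
    me = os.getpid()
    for pid, comm, state, ppid in load_contributors():
        if pid != me:  # the scanner itself is runnable, so skip it
            print(f"{pid:>7} {state} ppid={ppid:<7} {comm}")
```

On an otherwise idle node, anything this prints that persists across several runs is a candidate for the process holding the load at 1.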

jbh

