[torqueusers] Re: Kernel upgrade breaks torque's idea of how many cpus a node has
griznog at gmail.com
Mon Sep 1 07:33:36 MDT 2008
On Mon, Sep 1, 2008 at 8:42 AM, John Hanks <griznog at gmail.com> wrote:
> Any suggestions as to why the newer kernel causes this and how to fix
> it would be greatly appreciated.
It turns out not to be torque, or at least I don't think it is. This showed
up in repeated runs of checkjob for jobs waiting to start:
    NOTE: job cannot run in partition Moab (idle procs do not meet requirements : 0 of 8 procs found)
It turns out that something about the new kernel causes a process started
from cron to hang, so every node running the new kernel shows a load of 1
because one cpu is tied up servicing the hung process. The nodes have enough
cpus, but not enough idle cpus. Killing this process solves the issue.
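To track down a process like this, it can help to list what is feeding the load average. On Linux the load average counts tasks that are runnable (state R) or in uninterruptible sleep (state D), so a hung process often shows up in one of those states. A minimal sketch, assuming a Linux /proc layout; the function name load_contributors is hypothetical, and it only scans per-process entries, not the per-thread entries under /proc/<pid>/task:

```python
import os


def load_contributors(proc_root="/proc"):
    """Hypothetical helper: return (pid, state, comm) for every process
    whose state is R (running/runnable) or D (uninterruptible sleep),
    the two states Linux counts toward the load average.
    Only /proc/<pid>/stat is read; threads are not examined."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                data = f.read()
        except OSError:
            continue  # process exited (or entry has no stat file)
        # The command name sits in parentheses and may itself contain
        # spaces or parentheses, so split on the first '(' and last ')'.
        lpar = data.index("(")
        rpar = data.rindex(")")
        comm = data[lpar + 1:rpar]
        state = data[rpar + 2]  # the state field follows ") "
        if state in ("R", "D"):
            found.append((int(entry), state, comm))
    return found


if __name__ == "__main__":
    for pid, state, comm in sorted(load_contributors()):
        print(pid, state, comm)
```

Run on an idle node, anything other than the script itself showing up in this list is a candidate for the process holding the load at 1.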