[torqueusers] pbsnodes bug?
garrick at usc.edu
Wed Nov 17 20:41:51 MST 2004
On Wed, Nov 17, 2004 at 02:16:14PM -0700, Jerry D. Smith II alleged:
> Good afternoon all,
> I am currently using torque-1.1.0p4 and Maui on an EM64T cluster with
> 512 nodes (dual proc).
> We noticed what we thought to be an odd behavior today with using
> multiple pbsnodes commands.
> We have launched a job, and are allocated a node "sn188"
> On first check with a pbsnodes -a the node is reported as
> While still leaving the job running on the node, we marked the node as
> "offline" with pbsnodes -o, which is the status that we then saw with a new
> pbsnodes -a. We then used pbsnodes -c to clear the node's offline
> status on the server. But when looking at it with pbsnodes -a, the
> resultant state was free, even though the job was still running on the
> node, and was still allocated when one looked via maui's checknode.
> Is this the expected behavior?
When you mark the node offline, that overwrites the node's state. When you
clear the offline status, the node goes back to its default "free" state.
This isn't any cause for alarm. The scheduler won't be fooled into thinking it
can allocate a job on that node.
Note that "free" and "busy" don't refer to CPU availability or "I'm free for
jobs"; they indicate whether the load average is over the max_load value in
the mom config. This is why nodes can be both "job-exclusive" and
"job-exclusive,busy".
Also note that I'm slightly oversimplifying the above. When the loadave goes
over max_load, mom will report itself as busy. When the loadave then
drops below ideal_load, mom will report itself as free.
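
As a rough illustration of that two-threshold behavior (this is only a sketch of
the rule as described above, not TORQUE's actual code; the function names and the
sample load values are made up for the example):

```python
def next_load_state(current, loadave, ideal_load, max_load):
    """Return "busy" or "free" for one load sample, using two thresholds."""
    if loadave > max_load:
        return "busy"   # load went over max_load
    if loadave < ideal_load:
        return "free"   # load dropped below ideal_load
    return current      # in between: keep reporting the previous state


def trace(samples, ideal_load, max_load, start="free"):
    """Apply next_load_state to a sequence of load averages."""
    state = start
    states = []
    for loadave in samples:
        state = next_load_state(state, loadave, ideal_load, max_load)
        states.append(state)
    return states


if __name__ == "__main__":
    # Hypothetical thresholds and load samples, chosen only for illustration.
    print(trace([1.0, 3.0, 4.5, 3.0, 2.5, 1.5, 3.0], ideal_load=2.0, max_load=4.0))
```

The point of the gap between the two values is that a load of 3.0 reads as
"free" on the way up but "busy" on the way down, so the state doesn't flap
when the load hovers around a single cutoff.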
Garrick Staples, Linux/HPCC Administrator
University of Southern California