[torqueusers] Unknown Job Id Behavior

Joshua Bernstein jbernstein at penguincomputing.com
Thu Jun 12 15:03:34 MDT 2008

Glen Beane wrote:

> I think I can probably try that out tomorrow, but I would
> really appreciate it if you could give this a test first.

Alright, I just grabbed the SVN tree from about an hour ago and
gave this a go. At first it seems to do the right thing: when a node
reboots, I see the following after it comes back up:

06/12/2008 13:01:23;0004;PBS_Server;Svr;WARNING;ALERT: unable to contact node n0
from batch, state EXITING
sent command term

The job then disappears from qstat on the server, but pbsnodes n0 still
shows the job as running on that node. Then the node suddenly gets
marked as down, and pbs_mom reports:

06/12/2008 13:08:58;0002;   pbs_mom;Svr;im_eof;Premature end of message from addr
06/12/2008 13:09:14;0002;   pbs_mom;Svr;im_eof;Premature end of message from addr

Just let me know how I can help!

-Joshua Bernstein
Software Engineer
Penguin Computing
