[torqueusers] pbs_mom endless kill loop

Kevin Murphy murphy at genome.chop.edu
Wed Oct 22 09:17:28 MDT 2008

On Oct 21, 2008, at 2:34 PM, George Wm Turner wrote:
> I'll add a "me too!"
> I've seen it with versions up to torque 2.3.4; later versions  
> (2.3.3, 2.3.4) are better, i.e. not as likely to tip over into this  
> mode. 2.3.2 was very bad about getting into this state.
> I suspect with each iteration of the loop it opens another socket  
> back to the pbs_server; I quickly run out of privileged ports and  
> then NFS goes offline.

OK, just for the record, we've been having what I am now pretty sure  
is a NIC driver problem, which sometimes causes kernel panics on the  
head node.  After the head node is restarted (and not before), the  
moms invariably exhibit this problem.
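
For anyone who wants to check George's suspicion about privileged ports on an affected node, here is a rough sketch, not something taken from TORQUE itself. It assumes a Linux compute node, where /proc/net/tcp lists the local and remote address of each TCP socket in hex. The function name count_privileged_ports is made up for this example. It counts sockets whose local port is below 1024 and groups them by remote port, so a pile-up toward whatever port your pbs_server listens on should stand out.

```python
from collections import Counter

# On Linux, ports below 1024 can normally only be bound by root.
PRIVILEGED_MAX = 1023


def count_privileged_ports(proc_net_tcp_text):
    """Count sockets using a privileged local port, grouped by remote port.

    proc_net_tcp_text is the raw text of /proc/net/tcp (hypothetical
    helper for illustration; not part of TORQUE).
    """
    counts = Counter()
    # The first line of /proc/net/tcp is a column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        # Addresses look like "0100007F:03FF" (hex IP, colon, hex port).
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        if local_port <= PRIVILEGED_MAX:
            counts[remote_port] += 1
    return counts
```

To use it, read /proc/net/tcp on the compute node, pass the text to the function, and watch whether the count for the pbs_server port keeps climbing while the mom is looping. Listening sockets such as sshd show up under remote port 0 and can be ignored.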

