[torquedev] RMFailure: cannot send the job to pbs_mom

jayavant patil jayavant.patil82 at gmail.com
Thu Dec 9 22:40:20 MST 2010


   I have a cluster of 8 nodes: 2 are standalone machines, and the other 6
are virtual machines hosted on those 2 standalone machines. The problem is
that pbs_server is unable to send jobs to these 6 virtual machines.

   When I try to diagnose the pbs_moms from the pbs_server node, the command
momctl -h <virtual-machine_name> -d3 reports the error: query[0]
input/output error. The pbs_server log file shows: RMFailure: execution
server rejected request; can't send the job. The pbs_mom log file shows that
the connection to pbs_server timed out. However, pbsnodes -a correctly
reports the states of the pbs_mom nodes.
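
   Since the symptoms look like a connectivity problem between pbs_server
and the virtual machines, a plain TCP reachability check may help narrow it
down. The following is only a minimal sketch, not part of TORQUE: it assumes
the usual default ports (15001 for pbs_server, 15002 and 15003 for pbs_mom),
which a site may have changed, and the helper names port_open and
check_hosts are made up for illustration.

```python
import socket

# Assumed TORQUE default ports; a site may override these in its configuration.
PBS_SERVER_PORT = 15001  # pbs_server
PBS_MOM_PORT = 15002     # pbs_mom service port
PBS_MOM_RM_PORT = 15003  # pbs_mom resource-manager port


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_hosts(hosts, ports, timeout=3.0):
    """Map each (host, port) pair to whether the port accepted a connection."""
    return {(h, p): port_open(h, p, timeout) for h in hosts for p in ports}
```

   Run from the pbs_server node, check_hosts(["vm1", "vm2"], [PBS_MOM_PORT,
PBS_MOM_RM_PORT]) would show which pbs_mom ports are reachable (vm1 and vm2
are placeholder host names). Run from a virtual machine against the server
host with PBS_SERVER_PORT, it checks the opposite direction, which is the
one the pbs_mom log complains about.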

    How can I resolve this problem?

Jayavant N. Patil