[torqueusers] Execution server rejected request MSG=cannot send job to mom, state=TRNOUT

Roger Hollamby roger.hollamby at arup.com
Wed Oct 23 13:43:39 MDT 2013


We are running pbs_server and pbs_sched on a RHEL 6.4 server.

pbs_server --version shows:

Version: 4.5.0
Commit: 4774a2e0521932d11033b45c5d90574bdd6230bc


Some jobs run fine, but others sit in the queue with the message:

Not Running - PBS Error: Execution server rejected request MSG=cannot send job to mom, state=TRNOUT

Each of these jobs requests only 2 nodes with ppn=2, and some of the nodes that are already running jobs have more than 2 slots free.

The mom_log files contain many messages like this:

pbs_mom.10355;Svr;pbs_mom;LOG_ERROR::rm_request, unknown command 5

However, the same messages also appear on the nodes that are currently running jobs.
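To compare how often this error appears on the nodes that run jobs and on the ones that don't, the mom_log files can be tallied. What follows is a minimal sketch, assuming each log line contains the "LOG_ERROR::rm_request, unknown command N" text exactly as quoted above. The function name count_unknown_commands is hypothetical, and the log paths are whatever your installation uses; pass them on the command line.

```python
import re
import sys
from collections import Counter

# Matches the error text quoted in the post, e.g.
# "pbs_mom.10355;Svr;pbs_mom;LOG_ERROR::rm_request, unknown command 5"
PATTERN = re.compile(r"LOG_ERROR::rm_request, unknown command (\d+)")


def count_unknown_commands(lines):
    """Return a Counter mapping each unknown command number to its occurrence count."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python count_rm_errors.py <mom_log> [<mom_log> ...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            print(path, dict(count_unknown_commands(fh)))
```

If the counts are similar on healthy and failing nodes, that would suggest the rm_request message is unrelated to the TRNOUT rejection.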



Roger Hollamby
Associate | Advanced Technology + Research

Arup

