[torqueusers] Bad file descriptor (9) in req_jobscript, job in unexpected state

Davide Salomoni Davide.Salomoni at nikhef.nl
Wed Dec 1 02:22:36 MST 2004


After upgrading to Torque 1.1.0p4, some of the nodes in my farm generate
the following messages:

12/01/2004 10:09:27;0001;   pbs_mom;Svr;pbs_mom;Bad file descriptor (9) in req_jobscript, job in unexpected state
12/01/2004 10:09:27;0080;   pbs_mom;Req;req_reject;Reject reply code=15004(Invalid request), aux=0, type=3, from PBS_Server@tbn18.nikhef.nl
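To check the mom_logs on each node for this symptom, the records above can be split into fields. This is a minimal sketch, assuming the usual PBS log layout (timestamp; hexadecimal event code; daemon; object type; object name; message). The names `PbsLogRecord`, `parse_record` and `is_jobscript_error` are hypothetical helpers, not part of Torque.

```python
from typing import NamedTuple


class PbsLogRecord(NamedTuple):
    """One record from a pbs_mom/pbs_server log (assumed field layout)."""
    timestamp: str    # e.g. "12/01/2004 10:09:27"
    event_code: int   # event class, written in hex in the log, e.g. 0001
    daemon: str       # e.g. "pbs_mom"
    object_type: str  # e.g. "Svr", "Req"
    object_name: str  # e.g. "pbs_mom", "req_reject"
    message: str      # free-form text; may itself contain ';'


def parse_record(line: str) -> PbsLogRecord:
    # Split on the first five ';' only, so a ';' inside the message survives.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        raise ValueError(f"not a PBS log record: {line!r}")
    ts, code, daemon, otype, oname, msg = parts
    # The daemon field is padded with leading spaces in the log, so strip it.
    return PbsLogRecord(ts, int(code, 16), daemon.strip(), otype, oname, msg)


def is_jobscript_error(rec: PbsLogRecord) -> bool:
    # Hypothetical filter for the req_jobscript symptom reported above.
    return rec.daemon == "pbs_mom" and "req_jobscript" in rec.message


if __name__ == "__main__":
    sample = ("12/01/2004 10:09:27;0001;   pbs_mom;Svr;pbs_mom;"
              "Bad file descriptor (9) in req_jobscript, job in unexpected state")
    rec = parse_record(sample)
    print(rec.event_code, rec.daemon, is_jobscript_error(rec))
```

Running this prints `1 pbs_mom True`. Splitting with a limit rather than a regular expression keeps the message field intact even if it contains extra semicolons.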

Why am I getting these messages?

Apparently, the MOM process on those nodes no longer works. I first tried
to cycle the MOM from the server using the new momctl command, as in

[root@tbn18 root]# ./momctl -C -h node15-9.farmnet.nikhef.nl
mom node15-9.farmnet.nikhef.nl successfully cycled cycle forced

which results in the following message on the node:

12/01/2004 10:14:23;0002;   pbs_mom;n/a;rm_request;reporting cycle forced

but this does not solve the problem. I thought momctl -C would trigger a full
MOM restart, but apparently it doesn't. Is that right?

However, if I manually restart MOM *on the node*, as in

[root@node15-9 root]# service pbs restart

the problem goes away. Could you help me understand what's going on?


