[torqueusers] Bad file descriptor (9) in req_jobscript, job in unexpected state
Davide.Salomoni at nikhef.nl
Wed Dec 1 02:22:36 MST 2004
After the upgrade to torque 1.1.0p4, some of the nodes in my farm generate
the following messages:
12/01/2004 10:09:27;0001; pbs_mom;Svr;pbs_mom;Bad file descriptor (9) in req_jobscript, job in unexpected state
12/01/2004 10:09:27;0080; pbs_mom;Req;req_reject;Reject reply code=15004(Invalid request), aux=0, type=3, from PBS_Server at tbn18.nikhef.nl
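The log entries above appear to follow the usual PBS/TORQUE daemon log layout of semicolon-separated fields: date and time, a hexadecimal event type, the daemon name, the object class, the object name, and the free-text message. As a minimal sketch, assuming that layout, the Python below splits such a line into fields so the affected nodes' MOM logs can be scanned for this specific req_jobscript failure. The names `PbsLogEntry`, `parse_pbs_log_line` and `is_jobscript_failure` are hypothetical helpers, not part of TORQUE.

```python
from typing import NamedTuple


class PbsLogEntry(NamedTuple):
    """One record from a PBS/TORQUE daemon log (field names are illustrative)."""
    timestamp: str
    event_type: int  # hexadecimal event class, e.g. 0x0001 or 0x0080
    daemon: str
    object_class: str
    object_name: str
    message: str


def parse_pbs_log_line(line: str) -> PbsLogEntry:
    # Split on the first five semicolons only; the message may itself contain ';'.
    fields = [f.strip() for f in line.rstrip("\n").split(";", 5)]
    if len(fields) != 6:
        raise ValueError(f"not a PBS log line: {line!r}")
    timestamp, event, daemon, obj_class, obj_name, message = fields
    return PbsLogEntry(timestamp, int(event, 16), daemon, obj_class, obj_name, message)


def is_jobscript_failure(entry: PbsLogEntry) -> bool:
    # Matches the symptom reported above: pbs_mom failing in req_jobscript.
    return "req_jobscript" in entry.message and "unexpected state" in entry.message


if __name__ == "__main__":
    sample = ("12/01/2004 10:09:27;0001; pbs_mom;Svr;pbs_mom;"
              "Bad file descriptor (9) in req_jobscript, job in unexpected state")
    entry = parse_pbs_log_line(sample)
    print(entry.daemon, hex(entry.event_type), is_jobscript_failure(entry))
```

Splitting with a maximum of five separators keeps the message intact even if it contains semicolons; the field meanings are assumed from the common PBS log format rather than taken from this post.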
Why am I getting these messages?
Apparently, the MOM process on those nodes no longer works. I first tried
to cycle the MOM from the server using the new momctl command:
[root at tbn18 root]# ./momctl -C -h node15-9.farmnet.nikhef.nl
mom node15-9.farmnet.nikhef.nl successfully cycled cycle forced
which results in the following message on the node:
12/01/2004 10:14:23;0002; pbs_mom;n/a;rm_request;reporting cycle forced
But this does not solve the problem. I thought momctl would trigger a full
MOM restart, but apparently it does not. Is that right?
But if I manually restart MOM *on the node*, as in
[root at node15-9 root]# service pbs restart
the problem is gone. Could you help me understand what's going on?