[torqueusers] torque-2.4.1b1: Svr; PBS_Server; LOG_ERROR::File exists (17) in req_jobscript, Unable to open script file

Martin MOKREJŠ mmokrejs at ribosome.natur.cuni.cz
Sat Nov 7 12:15:59 MST 2009


Hi,
  from time to time my cluster does not execute jobs. This is what I see in the server_logs file:

11/07/2009 21:25:32;0004;PBS_Server;Svr;is_request;message STATUS (4) received from mom on host node016 (192.168.10.16:15003) (stream 15)
11/07/2009 21:25:32;0004;PBS_Server;Svr;is_request;IS_STATUS received from node016
11/07/2009 21:25:32;0040;PBS_Server;Req;is_stat_get;received status from node node016
11/07/2009 21:25:32;0040;PBS_Server;Req;update_node_state;adjusting state for node node016 - state=0, newstate=0
11/07/2009 21:25:38;0008;PBS_Server;Job;dispatch_request;dispatching request AuthenticateUser on sd=11
11/07/2009 21:25:38;0008;PBS_Server;Job;reply_send;Reply sent for request type AuthenticateUser on socket 11
11/07/2009 21:25:38;0008;PBS_Server;Job;dispatch_request;dispatching request QueueJob on sd=10
11/07/2009 21:25:38;0008;PBS_Server;Job;reply_send;Reply sent for request type QueueJob on socket 10
11/07/2009 21:25:38;0008;PBS_Server;Job;dispatch_request;dispatching request JobScript on sd=10
11/07/2009 21:25:38;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::File exists (17) in req_jobscript, Unable to open script file
11/07/2009 21:25:38;0008;PBS_Server;Job;reply_send;Reply sent for request type JobScript on socket 10
11/07/2009 21:25:38;0080;PBS_Server;Job;5573.nfssrv.cluster.local;removed job script
11/07/2009 21:25:38;0080;PBS_Server;Job;5573.nfssrv.cluster.local;removed job file
11/07/2009 21:25:41;0008;PBS_Server;Job;dispatch_request;dispatching request AuthenticateUser on sd=11
11/07/2009 21:25:41;0008;PBS_Server;Job;reply_send;Reply sent for request type AuthenticateUser on socket 11
11/07/2009 21:25:41;0008;PBS_Server;Job;dispatch_request;dispatching request QueueJob on sd=10
11/07/2009 21:25:41;0008;PBS_Server;Job;reply_send;Reply sent for request type QueueJob on socket 10
11/07/2009 21:25:41;0008;PBS_Server;Job;dispatch_request;dispatching request JobScript on sd=10
11/07/2009 21:25:41;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::File exists (17) in req_jobscript, Unable to open script file
11/07/2009 21:25:41;0008;PBS_Server;Job;reply_send;Reply sent for request type JobScript on socket 10
11/07/2009 21:25:41;0080;PBS_Server;Job;5574.nfssrv.cluster.local;removed job script
11/07/2009 21:25:41;0080;PBS_Server;Job;5574.nfssrv.cluster.local;removed job file
11/07/2009 21:25:44;0008;PBS_Server;Job;dispatch_request;dispatching request AuthenticateUser on sd=11
11/07/2009 21:25:44;0008;PBS_Server;Job;reply_send;Reply sent for request type AuthenticateUser on socket 11
11/07/2009 21:25:44;0008;PBS_Server;Job;dispatch_request;dispatching request QueueJob on sd=10
11/07/2009 21:25:44;0008;PBS_Server;Job;reply_send;Reply sent for request type QueueJob on socket 10
11/07/2009 21:25:44;0008;PBS_Server;Job;dispatch_request;dispatching request JobScript on sd=10
11/07/2009 21:25:44;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::File exists (17) in req_jobscript, Unable to open script file
11/07/2009 21:25:44;0008;PBS_Server;Job;reply_send;Reply sent for request type JobScript on socket 10
11/07/2009 21:25:44;0080;PBS_Server;Job;5575.nfssrv.cluster.local;removed job script
11/07/2009 21:25:44;0080;PBS_Server;Job;5575.nfssrv.cluster.local;removed job file
11/07/2009 21:25:54;0008;PBS_Server;Job;dispatch_request;dispatching request StatusServer on sd=10
11/07/2009 21:25:54;0008;PBS_Server;Job;reply_send;Reply sent for request type StatusServer on socket 10
11/07/2009 21:25:54;0008;PBS_Server;Job;dispatch_request;dispatching request StatusNode on sd=10
11/07/2009 21:25:54;0040;PBS_Server;Req;req_stat_node;entered
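
To see which job IDs are hit, the excerpt above can be filtered with a short script. This is only a sketch: it assumes every log line has six semicolon-separated fields, as in the excerpt, and the field names (timestamp, event, server, obj_type, obj_name, message) are my own labels, not official TORQUE terminology. It pairs each req_jobscript error with the next "removed job script" line.

```python
from typing import Iterable, List, NamedTuple, Optional


class LogRecord(NamedTuple):
    # Field names are illustrative labels for the six columns seen in the excerpt.
    timestamp: str
    event: str
    server: str
    obj_type: str
    obj_name: str
    message: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Split one log line into six fields; return None if it does not fit."""
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return LogRecord(*parts)


def rejected_jobs(lines: Iterable[str]) -> List[str]:
    """Return job IDs whose script was removed after a req_jobscript error."""
    pending_error = False
    jobs = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        if rec.event == "0001" and "req_jobscript" in rec.message:
            pending_error = True
        elif pending_error and rec.message == "removed job script":
            jobs.append(rec.obj_name)
            pending_error = False
    return jobs


# Four lines copied from the excerpt above.
sample = [
    "11/07/2009 21:25:38;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::File exists (17) in req_jobscript, Unable to open script file",
    "11/07/2009 21:25:38;0008;PBS_Server;Job;reply_send;Reply sent for request type JobScript on socket 10",
    "11/07/2009 21:25:38;0080;PBS_Server;Job;5573.nfssrv.cluster.local;removed job script",
    "11/07/2009 21:25:38;0080;PBS_Server;Job;5573.nfssrv.cluster.local;removed job file",
]
failed = rejected_jobs(sample)
print(failed)
```

For the whole file, pass an open file object instead of the sample list, e.g. rejected_jobs(open(path)) with path pointing at the server_logs file in question.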

Any clues?
Thanks.
Martin
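
For reference, error 17 is EEXIST on Linux, the error an exclusive create returns when the target file is already there. The sketch below only reproduces that error code in isolation. Whether req_jobscript in 2.4.1b1 opens the script file this way is an assumption on my part, and the file name example.SC is hypothetical.

```python
import errno
import os
import tempfile


def create_exclusive(path: str) -> int:
    """Create path exclusively; return 0 on success or the errno on EEXIST."""
    try:
        # O_CREAT | O_EXCL fails with EEXIST if the file already exists.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError as exc:
        return exc.errno
    os.close(fd)
    return 0


with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file name, standing in for a leftover job script.
    script_path = os.path.join(tmpdir, "example.SC")
    first = create_exclusive(script_path)   # file is new, create succeeds
    second = create_exclusive(script_path)  # file is left over, create fails

print(first, second, errno.errorcode[second])
```

If that reading is right, a stale script file left under the server's jobs directory with the same name as the new job would be one thing to look for.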

