[torqueusers] PBS unable to execute Job.
Ashley Wright
a2.wright at qut.edu.au
Tue Sep 6 17:13:39 MDT 2005
Hi,
Today I tried to submit a job with qsub under TORQUE, and it failed. It
was working on Monday, and I don't think any configuration has changed
since then. ssh is working fine, and /home is NFS-mounted.
Can anybody help me? Below is the output from various log files.
When I run 'qsub time' (time is my script) I get a mail back which says:
PBS Job Id: 889.auriga.qut.edu.au
Job Name: time
Aborted by PBS Server
Job cannot be executed
See Administrator for help
I get the following lines in server_log:
09/07/2005 08:59:48;0100;PBS_Server;Job;889.auriga.qut.edu.au;enqueuing into gen_30min, state 1 hop 1
09/07/2005 08:59:48;0008;PBS_Server;Job;889.auriga.qut.edu.au;Job Queued at request of wright4 at auriga.qut.edu.au, owner = wright4 at auriga.qut.edu.au, job name = time, queue = gen_30min
09/07/2005 08:59:48;0040;PBS_Server;Svr;auriga.qut.edu.au;Scheduler sent command new
09/07/2005 08:59:49;0100;PBS_Server;Req;;Type StatusNode request received from root at auriga.qut.edu.au, sock=10
09/07/2005 08:59:49;0100;PBS_Server;Req;;Type StatusQueue request received from root at auriga.qut.edu.au, sock=10
09/07/2005 08:59:49;0100;PBS_Server;Req;;Type StatusJob request received from root at auriga.qut.edu.au, sock=10
09/07/2005 08:59:49;0100;PBS_Server;Req;;Type ModifyJob request received from root at auriga.qut.edu.au, sock=10
09/07/2005 08:59:49;0008;PBS_Server;Job;889.auriga.qut.edu.au;Job Modified at request of root at auriga.qut.edu.au
09/07/2005 08:59:49;0100;PBS_Server;Req;;Type RunJob request received from root at auriga.qut.edu.au, sock=10
09/07/2005 08:59:49;0008;PBS_Server;Job;889.auriga.qut.edu.au;Job Run at request of root at auriga.qut.edu.au
09/07/2005 08:59:49;0100;PBS_Server;Req;;Type JobObituary request received from pbs_mom at node010, sock=13
09/07/2005 08:59:49;0010;PBS_Server;Job;889.auriga.qut.edu.au;Exit_status=-1 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:00
09/07/2005 08:59:49;0100;PBS_Server;Job;889.auriga.qut.edu.au;dequeuing from gen_30min, state EXITING
I get the following lines in mom_log on node010:
09/07/2005 08:59:49;0001; pbs_mom;Job;TMomFinalizeJob3;job not started, Failure job exec failure, before files staged, no retry
09/07/2005 08:59:49;0001; pbs_mom;Job;889.auriga.qut.edu.au;ALERT: job failed phase 3 start, server will retry
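For anyone wanting to line up the server_log and mom_log entries for one job, here is a rough Python sketch. It assumes each record sits on a single line (mail wrapping undone) and consists of six semicolon-separated fields, which is how the excerpts above appear to be laid out: timestamp, event code, daemon, class, object name, message. The names LogRecord, parse_record and records_for_job are made up for this illustration and are not part of TORQUE.

```python
from typing import Iterable, List, NamedTuple, Optional


class LogRecord(NamedTuple):
    # Field names are hypothetical labels for the six columns seen in the logs.
    timestamp: str
    event: str
    daemon: str
    kind: str
    name: str
    message: str


def parse_record(line: str) -> Optional[LogRecord]:
    """Split one log line into six fields; return None if it does not fit."""
    # Split on the first five semicolons only, so a ';' inside the
    # message text would not break the record apart.
    parts = line.split(";", 5)
    if len(parts) != 6:
        return None
    return LogRecord(*(p.strip() for p in parts))


def records_for_job(lines: Iterable[str], job_id: str) -> List[LogRecord]:
    """Return the parsed records whose object-name field equals job_id."""
    found = []
    for line in lines:
        rec = parse_record(line)
        if rec is not None and rec.name == job_id:
            found.append(rec)
    return found


# Three records copied from the logs above, one per line.
sample = [
    "09/07/2005 08:59:49;0008;PBS_Server;Job;889.auriga.qut.edu.au;"
    "Job Run at request of root at auriga.qut.edu.au",
    "09/07/2005 08:59:49;0100;PBS_Server;Req;;"
    "Type JobObituary request received from pbs_mom at node010, sock=13",
    "09/07/2005 08:59:49;0001; pbs_mom;Job;889.auriga.qut.edu.au;"
    "ALERT: job failed phase 3 start, server will retry",
]

for rec in records_for_job(sample, "889.auriga.qut.edu.au"):
    print(rec.timestamp, rec.daemon, rec.message)
```

Note that filtering on the job id alone would miss the TMomFinalizeJob3 line in mom_log, since that record carries the function name rather than the job id in its name field; matching on the timestamp as well may be needed.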
Thanks,
Ashley
--
Ashley Wright
3864 9264
a2.wright at qut.edu.au
HPC and Research Support Group
Queensland University of Technology (QUT)