[torqueusers] Unable to start interactive job

Craig West cwest at astro.umass.edu
Tue Dec 19 13:07:03 MST 2006


Hello,

I've got Torque 2.1.6 running with MAUI 3.2.6p17 on Debian Etch.

I have been trying to run an interactive job, but I get the following 
error message:

 > qsub -I

qsub: waiting for job <job_id> to start
qsub: job <job_id> apparently deleted

I looked at the pbs_server logs:

PBS_Server;Req;;Type AuthenticateUser request received from user@pbsserver, sock=12
PBS_Server;Req;;Type QueueJob request received from user@pbsserver, sock=10
PBS_Server;Req;;Type ReadyToCommit request received from user@pbsserver, sock=10
PBS_Server;Req;;Type Commit request received from user@pbsserver, sock=10
PBS_Server;Job;<job_id>;enqueuing into route, state 1 hop 1
PBS_Server;Job;<job_id>;dequeuing from route, state QUEUED
PBS_Server;Job;<job_id>;enqueuing into athlon, state 1 hop 1
PBS_Server;Job;<job_id>;Job Queued at request of user@pbsserver, owner = user@pbsserver, job name = STDIN, queue = athlon
PBS_Server;Svr;pbsserver;Scheduler sent command new
PBS_Server;Req;;Type StatusNode request received from root@pbsserver, sock=9
PBS_Server;Req;;Type StatusQueue request received from root@pbsserver, sock=9
PBS_Server;Req;;Type StatusJob request received from root@pbsserver, sock=9
PBS_Server;Req;;Type ModifyJob request received from root@pbsserver, sock=9
PBS_Server;Job;<job_id>;Job Modified at request of root@pbsserver
PBS_Server;Req;;Type RunJob request received from root@pbsserver, sock=9
PBS_Server;Job;<job_id>;Job Run at request of root@pbsserver
PBS_Server;Req;;Type ModifyJob request received from root@pbsserver, sock=9
PBS_Server;Job;<job_id>;Job Modified at request of root@pbsserver
PBS_Server;Req;;Type JobObituary request received from pbs_mom@node1, sock=12
PBS_Server;Job;<job_id>;Exit_status=-1
PBS_Server;Job;<job_id>;dequeuing from athlon, state COMPLETE
PBS_Server;Svr;pbsserver;Scheduler sent command term
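
To follow a single job through an excerpt like the one above, the "Job" records can be filtered out of the surrounding "Req" and "Svr" noise. This is only a rough Python sketch: it assumes one record per line in the trimmed daemon;class;object;message form shown here (real log files usually carry extra leading fields such as a timestamp, which would have to be stripped first), and the names parse_line and job_events, as well as the job id in the usage lines, are made up for illustration.

```python
from collections import namedtuple

# One record of the trimmed log format shown in this post:
#   daemon;class;object;message
LogEntry = namedtuple("LogEntry", "daemon kind obj message")


def parse_line(line):
    """Split one trimmed log line into its four fields, or return None."""
    # maxsplit=3 keeps any ';' inside the message text intact.
    parts = line.strip().split(";", 3)
    if len(parts) != 4:
        return None
    return LogEntry(*parts)


def job_events(lines, job_id):
    """Return the messages of all 'Job' records that belong to job_id."""
    events = []
    for line in lines:
        entry = parse_line(line)
        if entry and entry.kind == "Job" and entry.obj == job_id:
            events.append(entry.message)
    return events


if __name__ == "__main__":
    # Hypothetical job id; substitute the real one from the qsub output.
    sample = [
        "PBS_Server;Req;;Type RunJob request received from root@pbsserver, sock=9",
        "PBS_Server;Job;42.pbsserver;Job Run at request of root@pbsserver",
        "PBS_Server;Job;42.pbsserver;Exit_status=-1",
    ]
    for message in job_events(sample, "42.pbsserver"):
        print(message)
```

Read this way, the server side looks normal up to "Job Run"; the first sign of trouble is the JobObituary from the node followed by Exit_status=-1.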



I have looked at the logs on the node that was allocated and found 
the following in mom_logs:

pbs_mom;Job;TMomFinalizeJob3;job not started, Failure job exec failure, before files staged, no retry
pbs_mom;Req;send_sisters;sending ABORT to sisters
pbs_mom;Job;<job_id>;Job Modified at request of PBS_Server@pbsserver


After reading other comments, I noticed that errors are written to the 
syslog as well. Here is the output from that:

pbs_mom: Resource temporarily unavailable (11) in TMomFinalizeChild, cannot open qsub sock
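
One possible reading of "cannot open qsub sock" is that pbs_mom on the node could not open a TCP connection back to the qsub process waiting on the submit host. If that reading is right (it is an assumption, not something the logs above confirm), a basic reachability test run from the node might narrow things down. The sketch below is plain Python and not part of Torque; can_connect is a made-up helper, and the host name and port in the usage lines are placeholders for the submit host and for a port on which a throwaway listener has been started.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host name and connects; closing
        # the socket straight away is enough for a reachability check.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False


if __name__ == "__main__":
    # Placeholders: replace with the submit host as the node sees it and
    # a port where a test listener is running on that host.
    host, port = "pbsserver", 50000
    print(host, port, "reachable" if can_connect(host, port) else "NOT reachable")
```

A False result would point towards name resolution or packet filtering between the node and the submit host rather than towards a Torque option, but a True result on one test port does not rule those out for other ports.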


I also get an email from the PBS server:

PBS Job Id: <job_id>
Job Name:   STDIN
Aborted by PBS Server Job cannot be executed
See Administrator for help


Note: I have asked myself (the Administrator) for help, but that has not 
been a great means of success.


Does anyone know what the problem could be, or can anyone point me in a 
direction to proceed?
I have no problems running scripted (non-interactive) jobs.
Perhaps I have just missed an option that needs to be enabled to allow 
interactive jobs to run?


Craig...


