[torqueusers] Unable to start interactive job
Craig West
cwest at astro.umass.edu
Tue Dec 19 13:07:03 MST 2006
Hello,
I've got Torque 2.1.6 running with MAUI 3.2.6p17 on Debian Etch.
I have been trying to run an interactive job, but I get the following
error message:
> qsub -I
qsub: waiting for job <job_id> to start
qsub: job <job_id> apparently deleted
I looked at the pbs_server logs:
PBS_Server;Req;;Type AuthenticateUser request received from user at pbsserver, sock=12
PBS_Server;Req;;Type QueueJob request received from user at pbsserver, sock=10
PBS_Server;Req;;Type ReadyToCommit request received from user at pbsserver, sock=10
PBS_Server;Req;;Type Commit request received from user at pbsserver, sock=10
PBS_Server;Job;<job_id>;enqueuing into route, state 1 hop 1
PBS_Server;Job;<job_id>;dequeuing from route, state QUEUED
PBS_Server;Job;<job_id>;enqueuing into athlon, state 1 hop 1
PBS_Server;Job;<job_id>;Job Queued at request of user at pbsserver, owner = user at pbsserver, job name = STDIN, queue = athlon
PBS_Server;Svr;pbsserver;Scheduler sent command new
PBS_Server;Req;;Type StatusNode request received from root at pbsserver, sock=9
PBS_Server;Req;;Type StatusQueue request received from root at pbsserver, sock=9
PBS_Server;Req;;Type StatusJob request received from root at pbsserver, sock=9
PBS_Server;Req;;Type ModifyJob request received from root at pbsserver, sock=9
PBS_Server;Job;<job_id>;Job Modified at request of root at pbsserver
PBS_Server;Req;;Type RunJob request received from root at pbsserver, sock=9
PBS_Server;Job;<job_id>;Job Run at request of root at pbsserver
PBS_Server;Req;;Type ModifyJob request received from root at pbsserver, sock=9
PBS_Server;Job;<job_id>;Job Modified at request of root at pbsserver
PBS_Server;Req;;Type JobObituary request received from pbs_mom at node1, sock=12
PBS_Server;Job;<job_id>;Exit_status=-1
PBS_Server;Job;<job_id>;dequeuing from athlon, state COMPLETE
PBS_Server;Svr;pbsserver;Scheduler sent command term
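To follow a single job through output like this, the semicolon-separated
entries can be filtered by job id. Below is a minimal Python sketch. It
assumes the trimmed four-field layout pasted above (daemon;type;id;message)
with each entry on one line; real pbs_server logs carry extra leading fields
such as a timestamp, so the split would need adjusting. The names parse_entry
and job_events are made up for this example and are not part of Torque.

```python
from typing import Iterable, List, NamedTuple


class LogEntry(NamedTuple):
    daemon: str   # e.g. "PBS_Server"
    kind: str     # e.g. "Req", "Job", "Svr"
    ident: str    # job id or server name; empty for "Req" entries
    message: str  # free text, may itself contain commas


def parse_entry(line: str) -> LogEntry:
    # Split on the first three semicolons only, so the message stays whole.
    daemon, kind, ident, message = line.strip().split(";", 3)
    return LogEntry(daemon, kind, ident, message)


def job_events(lines: Iterable[str], job_id: str) -> List[str]:
    """Return the messages of all "Job" entries that belong to job_id."""
    events = []
    for line in lines:
        if line.count(";") < 3:
            continue  # not a complete entry in the assumed layout
        entry = parse_entry(line)
        if entry.kind == "Job" and entry.ident == job_id:
            events.append(entry.message)
    return events


# A few entries copied from the log above, with the placeholder job id.
SAMPLE = [
    "PBS_Server;Req;;Type RunJob request received from root at pbsserver, sock=9",
    "PBS_Server;Job;<job_id>;Job Run at request of root at pbsserver",
    "PBS_Server;Req;;Type JobObituary request received from pbs_mom at node1, sock=12",
    "PBS_Server;Job;<job_id>;Exit_status=-1",
    "PBS_Server;Job;<job_id>;dequeuing from athlon, state COMPLETE",
    "PBS_Server;Svr;pbsserver;Scheduler sent command term",
]
```

Calling job_events(SAMPLE, "<job_id>") leaves only the three Job entries,
which makes the jump from "Job Run" to Exit_status=-1 easier to spot.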
I have looked at the logs on the node that has been allocated and found
the following in mom_logs:
pbs_mom;Job;TMomFinalizeJob3;job not started, Failure job exec failure, before files staged, no retry
pbs_mom;Req;send_sisters;sending ABORT to sisters
pbs_mom;Job;<job_id>;Job Modified at request of PBS_Server at pbsserver
After reading other comments I noticed that errors are written to the
syslog as well. Here is the output from that:
pbs_mom: Resource temporarily unavailable (11) in TMomFinalizeChild, cannot open qsub sock
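One possible reading of "cannot open qsub sock" is that pbs_mom on the node
could not connect back to the socket that qsub -I listens on at the
submitting host. That is an assumption on my part; the logs above do not
confirm it. A rough way to test plain TCP reachability from the node is the
Python sketch below. The function name can_connect is made up for this
example, and the host and port have to be filled in by hand.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and attempts the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example use on the compute node (placeholders, not values from the logs):
#   can_connect("pbsserver", 12345)
```

A False result here would point at name resolution or a firewall between the
node and the submitting host, but a True result does not prove that pbs_mom
itself can connect, since this only checks basic TCP reachability.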
I also get an email from the PBS server:
PBS Job Id: <job_id>
Job Name: STDIN
Aborted by PBS Server Job cannot be executed
See Administrator for help
Note: I have asked myself (the Administrator) for help, but that has not
been very successful.
Does anyone know what the problem could be, or can anyone point me in a
direction to proceed?
I have no problems running scripted (non-interactive) jobs.
Perhaps I have just missed an option that needs to be enabled to allow
interactive jobs to run?
Craig...