[torqueusers] Torque 1.1.0p6 failing with "Command not found"

Joshua Weage weage98 at yahoo.com
Wed Feb 16 09:49:26 MST 2005


I've recently upgraded to 1.1.0p6, which solved a lot of the problems I was
experiencing with 1.1.0p2.  However, Torque stopped working
completely today.  I have restarted the moms, the server and the scheduler
and still get the same error.

All of my torque jobs now fail with the following error in the stderr
file:

/var/spool/torque/mom_priv/jobs/477.warlns2.SC: Command not found.

where 477 is the job id.  I have tried forcing the job to run on
different nodes and I get the same error.  

mom logs show:

02/16/2005 11:43:00;0001;   pbs_mom;Job;TMomFinalizeJob3;job 477.warlns2.global.arup.com started, pid = 10635
02/16/2005 11:43:01;0080;   pbs_mom;Job;477.warlns2.global.arup.com;scan_for_terminated: job 477.warlns2.global.arup.com task 1 terminated, sid 10635
02/16/2005 11:43:01;0008;   pbs_mom;Job;477.warlns2.global.arup.com;Terminated

Server logs show:


02/16/2005 11:43:00;0100;PBS_Server;Req;;Type AuthenticateUser request received from jweage at warpc4.clients.global.arup.com, sock=10
02/16/2005 11:43:00;0100;PBS_Server;Req;;Type QueueJob request received from jweage at warpc4.clients.global.arup.com, sock=9
02/16/2005 11:43:00;0100;PBS_Server;Req;;Type ReadyToCommit request received from jweage at warpc4.clients.global.arup.com, sock=9
02/16/2005 11:43:00;0100;PBS_Server;Req;;Type Commit request received from jweage at warpc4.clients.global.arup.com, sock=9
02/16/2005 11:43:00;0100;PBS_Server;Job;477.warlns2.global.arup.com;enqueuing into normal, state 1 hop 1
02/16/2005 11:43:00;0008;PBS_Server;Job;477.warlns2.global.arup.com;Job Queued at request of jweage at warpc4.clients.global.arup.com, owner = jweage at warpc4.clients.global.arup.com, job name = test_job, queue = normal
02/16/2005 11:43:00;0040;PBS_Server;Svr;warlns2.global.arup.com;Scheduler sent command new
02/16/2005 11:43:00;0100;PBS_Server;Req;;Type StatusServer request received from Scheduler at warlns2.global.arup.com, sock=10
02/16/2005 11:43:00;0100;PBS_Server;Req;;Type StatusNode request received from Scheduler at warlns2.global.arup.com, sock=10
02/16/2005 11:43:00;0100;PBS_Server;Req;;Type StatusQueue request received from Scheduler at warlns2.global.arup.com, sock=10
02/16/2005 11:43:00;0100;PBS_Server;Req;;Type SelStat request received from Scheduler at warlns2.global.arup.com, sock=10
02/16/2005 11:43:00;0100;PBS_Server;Req;;Type SelStat request received from Scheduler at warlns2.global.arup.com, sock=10
02/16/2005 11:43:00;0100;PBS_Server;Req;;Type SelStat request received from Scheduler at warlns2.global.arup.com, sock=10
02/16/2005 11:43:00;0100;PBS_Server;Req;;Type ResourceQuery request received from Scheduler at warlns2.global.arup.com, sock=10
02/16/2005 11:43:00;0100;PBS_Server;Req;;Type ModifyJob request received from Scheduler at warlns2.global.arup.com, sock=10
02/16/2005 11:43:00;0008;PBS_Server;Job;477.warlns2.global.arup.com;Job Modified at request of Scheduler at warlns2.global.arup.com
02/16/2005 11:43:00;0100;PBS_Server;Req;;Type RunJob request received from Scheduler at warlns2.global.arup.com, sock=10
02/16/2005 11:43:00;0008;PBS_Server;Job;477.warlns2.global.arup.com;Job Run at request of Scheduler at warlns2.global.arup.com
02/16/2005 11:43:00;0040;PBS_Server;Svr;warlns2.global.arup.com;Scheduler sent command recyc
02/16/2005 11:43:01;0100;PBS_Server;Req;;Type JobObituary request received from pbs_mom at waramd15.clients.global.arup.com, sock=9
02/16/2005 11:43:01;0010;PBS_Server;Job;477.warlns2.global.arup.com;Exit_status=1 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:01
02/16/2005 11:43:01;0100;PBS_Server;Job;477.warlns2.global.arup.com;dequeuing from normal, state EXITING
02/16/2005 11:43:01;0040;PBS_Server;Svr;warlns2.global.arup.com;Scheduler sent command term
02/16/2005 11:43:01;0100;PBS_Server;Req;;Type StatusServer request received from Scheduler at warlns2.global.arup.com, sock=9
02/16/2005 11:43:01;0100;PBS_Server;Req;;Type StatusNode request received from Scheduler at warlns2.global.arup.com, sock=9
02/16/2005 11:43:01;0100;PBS_Server;Req;;Type StatusQueue request received from Scheduler at warlns2.global.arup.com, sock=9
02/16/2005 11:43:01;0100;PBS_Server;Req;;Type SelStat request received from Scheduler at warlns2.global.arup.com, sock=9
02/16/2005 11:43:01;0100;PBS_Server;Req;;Type SelStat request received from Scheduler at warlns2.global.arup.com, sock=9
02/16/2005 11:43:01;0100;PBS_Server;Req;;Type SelStat request received from Scheduler at warlns2.global.arup.com, sock=9
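
Most of the server log above is scheduler polling unrelated to job 477. A
minimal Python sketch for pulling out only the records that mention one job
might look like the following. It assumes each record sits on a single line
(mail wrapping undone) and that the fields are separated by semicolons in the
order seen above. The field names (timestamp, event_code, daemon, obj_class,
obj_name, message) and the function names are illustrative labels, not
official Torque terminology.

```python
from typing import Iterable, List, NamedTuple, Optional


class LogRecord(NamedTuple):
    # Field names are illustrative labels for the six semicolon-separated
    # columns seen in the logs; they are not taken from Torque itself.
    timestamp: str
    event_code: str
    daemon: str
    obj_class: str
    obj_name: str
    message: str


def parse_record(line: str) -> Optional[LogRecord]:
    """Split one log line into six fields, or return None if it does not fit."""
    # Split at most five times so semicolons inside the message are preserved.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return LogRecord(*(p.strip() for p in parts))


def records_for_job(lines: Iterable[str], job_id: str) -> List[LogRecord]:
    """Keep records whose object name or message mentions job_id."""
    matches = []
    for line in lines:
        rec = parse_record(line)
        if rec is not None and (job_id in rec.obj_name or job_id in rec.message):
            matches.append(rec)
    return matches
```

Calling records_for_job(open(path), "477.warlns2") on a mom or server log file
(path being wherever that log lives on the system) would drop the StatusServer,
StatusNode, StatusQueue and SelStat requests and leave only the lines for that
job.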

Can anyone tell me what is going wrong here?

Thanks,

Josh

