[torqueusers] Problem in running Torque job on the slave node

Kashif Saleem saleemk at dcs.gla.ac.uk
Mon Aug 29 12:56:33 MDT 2005


 
Hi,
    I am having a problem running a simple job. I installed TORQUE on two machines, labpc-17.nesc.gla.ac.uk (server) and labpc-18.nesc.gla.ac.uk (slave), and mounted the server's /home directory on the slave's /home directory, as described in the article http://www.linuxgazette.com/node/9480.

But now I am getting another error when submitting simple jobs to TORQUE, as shown below:
[saleemk at labpc-17 saleemk]$ echo " how r u" | qsub

When I check the server log files with:
more /usr/local/spool/pbs/*_logs/*
I see the following error every time I submit a job:




08/25/2005 18:24:46;0100;PBS_Server;Job;6.labpc-17.nesc.gla.ac.uk;enqueuing into qsar, state 1 hop 1
08/25/2005 18:24:46;0008;PBS_Server;Job;6.labpc-17.nesc.gla.ac.uk;Job Queued at request of saleemk at labpc-17.nesc.gla.ac.uk, owner = saleemk at labpc-17.nesc.gla.ac.uk, job name = STDIN, queue = qsar
08/25/2005 18:24:46;0040;PBS_Server;Svr;labpc-17.nesc.gla.ac.uk;Scheduler sent command new
08/25/2005 18:24:46;0100;PBS_Server;Req;;Type StatusServer request received from Scheduler at labpc-17.nesc.gla.ac.uk, sock=11
08/25/2005 18:24:46;0100;PBS_Server;Req;;Type StatusNode request received from Scheduler at labpc-17.nesc.gla.ac.uk, sock=11
08/25/2005 18:24:46;0100;PBS_Server;Req;;Type StatusQueue request received from Scheduler at labpc-17.nesc.gla.ac.uk, sock=11
08/25/2005 18:24:46;0100;PBS_Server;Req;;Type SelStat request received from Scheduler at labpc-17.nesc.gla.ac.uk, sock=11
08/25/2005 18:24:46;0100;PBS_Server;Req;;Type ModifyJob request received from Scheduler at labpc-17.nesc.gla.ac.uk, sock=11
08/25/2005 18:24:46;0008;PBS_Server;Job;6.labpc-17.nesc.gla.ac.uk;Job Modified at request of Scheduler at labpc-17.nesc.gla.ac.uk
08/25/2005 18:24:46;0100;PBS_Server;Req;;Type RunJob request received from Scheduler at labpc-17.nesc.gla.ac.uk, sock=11
08/25/2005 18:24:46;0008;PBS_Server;Job;6.labpc-17.nesc.gla.ac.uk;Job Run at request of Scheduler at labpc-17.nesc.gla.ac.uk
08/25/2005 18:24:46;0040;PBS_Server;Svr;labpc-17.nesc.gla.ac.uk;Scheduler sent command recyc
08/25/2005 18:24:46;0100;PBS_Server;Req;;Type JobObituary request received from pbs_mom at labpc-18.nesc.gla.ac.uk, sock=11
08/25/2005 18:24:46;0010;PBS_Server;Job;6.labpc-17.nesc.gla.ac.uk;Exit_status=-2 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:00
08/25/2005 18:24:46;000d;PBS_Server;Job;6.labpc-17.nesc.gla.ac.uk;Post job file processing error; job 6.labpc-17.nesc.gla.ac.uk on host labpc-18.nesc.gla.ac.uk
08/25/2005 18:24:46;0100;PBS_Server;Job;6.labpc-17.nesc.gla.ac.uk;dequeuing from qsar, state EXITING
08/25/2005 18:24:46;0040;PBS_Server;Svr;labpc-17.nesc.gla.ac.uk;Scheduler sent command term
08/25/2005 18:24:46;0100;PBS_Server;Req;;Type StatusServer request received from Scheduler at labpc-17.nesc.gla.ac.uk, sock=9
08/25/2005 18:24:46;0100;PBS_Server;Req;;Type StatusNode request received from Scheduler at labpc-17.nesc.gla.ac.uk, sock=9
08/25/2005 18:24:46;0100;PBS_Server;Req;;Type StatusQueue request received from Scheduler at labpc-17.nesc.gla.ac.uk, sock=9
08/25/2005 18:24:46;0100;PBS_Server;Req;;Type SelStat request received from Scheduler at labpc-17.nesc.gla.ac.uk, sock=9
08/25/2005 18:24:49;0100;PBS_Server;Req;;Type AuthenticateUser request received from saleemk at labpc-17.nesc.gla.ac.uk, sock=11
08/25/2005 18:24:49;0100;PBS_Server;Req;;Type StatusJob request received from saleemk at labpc-17.nesc.gla.ac.uk, sock=9
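
To pick out only the entries for one job from a log like the one above, a small Python sketch along these lines may help. It assumes each entry has six semicolon-separated fields (timestamp, event code, daemon, class, object, message), which is how the lines above appear to be laid out. The names LogEntry, parse_line and entries_for_job are my own and not part of TORQUE.

```python
from typing import Iterable, List, NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str   # e.g. "08/25/2005 18:24:46"
    event: str       # event code, e.g. "000d"
    daemon: str      # e.g. "PBS_Server"
    kind: str        # e.g. "Job", "Req", "Svr"
    obj: str         # job id or host name; empty for "Req" entries
    message: str     # free text; may itself contain semicolons


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one log line into six fields; return None if it does not fit."""
    # maxsplit=5 keeps semicolons inside the message text intact.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return LogEntry(*parts)


def entries_for_job(lines: Iterable[str], job_id: str) -> List[LogEntry]:
    """Return the "Job" entries whose object field equals job_id."""
    result = []
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.kind == "Job" and entry.obj == job_id:
            result.append(entry)
    return result


if __name__ == "__main__":
    # Two lines copied from the log above, used here as sample input.
    sample = [
        "08/25/2005 18:24:46;0040;PBS_Server;Svr;labpc-17.nesc.gla.ac.uk;"
        "Scheduler sent command recyc",
        "08/25/2005 18:24:46;000d;PBS_Server;Job;6.labpc-17.nesc.gla.ac.uk;"
        "Post job file processing error; job 6.labpc-17.nesc.gla.ac.uk "
        "on host labpc-18.nesc.gla.ac.uk",
    ]
    for entry in entries_for_job(sample, "6.labpc-17.nesc.gla.ac.uk"):
        print(entry.timestamp, entry.event, entry.message)
```

To run it against a real log, the lines of an opened log file can be passed in place of the sample list.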




It looks to me as though the error is on labpc-18.nesc.gla.ac.uk, since the log file says "Post job file processing error; job 6.labpc-17.nesc.gla.ac.uk on host labpc-18.nesc.gla.ac.uk", but I don't know why this error occurs or how to fix it.




I am wondering whether you have come across this problem before or have any idea how to fix it.


Kind Regards
Kashif Saleem.






