[torqueusers] Problems with 1.0.1p5 moving Job Output files

Randy Philipp randy at umbc.edu
Mon Jul 11 13:50:32 MDT 2005


I am having a problem with a cluster running Torque 1.0.1p5.  Since I
rebooted the cluster, job output files are no longer being copied out of
the /var/spool/pbs/spool directory.  pbs_mom logs the following:

07/11/2005 15:35:52;0100;   pbs_mom;Req;;Type queuejob request received from PBS_Server at kali.cl.math.umbc.edu, sock=10
07/11/2005 15:35:52;0100;   pbs_mom;Req;;Type jobscript request received from PBS_Server at kali.cl.math.umbc.edu, sock=10
07/11/2005 15:35:52;0100;   pbs_mom;Req;;Type readytocommit request received from PBS_Server at kali.cl.math.umbc.edu, sock=10
07/11/2005 15:35:53;0100;   pbs_mom;Req;;Type commit request received from PBS_Server at kali.cl.math.umbc.edu, sock=10
07/11/2005 15:35:53;0008;   pbs_mom;Job;2330.kali.cl.math.umbc.edu;Started, pid = 7596
07/11/2005 15:35:53;0100;   pbs_mom;Req;;Type statusjob request received from PBS_Server at kali.cl.math.umbc.edu, sock=11
07/11/2005 15:35:53;0008;   pbs_mom;Job;2330.kali.cl.math.umbc.edu;task started, /bin/sh
07/11/2005 15:35:53;0008;   pbs_mom;Job;2330.kali.cl.math.umbc.edu;task started, /bin/sh
07/11/2005 15:35:57;0080;   pbs_mom;Job;2330.kali.cl.math.umbc.edu;task 2 terminated
07/11/2005 15:35:57;0080;   pbs_mom;Job;2330.kali.cl.math.umbc.edu;task 3 terminated
07/11/2005 15:35:58;0080;   pbs_mom;Job;2330.kali.cl.math.umbc.edu;task 1 terminated
07/11/2005 15:35:58;0008;   pbs_mom;Job;2330.kali.cl.math.umbc.edu;Terminated
07/11/2005 15:35:58;0008;   pbs_mom;Job;2330.kali.cl.math.umbc.edu;kill_job
07/11/2005 15:35:58;0080;   pbs_mom;Job;2330.kali.cl.math.umbc.edu;Obit sent
07/11/2005 15:35:58;0100;   pbs_mom;Req;;Type deletefiles request received from PBS_Server at kali.cl.math.umbc.edu, sock=10
07/11/2005 15:35:58;0100;   pbs_mom;Req;;Type deletejob request received from PBS_Server at kali.cl.math.umbc.edu, sock=10
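
To make the order of requests in a log like this easier to read, here is a minimal Python sketch that pulls out the request types. It assumes the record layout visible in the lines quoted here (timestamp;event code;daemon;object class;object name;message); that layout is inferred from this excerpt only, not taken from Torque documentation. The function names are hypothetical. It also rejoins records that a mail client has wrapped, and leaves unwrapped records alone.

```python
import re

# Assumption: every record starts with a date such as 07/11/2005.
RECORD_START = re.compile(r"^\d{2}/\d{2}/\d{4}\b")
REQUEST = re.compile(r"Type (\S+) request")
FIELDS = ("timestamp", "code", "daemon", "kind", "name", "message")


def join_wrapped(lines):
    """Rejoin records that were wrapped across several lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)          # a new record begins here
        else:
            records[-1] += " " + line     # continuation of the previous record
    return records


def parse_record(record):
    """Split one record into its six ';'-separated fields, or return None."""
    parts = [p.strip() for p in record.split(";", 5)]
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def request_types(lines):
    """Return the request types in the order they were logged."""
    found = []
    for record in join_wrapped(lines):
        parsed = parse_record(record)
        if parsed is None or parsed["kind"] != "Req":
            continue
        match = REQUEST.match(parsed["message"])
        if match:
            found.append(match.group(1))
    return found
```

Calling request_types() on the lines of a saved log returns a list of request names in logged order, which can then be compared between a working node and a failing one.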

These are the log messages from the server:

07/11/2005 15:36:44;0100;PBS_Server;Job;2330.kali.cl.math.umbc.edu;enqueuing into workq, state 1 hop 1
07/11/2005 15:36:44;0008;PBS_Server;Job;2330.kali.cl.math.umbc.edu;Job Queued at request of randy at kali.cl.math.umbc.edu, owner = randy at kali.cl.math.umbc.edu, job name = Compute_PI, queue = workq
07/11/2005 15:36:44;0040;PBS_Server;Svr;kali.cl.math.umbc.edu;Scheduler sent command new
07/11/2005 15:36:44;0100;PBS_Server;Req;;Type statusserver request received from Scheduler at kali.cl.math.umbc.edu, sock=11
07/11/2005 15:36:44;0100;PBS_Server;Req;;Type disconnect request received from Scheduler at kali.cl.math.umbc.edu, sock=11
07/11/2005 15:36:44;0100;PBS_Server;Req;;Type statusqueue request received from Scheduler at kali.cl.math.umbc.edu, sock=11
07/11/2005 15:36:44;0100;PBS_Server;Req;;Type selstat request received from Scheduler at kali.cl.math.umbc.edu, sock=11
07/11/2005 15:36:44;0100;PBS_Server;Req;;Type rescq request received from Scheduler at kali.cl.math.umbc.edu, sock=11
07/11/2005 15:36:44;0100;PBS_Server;Req;;Type modifyjob request received from Scheduler at kali.cl.math.umbc.edu, sock=11
07/11/2005 15:36:44;0008;PBS_Server;Job;2330.kali.cl.math.umbc.edu;Job Modified at request of Scheduler at kali.cl.math.umbc.edu
07/11/2005 15:36:44;0100;PBS_Server;Req;;Type runjob request received from Scheduler at kali.cl.math.umbc.edu, sock=11
07/11/2005 15:36:44;0008;PBS_Server;Job;2330.kali.cl.math.umbc.edu;Job Run at request of Scheduler at kali.cl.math.umbc.edu
07/11/2005 15:36:44;0040;PBS_Server;Svr;kali.cl.math.umbc.edu;Scheduler sent command recyc
07/11/2005 15:36:45;0100;PBS_Server;Req;;Type authenticateuser request received from randy at node19.cl.math.umbc.edu, sock=11
07/11/2005 15:36:45;0100;PBS_Server;Req;;Type statusjob request received from randy at node19.cl.math.umbc.edu, sock=10
07/11/2005 15:36:45;0100;PBS_Server;Req;;Type statusjob request received from randy at node19.cl.math.umbc.edu, sock=10
07/11/2005 15:36:47;0100;PBS_Server;Req;;Type authenticateuser request received from randy at kali.cl.math.umbc.edu, sock=11
07/11/2005 15:36:47;0100;PBS_Server;Req;;Type statusserver request received from randy at kali.cl.math.umbc.edu, sock=10
07/11/2005 15:36:47;0100;PBS_Server;Req;;Type statusjob request received from randy at kali.cl.math.umbc.edu, sock=10
07/11/2005 15:36:50;0100;PBS_Server;Req;;Type movejobfile request received from pbs_mom at node19.cl.math.umbc.edu, sock=10
07/11/2005 15:36:50;0010;PBS_Server;Job;2330.kali.cl.math.umbc.edu;Exit_status=0 resources_used.cput=00:00:24 resources_used.mem=34396kb resources_used.vmem=18616kb resources_used.walltime=00:00:04
07/11/2005 15:36:50;0100;PBS_Server;Job;2330.kali.cl.math.umbc.edu;dequeuing from workq, state 5

Any help would be appreciated.  Thanks in advance for any assistance you
can provide.

Randy

