[torqueusers] Unable to stage out directories

Per Lundqvist perl at nsc.liu.se
Mon Dec 5 09:23:45 MST 2005


Is it possible to stage out directories in TORQUE? It seems that it should
be, since stage-out uses 'scp -r' or 'cp -r' (see the compile options and
config file in the attached text file), but I am unable to make this work.

E.g. this does not work for me:
   #!/bin/sh
   #PBS -l nodes=n129,walltime=00:10:00 -j oe
   #PBS -W stageout=/disk/local/dir@torn:/home/perl/stageout

   mkdir /disk/local/dir
   touch /disk/local/dir/tmp.$RANDOM
   ls -lR /disk/local

(Here /disk/local is a local filesystem available only on the compute
nodes, and /home is shared over NFS. pbs_server runs on the system node
torn.)

* stage out of regular files works (as do manual scp -r or cp -r)
* I get no error output when trying to stage out a directory, but the
   directory never gets copied either
* the pbs prologue script takes care of deleting all files on /disk/local,
   and the epilogue kills all running user processes
* disabling the $usecp option (see mom_priv/config in attachment) makes no
   difference
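
A possible workaround, sketched here as an assumption rather than a confirmed fix: since staging out regular files works, the job could pack the directory into a single tar file and stage out that file instead. The function name pack_dir and the file name dir.tar are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: pack a directory into one tar file so that it can
# be staged out as a regular file.
# Usage: pack_dir SRC_DIR OUT_TAR
pack_dir() {
    src=$1
    out=$2
    # -C switches to the parent directory, so the archive contains only
    # the directory's own name rather than its full path.
    tar -cf "$out" -C "$(dirname "$src")" "$(basename "$src")"
}

# Assumed use in the job script (dir.tar is an illustrative name):
#   #PBS -W stageout=/disk/local/dir.tar@torn:/home/perl/stageout/dir.tar
#   pack_dir /disk/local/dir /disk/local/dir.tar
```

On the receiving side the archive would then be unpacked with 'tar -xf dir.tar'.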

Output:
   [perl@tornado ~]$ qsub bin/pbs.stageout
   10035.torn

   [perl@tornado ~]$ cat pbs.stageout.o10035
   /disk/local:
   total 4
   drwxr-xr-x  2 perl nsc 4096 Dec  5 16:58 dir

   /disk/local/dir:
   total 0
   -rw-r--r--  1 perl nsc 0 Dec  5 16:58 tmp.26719

   [perl@tornado ~]$ ls -lR /home/perl/stageout/
   /home/perl/stageout/:
   total 0

thanks for any help,

-- 
Per Lundqvist

National Supercomputer Centre
Linköping University, Sweden

http://www.nsc.liu.se
-------------- next part --------------
Misc. info:
torque version:	torque_1.2.0p5
maui:		maui-3.2.6p14-snap.1129921819

[root@n129 ~]# cat /var/spool/PBS/mom_priv/config
$logevent 511
$loglevel 7
$prologalarm 120
$clienthost torn
$clienthost pbsserver
$clienthost n0
$usecp torn:/home /home
$usecp n0:/home /home
$usecp localhost:/home /home
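
As a rough illustration of what the $usecp lines above are meant to achieve (this is my understanding of the documented behaviour; the function map_usecp is a hypothetical sketch, not TORQUE code): when a stage-out destination begins with a listed host:path prefix, that prefix is replaced by the local path and the copy is done with cp rather than scp. The sketch handles exact prefixes only.

```shell
#!/bin/sh
# Hypothetical illustration of a $usecp mapping.
# Usage: map_usecp DEST HOST_PREFIX LOCAL_PREFIX
# Prints the local path if DEST lies under HOST_PREFIX,
# otherwise prints DEST unchanged.
map_usecp() {
    dest=$1
    from=$2
    to=$3
    case "$dest" in
        "$from"/*)
            # Strip the host:path prefix and prepend the local path.
            rest=${dest#"$from"}
            printf '%s\n' "$to$rest"
            ;;
        *)
            printf '%s\n' "$dest"
            ;;
    esac
}

# With "$usecp torn:/home /home" and the destination from the job script:
map_usecp torn:/home/perl/stageout torn:/home /home
# prints /home/perl/stageout
```

Under this reading, the stage-out in the example job would amount to a local 'cp -r' into /home/perl/stageout on the compute node.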

torque configure options:
./configure  --prefix=/usr/pbs --enable-docs --disable-gui --set-server-home=/var/spool/PBS --enable-syslog --with-scp --disable-rpp --enable-server

job id in example is 10035:
[root@n129 ~]# grep 10035 /var/spool/PBS/mom_logs/20051205 
12/05/2005 16:58:17;0001;   pbs_mom;Job;job_nodes;job: 10035.torn numnodes=1 numvnod=1
12/05/2005 16:58:17;0008;   pbs_mom;Job;10035.torn;evaluating limits for job
12/05/2005 16:58:17;0001;   pbs_mom;Job;10035.torn;phase 2 of job launch successfully completed
12/05/2005 16:58:17;0001;   pbs_mom;Job;10035.torn;saving task (TMomFinalizeJob3)
12/05/2005 16:58:17;0008;   pbs_mom;Job;task_save;saving task in /var/spool/PBS/mom_priv/jobs/10035.torn.TK/0000000001
12/05/2005 16:58:17;0001;   pbs_mom;Job;TMomFinalizeJob3;job 10035.torn started, pid = 30518
12/05/2005 16:58:17;0001;   pbs_mom;Job;10035.torn;job successfully started
12/05/2005 16:58:17;0008;   pbs_mom;Job;10035.torn;job 10035.torn reported successful start on 1 node(s)
12/05/2005 16:58:17;0008;   pbs_mom;Job;scan_for_terminated;for job 10035.torn, task 1, pid=30518, exitcode=0
12/05/2005 16:58:17;0008;   pbs_mom;Job;10035.torn;sending signal 9 to task
12/05/2005 16:58:17;0008;   pbs_mom;Job;task_save;saving task in /var/spool/PBS/mom_priv/jobs/10035.torn.TK/0000000001
12/05/2005 16:58:17;0080;   pbs_mom;Job;10035.torn;saving task in /var/spool/PBS/mom_priv/jobs/10035.torn.TK/0000000001
12/05/2005 16:58:17;0008;   pbs_mom;Job;10035.torn;Terminated
12/05/2005 16:58:17;0008;   pbs_mom;Job;task_save;saving task in /var/spool/PBS/mom_priv/jobs/10035.torn.TK/0000000001
12/05/2005 16:58:17;0080;   pbs_mom;Job;10035.torn;local task termination detected.  killing job
12/05/2005 16:58:17;0008;   pbs_mom;Job;10035.torn;kill_job
12/05/2005 16:58:17;0008;   pbs_mom;Job;10035.torn;kill_job done
12/05/2005 16:58:17;0080;   pbs_mom;Job;10035.torn;performing job clean-up
12/05/2005 16:58:17;0002;   pbs_mom;n/a;mom_set_limits;mom_set_limits(10035.torn,alter) entered
12/05/2005 16:58:17;0008;   pbs_mom;Job;10035.torn;Job Modified at request of PBS_Server@n0
12/05/2005 16:58:17;0004;   pbs_mom;Fil;10035.torn;forking to user, uid: 1158  gid: 500  homedir: '/home/perl'
12/05/2005 16:58:19;0080;   pbs_mom;Job;mom_deljob;deleting job 10035.torn in state EXITED

[root@torn ~]# grep 10035 /var/spool/PBS/server_logs/20051205
12/05/2005 16:58:16;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 10035.torn state from TRANSIT to QUEUED-QUEUED (1-10)
12/05/2005 16:58:16;0008;PBS_Server;Job;10035.torn;Job Queued at request of perl@tornado, owner = perl@tornado, job name = pbs.stageout, queue = workq
12/05/2005 16:58:17;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 10035.torn state from QUEUED to QUEUED-QUEUED (1-10)
12/05/2005 16:58:17;0008;PBS_Server;Job;10035.torn;Job Modified at request of root@torn
12/05/2005 16:58:17;0040;PBS_Server;Req;set_nodes;allocating nodes for job 10035.torn with node expression 'n129'
12/05/2005 16:58:17;0040;PBS_Server;Req;set_nodes;allocated node n129/0 to job 10035.torn
12/05/2005 16:58:17;0040;PBS_Server;Req;set_nodes;job 10035.torn allocated 1 nodes (nodelist=n129/0)
12/05/2005 16:58:17;0008;PBS_Server;Job;10035.torn;Job Run at request of root@torn
12/05/2005 16:58:17;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 10035.torn state from QUEUED to RUNNING-STARTING (4-40)
12/05/2005 16:58:17;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 10035.torn state from RUNNING to RUNNING-SUSPEND (4-42)
12/05/2005 16:58:17;000d;PBS_Server;Job;10035.torn;sending 'b' mail for job 10035.torn to perl@tornado (---)
12/05/2005 16:58:17;0008;PBS_Server;Job;10035.torn;Job Modified at request of root@torn
12/05/2005 16:58:17;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 10035.torn state from RUNNING to EXITING-STAGEOUT (5-50)
12/05/2005 16:58:17;000d;PBS_Server;Job;10035.torn;sending 'e' mail for job 10035.torn to perl@tornado (Exit_status=0
12/05/2005 16:58:17;0010;PBS_Server;Job;10035.torn;Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:00
12/05/2005 16:58:17;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 10035.torn state from EXITING to EXITING-STAGEDEL (5-51)
12/05/2005 16:58:19;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 10035.torn state from EXITING to EXITING-EXITED (5-52)
12/05/2005 16:58:19;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 10035.torn state from EXITING to EXITING-ABORT (5-53)
12/05/2005 16:58:19;0040;PBS_Server;Req;free_nodes;freeing nodes for job 10035.torn
12/05/2005 16:58:19;0040;PBS_Server;Req;free_nodes;freeing node n129/0 for job 10035.torn

