nathaniel.x.woody at GSK.COM
Wed Jan 11 14:05:25 MST 2006
Thanks for the reply. I went ahead and rebuilt on 2.0.0p5 (just to ensure
I knew what version I was using), and we tried some of the things you
suggested but struggled to get them to work. Below is a transcript of
what we tried, with example qsub commands, so that hopefully you can point
out where our mistake is!
> If you use TORQUE 2.0.0p3 or later (last p5 snapshot is best), you can
> use all job variables in the stageout, like
> "stageout=$HOME/out.txt@headnode:$HOME/out-$PBS_JOBID.txt"
Using TORQUE 2.0.0p5, I tried the following submission command to see if I
could certify this behavior:
$ echo 'echo $PBS_JOBID > $PBS_JOBID.txt' | qsub -W stageout='$PBS_JOBID.txt@headnode:/home/todd/$PBS_JOBID.txt'
The job failed and the system delivered the following e-mail notification:
PBS Job Id: 45480.headnode
Job Name: STDIN
An error has occurred processing your job, see below.
Post job file processing error; job 45480.headnode on host node1/0
Unable to copy file $PBS_JOBID.txt to todd@headnode:/home/todd/
>>> error from copy
$PBS_JOBID.txt: No such file or directory
>>> end error output
Which certainly makes it look like TORQUE didn't interpolate either
instance of the $PBS_JOBID environment variable in the stageout attribute
value. What did I do wrong here?
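
One possible workaround, independent of the stageout attribute, is to do the rename inside the job script, where the shell itself expands $PBS_JOBID. The following is only a minimal sketch: it assumes the destination directory is reachable from the compute node (for example, a shared home directory), and the function name copy_with_jobid is hypothetical, not part of TORQUE.

```shell
#!/bin/sh
# Hypothetical helper (not a TORQUE feature): copy a result file into a
# destination directory, tagging the name with the job id so that
# concurrent jobs on the same node do not overwrite each other.
# Assumes the destination is reachable from the compute node.
copy_with_jobid() {
    src=$1
    destdir=$2
    # PBS_JOBID is set inside a running job; fall back to the shell's
    # PID so the function can also be tried outside of a job.
    id=${PBS_JOBID:-nojob.$$}
    base=$(basename "$src")
    case $base in
        *.*) name="${base%.*}-$id.${base##*.}" ;;  # out.txt -> out-<id>.txt
        *)   name="$base-$id" ;;                   # out     -> out-<id>
    esac
    cp "$src" "$destdir/$name"
}
```

At the end of a job script this might be called as: copy_with_jobid out.txt "$HOME"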
> The transient TMPDIR patch went in at 2.0.0p3 (again, latest p5 snapshot
> is best.)
> The job script will still need to cd to $TMPDIR.
Again, using TORQUE 2.0.0p5, I couldn't certify the expected behavior:
i.e. that pbs_mom creates a transient temporary directory, stores a
reference to it in the environment variable $TMPDIR, and then exports
$TMPDIR to the prologue script for its usage on the compute node.
This is what I tried:
$ echo 'echo $TMPDIR' | qsub
$ ls -la STDIN.*
-rw------- 1 nxw18916 gsk_rd 0 Jan 11 2006
-rw------- 1 nxw18916 gsk_rd 1 Jan 11 2006
$ perl -e 'open F,"<STDIN.o45481";my $s=<F>;chomp $s;print qq(single byte is \\n\n) if $s eq ""'
single byte is \n
So the shell seems to have interpolated $TMPDIR to the empty string; thus
the single byte in STDIN.o45481 was the newline emitted by echo.
(Incidentally, an 'echo -n' resulted in a zero-length file, but I didn't
think of trying this for greater clarity until I had already copied the
transcript. My apologies.)
I believe that I understand how the end result is supposed to work: in the
prologue script, I 'cd' into the pbs_mom-created temporary directory
referenced by $TMPDIR to do my work and then pbs_mom will remove this
directory after the job completes. So why does $TMPDIR always evaluate to
an empty string? What's going on with this?
Thanks in advance for any assistance.
On Fri, Jan 06, 2006 at 11:45:58AM -0500, nathaniel.x.woody at gsk.com wrote:
> I have a number of batch processes that I'm running with torque that all
> run the same exact process on different pieces of a large data file. The
> process creates a number of intermediate files and in the end produces a
> file to be staged-out. My problem is that as soon as more than one job is
> executing on a node, these files have the chance to stomp all over each
> other (i.e. Job 1 and Job 2 are running on a node, Job 1 completes and
> out.txt is staged out and then deleted, which confuses Job 2) because they
> all run in the same directory (the user's home directory).
If you use TORQUE 2.0.0p3 or later (last p5 snapshot is best), you can
use all job variables in the stageout, like
"stageout=$HOME/out.txt@headnode:$HOME/out-$PBS_JOBID.txt"
> What I would like to do is to convince torque to run the job in a clean
> directory (for instance, ~/00001.somehose.com), so that I can keep the
> jobs separate without having to jump through file-renaming hoops or have
> the job start creating directories, etc. Torque essentially does this for
> the standard out and standard error files (by naming them by job id), but
> I can't seem to figure out how to get the desired behavior. Looking
> through the archives, I found a reference to something similar to this
> related to a patch that caused mom to create a temporary directory.
> However, this was a patch for torque 1.0.1 or so, and it doesn't appear to
> have been incorporated at any point.
The transient TMPDIR patch went in at 2.0.0p3 (again, latest p5 snapshot
is best.)
The job script will still need to cd to $TMPDIR.
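
A job script can guard against an empty $TMPDIR before changing into it. The following is a minimal sketch only: the function name enter_job_dir, the JOB_SCRATCH_BASE variable, and the fallback directory layout are illustrative assumptions, not TORQUE behavior.

```shell
#!/bin/sh
# Hypothetical helper for the top of a job script.
enter_job_dir() {
    if [ -n "$TMPDIR" ] && [ -d "$TMPDIR" ]; then
        # pbs_mom provided a transient directory: use it.
        cd "$TMPDIR" || return 1
    else
        # Fallback (an assumption): create a per-job directory ourselves.
        # Unlike a transient directory made by pbs_mom, nothing removes
        # this one automatically when the job ends.
        fallback="${JOB_SCRATCH_BASE:-$HOME}/job-${PBS_JOBID:-$$}"
        mkdir -p "$fallback" && cd "$fallback" || return 1
    fi
}
```

Calling enter_job_dir before any intermediate files are written keeps each job's files apart even when $TMPDIR turns out to be empty.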
> I've also noticed the rootdir and initdir parameters that I can set, but I
> don't think those create a directory if one doesn't already exist.
Correct. qsub's -d is handy, but the directory must already exist.
Some of my users create unique jobnames and do something like this:
qsub -N $jobname -d $jobname
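
Since the directory given to -d must already exist, the submission can be wrapped so the directory is created first. This is a rough sketch under assumptions: the wrapper name submit_in_own_dir and the QSUB override variable are hypothetical and exist only to allow a dry run; only the -N and -d options mentioned above are used.

```shell
#!/bin/sh
# Hypothetical wrapper: create the working directory, then submit into it.
# Set QSUB=echo to print the qsub arguments instead of submitting.
submit_in_own_dir() {
    jobname=$1
    script=$2
    mkdir -p "$jobname" || return 1
    "${QSUB:-qsub}" -N "$jobname" -d "$jobname" "$script"
}
```

For example, submit_in_own_dir run01 job.sh would create ./run01 and then submit job.sh with that directory as its working directory.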
> Is there a facility for doing what I describe here, or am I going to have to
> do all of the work in the job script?
It sounds like TORQUE does have some options for you. Let us know if
you need something else.
Garrick Staples, Linux/HPCC Administrator
University of Southern California
torqueusers mailing list
torqueusers at supercluster.org