garrick at usc.edu
Tue Jan 10 12:00:40 MST 2006
On Fri, Jan 06, 2006 at 11:45:58AM -0500, nathaniel.x.woody at gsk.com alleged:
> I have a number of batch processes that I'm running with torque that all
> run the same exact process on different pieces of a large data file. The
> process creates a number of intermediate files and in the end produces a
> file to be staged-out. My problem is that as soon as more than one job is
> executing on a node, these files have the chance to stomp all over each
> other (i.e., Job 1 and Job 2 are running on a node, Job 1 completes, and
> out.txt is staged out and then deleted, which confuses Job 2) because they
> all run in the same directory (the user's home directory).
If you use TORQUE 2.0.0p3 or later (last p5 snapshot is best), you can
use all job variables in the stageout, like
"stageout=$HOME/out.txt@headnode:$HOME/out-$PBS_JOBID.txt"
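
As a rough sketch of how that might look on the command line: the stageout string is passed with qsub's -W option, and single quotes keep the submitting shell from expanding the variables so that they are still there for TORQUE to fill in at job time. The host name "headnode" comes from the example above; "job.sh" is a placeholder script name. The command is only printed here so it can be checked before submitting.

```shell
#!/bin/sh
# Single quotes keep $HOME and $PBS_JOBID literal; $PBS_JOBID does not
# exist yet at submission time, so it must not be expanded by this shell.
stageout_arg='stageout=$HOME/out.txt@headnode:$HOME/out-$PBS_JOBID.txt'

# Print the command instead of running it; remove "echo" to really submit.
# job.sh is a placeholder for the actual job script.
echo qsub -W "$stageout_arg" job.sh
```

With double quotes instead, $PBS_JOBID would expand to an empty string at submission and every job would stage out to the same file name again.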
> What I would like to do is to convince torque to run the job in a clean
> directory (for instance, ~/00001.somehose.com), so that I can keep the
> jobs separate without having to jump through file-renaming hoops or making
> the job start creating directories, etc. Torque essentially does this for
> the standard out and standard error files (by naming them by job id), but
> I can't seem to figure out how to get the desired behavior. Looking
> through the archives, I found a reference to something similar to this
> related to a patch that caused mom to create a temporary directory.
> However, this was a patch for torque 1.0.1 or so, and it doesn't appear to
> have been incorporated at any point.
The transient TMPDIR patch went in at 2.0.0p3 (again, latest p5 snapshot
is best).
The job script will still need to cd to $TMPDIR.
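
A minimal sketch of the top of such a job script, assuming the per-job TMPDIR has been enabled on the compute nodes. The fallback to /tmp is my own addition so the snippet also runs outside of a job; it is not something TORQUE does.

```shell
#!/bin/sh
# Inside a job, TMPDIR should point at the per-job scratch directory.
# The /tmp fallback is only an assumption for running this by hand.
workdir="${TMPDIR:-/tmp}"

# Stop right away if the directory is missing rather than carrying on
# in the home directory and stomping on other jobs' files.
cd "$workdir" || exit 1

echo "working in $(pwd)"
# ... run the real processing here; intermediate files stay in $workdir
```

Anything the job needs to keep has to be copied or staged out of that directory before the job ends, since the directory is transient.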
> I've also noticed the rootdir and initdir parameters that I can set, but I
> don't think those create a directory if one doesn't already exist.
Correct. qsub's -d is handy, but the directory must already exist.
Some of my users create unique jobnames and do something like this:
qsub -N $jobname -d $jobname
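
Since the directory has to exist first, a small wrapper can create it before submitting. This is only a sketch: "chunk-0001" and "job.sh" are made-up names, and it assumes the directory is on a filesystem that the compute nodes can also see. The qsub command is printed rather than run.

```shell
#!/bin/sh
# Hypothetical unique job name; in practice derive it from the data chunk.
jobname="chunk-0001"

# qsub -d does not create the directory, so make it up front.
mkdir -p "$jobname" || exit 1

# Print the command instead of running it; remove "echo" to really submit.
# job.sh is a placeholder for the actual job script.
echo qsub -N "$jobname" -d "$jobname" job.sh
```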
> Is there a facility for doing what I describe here, or am I going to have
> to do all of the work in the job script?
It sounds like TORQUE does have some options for you. Let us know if
you need something else.
Garrick Staples, Linux/HPCC Administrator
University of Southern California