garrick at usc.edu
Wed Jan 11 14:27:46 MST 2006
On Wed, Jan 11, 2006 at 04:05:25PM -0500, nathaniel.x.woody at GSK.COM alleged:
> Thanks for the reply, I went ahead and rebuilt on 2.0.0p5 (just to ensure
> I knew what version I was using), and we tried some of the things you
> suggested but struggled to get things to work. Below is a transcript of
> what we tried with example qsub commands so that hopefully you might point
> out where our mistake is!
> > If you use TORQUE 2.0.0p3 or later (last p5 snapshot is best), you can
> > use all job variables in the stageout, like
> > "stageout=$HOME/out.txt@headnode:$HOME/out-$PBS_JOBID.txt"
> Using TORQUE 2.0.0p5, I tried the following submission command to see if I
> could certify this behavior:
> $ echo 'echo $PBS_JOBID > $PBS_JOBID.txt' | qsub -W \
>     stageout='$PBS_JOBID.txt@headnode:/home/todd/$PBS_JOBID.txt'
> The job failed and the system delivered the following e-mail notification:
> PBS Job Id: 45480.headnode
> Job Name: STDIN
> An error has occurred processing your job, see below.
> Post job file processing error; job 45480.headnode on host node1/0
> Unable to copy file $PBS_JOBID.txt to todd@headnode:/home/todd/
> >>> error from copy
> $PBS_JOBID.txt: No such file or directory
> >>> end error output
Did you build torque with --enable-wordexp?
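
A minimal sketch of why the build option matters, assuming the usual local_file@host:remote_file form of the stageout attribute. The single quotes stop the submitting shell from expanding $PBS_JOBID, so qsub receives the literal string and TORQUE itself has to expand it when the job finishes. The variable name stageout_arg is only for illustration, and the rebuild line in the comment assumes a standard configure/make source build.

```shell
# Single quotes keep the submitting shell from touching $PBS_JOBID,
# so the literal text is what gets handed to qsub -W.
stageout_arg='$PBS_JOBID.txt@headnode:/home/todd/$PBS_JOBID.txt'
printf 'stageout=%s\n' "$stageout_arg"

# Assumed rebuild steps, run in the TORQUE source tree (not executed here):
#   ./configure --enable-wordexp && make && make install
```

If the copy error still shows a literal $PBS_JOBID after such a rebuild, the expansion is not happening on the MOM side.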
> Which certainly makes it look like TORQUE didn't interpolate either
> instance of the $PBS_JOBID environment variable in the stageout attribute
> value. What did I do wrong here?
> > The transient TMPDIR patch went in at 2.0.0p3 (again, latest p5 snapshot
> > is best.)
> > The job script will still need to cd to $TMPDIR.
> Again, using TORQUE 2.0.0p5, I couldn't certify the expected behavior:
> i.e. that pbs_mom creates a transient temporary directory, stores a
> reference to it in the environment variable $TMPDIR, and then exports
> $TMPDIR to the prologue script for its usage on the compute node.
> This is what I tried:
> $ echo 'echo $TMPDIR' | qsub
> $ ls -la STDIN.*
> -rw------- 1 nxw18916 gsk_rd 0 Jan 11 2006
> -rw------- 1 nxw18916 gsk_rd 1 Jan 11 2006
> $ perl -e 'open F,"<STDIN.o45481";my $s=<F>;chomp $s;print
> qq(single byte is \\n\n) if $s eq ""'
> single byte is \n
> So the shell seems to have interpolated $TMPDIR to the empty string; thus
> the single-byte contents of STDIN.o45481 was the newline put out by echo.
> (Incidentally an 'echo -n' resulted in a zero-width file, but I didn't
> think of trying this for greater clarity until I had already copied the
> transcript. My apologies.)
> I believe that I understand how the end result is supposed to work: in the
> prologue script, I 'cd' into the pbs_mom-created temporary directory
> referenced by $TMPDIR to do my work and then pbs_mom will remove this
> directory after the job completes. So why does $TMPDIR always evaluate to
> a zero-width string? What's going on with this?
Did you config MOM with $tmpdir?
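
As a sketch of what that means: the MOM config file (mom_priv/config under the TORQUE spool directory) would carry a line such as "$tmpdir /scratch", where /scratch is only an assumed path, and pbs_mom would need to reread its config afterwards. The job script can then guard the cd so that an empty TMPDIR fails loudly instead of silently running in the home directory. The helper name enter_job_tmpdir is hypothetical.

```shell
# Hypothetical helper for a job script: enter the per-job scratch
# directory, or fail if pbs_mom did not provide one.
enter_job_tmpdir() {
    if [ -z "${TMPDIR:-}" ] || [ ! -d "$TMPDIR" ]; then
        echo "TMPDIR is unset or not a directory; is \$tmpdir set in the MOM config?" >&2
        return 1
    fi
    cd "$TMPDIR"
}
```

A job script would call it first, for example "enter_job_tmpdir || exit 1", and do all of its work after that line.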
> Thanks in advance for any assistance.
> On Fri, Jan 06, 2006 at 11:45:58AM -0500, nathaniel.x.woody at gsk.com wrote:
> > I have a number of batch processes that I'm running with torque that all
> > run the same exact process on different pieces of a large data file. The
> > process creates a number of intermediate files and in the end produces a
> > file to be staged out. My problem is that as soon as more than one job
> > is executing on a node, these files have the chance to stomp all over
> > each other (i.e., Job 1 and Job 2 are running on a node, Job 1 completes,
> > and out.txt is staged out and then deleted, which confuses Job 2) because
> > they all run in the same directory (the user's home directory).
> If you use TORQUE 2.0.0p3 or later (last p5 snapshot is best), you can
> use all job variables in the stageout, like
> "stageout=$HOME/out.txt@headnode:$HOME/out-$PBS_JOBID.txt"
> > What I would like to do is to convince torque to run the job in a clean
> > directory (for instance, ~/00001.somehose.com), so that I can keep the
> > jobs separate without having to jump through file-renaming hoops or
> > having the job start by creating directories, etc. Torque essentially
> > does this for the standard out and standard error files (by naming them
> > by job id), but I can't seem to figure out how to get the desired
> > behavior. Looking through the archives, I found a reference to something
> > similar to this related to a patch that caused mom to create a temporary
> > directory. However, this was a patch for torque 1.0.1 or so, and it
> > doesn't appear to have been incorporated at any point.
> The transient TMPDIR patch went in at 2.0.0p3 (again, latest p5 snapshot
> is best.)
> The job script will still need to cd to $TMPDIR.
> > I've also noticed the rootdir and initdir parameters that I can set, but
> > don't think those create a directory if one doesn't already exist.
> Correct. qsub's -d is handy, but the directory must already exist.
> Some of my users create unique jobnames and do something like this:
> mkdir $jobname
> qsub -N $jobname -d $jobname
> > Is there a facility for doing what I describe here, or am I going to
> > have to do all of the work in the job script?
> It sounds like TORQUE does have some options for you. Let us know if
> you need something else.
> Garrick Staples, Linux/HPCC Administrator
> University of Southern California
Garrick Staples, Linux/HPCC Administrator
University of Southern California