[torqueusers] $TMPDIR

Garrick Staples garrick at usc.edu
Wed Jan 11 14:27:46 MST 2006


On Wed, Jan 11, 2006 at 04:05:25PM -0500, nathaniel.x.woody at GSK.COM alleged:
> Thanks for the reply, I went ahead and rebuilt on 2.0.0p5 (just to ensure 
> I knew what version I was using), and we tried some of the things you 
> suggested but struggled to get things to work.  Below is a transcript of 
> what we tried with example qsub commands so that hopefully you might point 
> out where our mistake is!
> 
> > If you use TORQUE 2.0.0p3 or later (last p5 snapshot is best), you can
> > use all job variables in the stageout, like
> > "stageout=$HOME/out.txt@headnode:$HOME/out-$PBS_JOBID.txt"
> 
> Using TORQUE 2.0.0p5, I tried the following submission command to see if I 
> could verify this behavior:
> 
>         $ echo 'echo $PBS_JOBID > $PBS_JOBID.txt' | qsub -W stageout='$PBS_JOBID.txt@headnode:/home/todd/$PBS_JOBID.txt'
> 
> The job failed and the system delivered the following e-mail notification:
> 
>         PBS Job Id: 45480.headnode
>         Job Name:   STDIN
>         An error has occurred processing your job, see below.
>         Post job file processing error; job 45480.headnode on host node1/0
> 
>         Unable to copy file $PBS_JOBID.txt to todd@headnode:/home/todd/$PBS_JOBID.txt
>         >>> error from copy
>         $PBS_JOBID.txt: No such file or directory
>         >>> end error output

Did you build torque with --enable-wordexp?
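
For context, here is a minimal sketch of what seems to be happening. The
single quotes on your qsub command line stop your login shell from expanding
$PBS_JOBID, so the literal string is what reaches pbs_mom, and the error in
your e-mail suggests pbs_mom then used it verbatim. The job id below is only
a stand-in copied from your notification; your login shell does not normally
have PBS_JOBID set.

```shell
# Stand-in value for illustration only; at run time the real one comes from pbs_mom.
PBS_JOBID=45480.headnode

# Single quotes: the shell passes the text through untouched, as in the qsub call.
literal='$PBS_JOBID.txt'

# Double quotes: the shell expands the variable before any program sees it.
expanded="${PBS_JOBID}.txt"

echo "literal:  $literal"
echo "expanded: $expanded"
```

Something on the MOM side has to perform that second kind of expansion on the
stageout paths, which is presumably where the --enable-wordexp build option
comes in.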

 
> Which certainly makes it look like TORQUE didn't interpolate either 
> instance of the $PBS_JOBID environment variable in the stageout attribute 
> value. What did I do wrong here?
> 
> > The transient TMPDIR patch went in at 2.0.0p3 (again, latest p5 snapshot
> > is best.)
> > 
> > The job script will still need to cd to $TMPDIR.
> 
> Again, using TORQUE 2.0.0p5, I couldn't verify the expected behavior: 
> i.e. that pbs_mom creates a transient temporary directory, stores a 
> reference to it in the environment variable $TMPDIR, and then exports 
> $TMPDIR to the prologue script for use on the compute node.
> 
> This is what I tried:
> 
>         $ echo 'echo $TMPDIR' | qsub
>         45481.headnode
>         $ ls -la STDIN.*
>         -rw-------    1 nxw18916 gsk_rd          0 Jan 11  2006 STDIN.e45481
>         -rw-------    1 nxw18916 gsk_rd          1 Jan 11  2006 STDIN.o45481
>         $ perl -e 'open F,"<STDIN.o45481";my $s=<F>;chomp $s;print qq(single byte is \\n\n) if $s eq ""'
>         single byte is \n
> 
> So the shell seems to have interpolated $TMPDIR to the empty string; thus 
> the single-byte contents of STDIN.o45481 was the newline put out by echo. 
> (Incidentally an 'echo -n' resulted in a zero-length file, but I didn't 
> think of trying this for greater clarity until I had already copied the 
> transcript. My apologies.)
> 
> I believe that I understand how the end result is supposed to work: in the 
> prologue script, I 'cd' into the pbs_mom-created temporary directory 
> referenced by $TMPDIR to do my work and then pbs_mom will remove this 
> directory after the job completes. So why does $TMPDIR always evaluate to 
> an empty string? What's going on with this?

Did you config MOM with $tmpdir?
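
In case it helps, here is a minimal sketch of the job-script side. It assumes
the MOM config file (mom_priv/config) has a $tmpdir line pointing at an
existing scratch directory on the node. The function name enter_job_tmpdir is
made up for this example. It only changes into $TMPDIR when the variable is
non-empty and names a real directory, so a misconfigured node fails loudly
instead of quietly running the job in the home directory.

```shell
# Hypothetical helper for a job script: move into the per-job scratch directory.
# Returns non-zero when TMPDIR is unset, empty, or not a directory.
enter_job_tmpdir() {
    if [ -z "${TMPDIR:-}" ]; then
        echo "TMPDIR is empty; is \$tmpdir set in the MOM config?" >&2
        return 1
    fi
    if [ ! -d "$TMPDIR" ]; then
        echo "TMPDIR=$TMPDIR is not a directory" >&2
        return 1
    fi
    cd "$TMPDIR"
}
```

A job script would call it near the top, for example as
"enter_job_tmpdir || exit 1", before creating any intermediate files.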

 
> Thanks in advance for any assistance.
> 
> Best,
> Nate
> 
> 
> 
> 
> 
> On Fri, Jan 06, 2006 at 11:45:58AM -0500, nathaniel.x.woody at gsk.com alleged:
> > I have a number of batch processes that I'm running with torque that all 
> > run the same exact process on different pieces of a large data file. The 
> > process creates a number of intermediate files and in the end produces a 
> > file to be staged-out.  My problem is that as soon as more than one job is 
> > executing on a node, these files have the chance to stomp all over each 
> > other (i.e. Job 1 and Job 2 are running on a node, Job 1 completes and 
> > out.txt is staged out and then deleted, which confuses Job 2) because they 
> > all run in the same directory (the user's home directory). 
> 
> If you use TORQUE 2.0.0p3 or later (last p5 snapshot is best), you can
> use all job variables in the stageout, like
> "stageout=$HOME/out.txt@headnode:$HOME/out-$PBS_JOBID.txt"
> 
>  
> > What I would like to do is to convince torque to run the job in a clean 
> > directory (for instance, ~/00001.somehost.com), so that I can keep the 
> > jobs separate without having to jump through file-renaming hoops or making 
> > the job start creating directories, etc.  Torque essentially does this for 
> > the standard out and standard error files (by naming them by job id), but 
> > I can't seem to figure out how to get the desired behavior.  Looking 
> > through the archives, I found a reference to something similar to this 
> > related to a patch that caused mom to create a temporary directory. 
> > However, this was a patch for torque 1.0.1 or so, and it doesn't appear to 
> > have been incorporated at any point. 
> 
> The transient TMPDIR patch went in at 2.0.0p3 (again, latest p5 snapshot
> is best.)
> 
> The job script will still need to cd to $TMPDIR.
> 
>  
> > I've also noticed the rootdir and initdir parameters that I can set, but I 
> > don't think those create a directory if one doesn't already exist.
> 
> Correct.  qsub's -d is handy, but the directory must already exist.
> 
> Some of my users create unique jobnames and do something like this:
>   mkdir $jobname
>   qsub -N $jobname -d $jobname
> 
>  
> > Is there a facility for doing what I describe here, or am I going to have 
> > to do all of the work in the job script?
> 
> It sounds like TORQUE does have some options for you.  Let us know if
> you need something else.
> 
> -- 
> Garrick Staples, Linux/HPCC Administrator
> University of Southern California

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California