[torqueusers] $TMPDIR

nathaniel.x.woody at GSK.COM
Wed Jan 11 14:05:25 MST 2006


Thanks for the reply. I went ahead and rebuilt on 2.0.0p5 (just to be sure 
I knew which version I was using), and we tried some of the things you 
suggested but couldn't get them to work. Below is a transcript of what we 
tried, with example qsub commands, so that hopefully you can point out our 
mistake!

> If you use TORQUE 2.0.0p3 or later (last p5 snapshot is best), you can
> use all job variables in the stageout, like
> "stageout=$HOME/out.txt@headnode:$HOME/out-$PBS_JOBID.txt"

Using TORQUE 2.0.0p5, I tried the following submission command to verify 
this behavior:

        $ echo 'echo $PBS_JOBID > $PBS_JOBID.txt' | qsub -W stageout='$PBS_JOBID.txt@headnode:/home/todd/$PBS_JOBID.txt'

The job failed and the system delivered the following e-mail notification:

        PBS Job Id: 45480.headnode
        Job Name:   STDIN
        An error has occurred processing your job, see below.
        Post job file processing error; job 45480.headnode on host node1/0

        Unable to copy file $PBS_JOBID.txt to todd@headnode:/home/todd/$PBS_JOBID.txt
        >>> error from copy
        $PBS_JOBID.txt: No such file or directory
        >>> end error output

This certainly makes it look as though TORQUE didn't interpolate either 
instance of the $PBS_JOBID environment variable in the stageout attribute 
value. What did I do wrong here?
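To rule out the submitting shell as the culprit, here is a small sketch of what each quoting style actually hands to qsub. It assumes PBS_JOBID is unset in the login shell at submit time (the usual case, since it is set in the job's environment rather than the user's); the `single` and `double` variable names are purely illustrative.

```shell
#!/bin/sh
# Assumption: at submit time PBS_JOBID is not set in the user's login shell.
unset PBS_JOBID

# Single quotes: the submitting shell passes the text through unchanged,
# so expanding $PBS_JOBID is left to TORQUE.
single='$PBS_JOBID.txt@headnode:/home/todd/$PBS_JOBID.txt'

# Double quotes: the submitting shell expands $PBS_JOBID immediately,
# and because it is unset the job ID silently disappears.
double="$PBS_JOBID.txt@headnode:/home/todd/$PBS_JOBID.txt"

printf 'single-quoted: %s\n' "$single"
printf 'double-quoted: %s\n' "$double"
```

The single-quoted form (the one used in the command above) reaches qsub with `$PBS_JOBID` intact, which, together with the literal `$PBS_JOBID.txt` in the copy error, suggests the expansion was not happening on the TORQUE side rather than a quoting mistake.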

> The transient TMPDIR patch went in at 2.0.0p3 (again, latest p5 snapshot
> is best.)
> 
> The job script will still need to cd to $TMPDIR.

Again, using TORQUE 2.0.0p5, I couldn't verify the expected behavior, 
i.e. that pbs_mom creates a transient temporary directory, stores its path 
in the environment variable $TMPDIR, and then exports $TMPDIR to the 
prologue script for use on the compute node.

This is what I tried:

        $ echo 'echo $TMPDIR' | qsub
        45481.headnode
        $ ls -la STDIN.*
        -rw-------    1 nxw18916 gsk_rd          0 Jan 11  2006 STDIN.e45481
        -rw-------    1 nxw18916 gsk_rd          1 Jan 11  2006 STDIN.o45481
        $ perl -e 'open F,"<STDIN.o45481";my $s=<F>;chomp $s;print qq(single byte is \\n\n) if $s eq ""'
        single byte is \n

So the shell seems to have expanded $TMPDIR to the empty string; the 
single byte in STDIN.o45481 is just the newline written by echo. 
(Incidentally, 'echo -n' produced a zero-length file, but I didn't think 
of trying that for greater clarity until I had already copied the 
transcript. My apologies.)

I believe I understand how the end result is supposed to work: in the 
prologue script, I 'cd' into the pbs_mom-created temporary directory 
referenced by $TMPDIR to do my work, and pbs_mom then removes this 
directory after the job completes. So why does $TMPDIR always evaluate to 
an empty string? What's going on here?
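Since the reply below says the job script itself has to cd to $TMPDIR, a defensive job-script fragment can use the per-job directory when pbs_mom provides it and fall back to a private directory when $TMPDIR comes back empty, as in the transcript above. This is a minimal sketch, not TORQUE's own mechanism: the `/tmp/job.*` fallback location and the `WORKDIR`/`CREATED` names are assumptions. It is also an assumption (worth checking, not a diagnosis) that the transient directory depends on a `$tmpdir` directive in the MOM's configuration.

```shell
#!/bin/sh
# Job-script fragment: this runs as part of the job itself.
# Prefer the per-job directory pbs_mom exports in $TMPDIR; if it is empty
# or missing, fall back to a private directory this script creates itself.
if [ -n "${TMPDIR:-}" ] && [ -d "$TMPDIR" ]; then
    WORKDIR=$TMPDIR
    CREATED=no
else
    # Assumed fallback: node-local /tmp, with the job ID in the name so
    # concurrent jobs on the same node never share a directory.
    WORKDIR=$(mktemp -d "/tmp/job.${PBS_JOBID:-nojobid}.XXXXXX") || exit 1
    CREATED=yes
fi

# Remove only a directory this script created; the transient $TMPDIR is
# expected to be cleaned up by pbs_mom after the job ends.
trap 'if [ "$CREATED" = yes ]; then cd / && rm -rf "$WORKDIR"; fi' EXIT

cd "$WORKDIR" || exit 1
echo "working in $WORKDIR"

# ... run the real processing here, writing intermediates to the current
# directory, then copy the final result somewhere job-specific, e.g.:
# cp out.txt "$HOME/out-${PBS_JOBID:-nojobid}.txt"
```

When $TMPDIR is populated the first branch is taken and the fragment simply changes into it; the fallback only matters on nodes where it is empty.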

Thanks in advance for any assistance.

Best,
Nate





On Fri, Jan 06, 2006 at 11:45:58AM -0500, nathaniel.x.woody at gsk.com alleged:
> I have a number of batch processes that I'm running with torque that all
> run the same exact process on different pieces of a large data file. The
> process creates a number of intermediate files and in the end produces a
> file to be staged out. My problem is that as soon as more than one job is
> executing on a node, these files can stomp all over each other, because
> they all run in the same directory (the user's home directory). For
> instance, job 1 and job 2 are running on a node; job 1 completes, and
> out.txt is staged out and then deleted, which confuses job 2.

If you use TORQUE 2.0.0p3 or later (last p5 snapshot is best), you can
use all job variables in the stageout, like
"stageout=$HOME/out.txt@headnode:$HOME/out-$PBS_JOBID.txt"

 
> What I would like to do is to convince torque to run the job in a clean
> directory (for instance, ~/00001.somehost.com), so that I can keep the
> jobs separate without having to jump through file-renaming hoops or making
> the job create directories at startup, etc. Torque essentially does this
> for the standard out and standard error files (by naming them by job id),
> but I can't seem to figure out how to get the desired behavior. Looking
> through the archives, I found a reference to something similar to this
> related to a patch that caused mom to create a temporary directory.
> However, this was a patch for torque 1.0.1 or so, and it doesn't appear to
> have been incorporated at any point.

The transient TMPDIR patch went in at 2.0.0p3 (again, latest p5 snapshot
is best.)

The job script will still need to cd to $TMPDIR.

 
> I've also noticed the rootdir and initdir parameters that I can set, but I
> don't think those create a directory if one doesn't already exist.

Correct.  qsub's -d is handy, but the directory must already exist.

Some of my users create unique jobnames and do something like this:
  mkdir "$jobname"
  qsub -N "$jobname" -d "$jobname"
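For the chunked workload described above, that pattern might be expanded into a loop with one job and one pre-created directory per chunk. This is a sketch only: the chunk names, the `job.sh` script, the `CHUNK` variable passed with `-v`, and the `QSUB` dry-run switch are all hypothetical. By default it only prints the qsub commands it would run.

```shell
#!/bin/sh
# Dry run by default: QSUB expands to "echo qsub" and just prints commands.
# Set QSUB=qsub in the environment to actually submit.
QSUB=${QSUB:-echo qsub}
JOBSCRIPT=${JOBSCRIPT:-job.sh}    # hypothetical job script

for chunk in part01 part02 part03; do    # hypothetical chunk names
    jobname="run-$chunk"
    # qsub -d does not create the directory, so make it first.
    mkdir -p "$jobname" || exit 1
    # Pass an absolute path to -d, and tell the job which chunk it owns.
    $QSUB -N "$jobname" -d "$PWD/$jobname" -v "CHUNK=$chunk" "$JOBSCRIPT" || exit 1
done
```

Each job then starts in its own directory, so intermediate files and a fixed-name out.txt no longer collide between jobs sharing a node.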

 
> Is there a facility for doing what I describe here, or am I going to have
> to do all of the work in the job script?

It sounds like TORQUE does have some options for you.  Let us know if
you need something else.

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California
_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers
