[torqueusers] TMPDIR

Simon Robbins robbins at physik.uni-wuppertal.de
Thu Dec 8 03:30:53 MST 2005


Hello,

Thanks for all the responses.

On 7. Dec 2005, at 10:05 PM, Garrick Staples wrote:

> On Wed, Dec 07, 2005 at 03:53:59PM +0100, Simon Robbins alleged:
>>
>> Hi Garrick,
>>
>> I don't like the environment variable name "TMPDIR"
>> used in torque 2.  I attach the patches that I think
>> are needed to change the name of this to PBS_TMPDIR;
>> this seems to work on AMD Opterons.
>>
>> Until torque2 I was using an old patch and changing
>> the name of this variable to PBS_TMPDIR.
>> Unfortunately I have just had a user who had until
>> now been innocently using $TMPDIR; torque deleted
>> the contents of this directory when the job finished :-(
>>
>> What do people think about re-naming this variable?
>
> So the user is supplying their own TMPDIR variable to the job, set to a
> directory that doesn't exist, TORQUE created the directory, and the user
> was surprised when TORQUE deleted it after the job?

The variable "$tmpdir" is defined in the mom config:
$tmpdir /data
where /data is a disk local to the node. This works great :-)

The user was using a set of scripts that rely on the environment
variable $TMPDIR, passed in with the qsub -v option:
qsub ... -v "TMPDIR=/some/directory" ...

Torque then took this as a cue to use "$TMPDIR" as the transient
tmpdir for the job and deleted its contents when the job finished.
In the scripts, $TMPDIR was only meant to be the ROOT directory under
which the temporary directory should be created (e.g. /tmp).

Anyway, the point is the user was not aware of this new behaviour and  
was surprised when their scripts suddenly stopped working :-)
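
Until the name is settled, one way for such scripts to stay out of the way is to pass the scratch root under a name the batch system does not look at, and to create a private subdirectory below it. This is only a sketch: MY_SCRATCH_ROOT and make_workdir are hypothetical names chosen for illustration and are not known to TORQUE.

```shell
#!/bin/sh
# Hypothetical variable: MY_SCRATCH_ROOT is NOT a TORQUE variable.
# It would be passed as: qsub ... -v "MY_SCRATCH_ROOT=/some/directory" ...

# Create a private working directory below the scratch root and print its path.
# Falls back to /tmp when MY_SCRATCH_ROOT is unset or empty.
make_workdir() {
    root="${MY_SCRATCH_ROOT:-/tmp}"
    mktemp -d "$root/job.XXXXXX"
}
```

A script would then call workdir=$(make_workdir) and remove only "$workdir" when it is done, so nothing else below the root is touched by the script itself.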

My suggestion is to rename the variable responsible for this
behaviour to "PBS_TMPDIR".  I use the qsub wrapper script to define
$TMPDIR to be the same as PBS_TMPDIR inside all jobs.  Only if the
user passes qsub -v "PBS_TMPDIR=/some/dir" will Torque use that as
the transient tmpdir; it will ignore TMPDIR.  I don't want to change
the behaviour, just the name.
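
The job-side half of that idea could look roughly like the sketch below. It assumes the patched mom exports PBS_TMPDIR (the name proposed here, not something an unpatched TORQUE is known to set), and resolve_tmpdir is a hypothetical helper; how the wrapper actually injects this into each job is left open.

```shell
#!/bin/sh
# Sketch only: PBS_TMPDIR is the name proposed in this mail and is
# assumed to be exported by a patched pbs_mom.

# Print the directory that TMPDIR should point to inside the job:
# PBS_TMPDIR if set, else the existing TMPDIR, else /tmp.
resolve_tmpdir() {
    if [ -n "${PBS_TMPDIR:-}" ]; then
        printf '%s\n' "$PBS_TMPDIR"
    else
        printf '%s\n' "${TMPDIR:-/tmp}"
    fi
}

TMPDIR=$(resolve_tmpdir)
export TMPDIR
```

With this in place, programs that honour TMPDIR land in the transient directory, while a TMPDIR supplied by the user no longer decides what the mom deletes.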

Simon.

P.S. I also see problems for removal of links e.g.:
qsub -I
 > cd $TMPDIR
 > mkdir a_directory
 > ln -s a_directory link_to_dir
exit

When torque removes this recursively it removes (alphabetically)
"a_directory" first and then can't remove the link "link_to_dir"
because it is no longer valid (mom_log):

12/08/2005 10:43:06;0080;   pbs_mom;Job;9428.whiterabbit.alicenext;Removing transient job directory /data/9428.whiterabbit.alicenext
12/08/2005 10:43:06;0001;   pbs_mom;Svr;pbs_mom;No such file or directory (2) in remtree, stat
12/08/2005 10:43:06;0001;   pbs_mom;Svr;pbs_mom;Directory not empty (39) in remtree, rmdir failed on /data/9428.whiterabbit.alicenext
12/08/2005 10:43:06;0001;   pbs_mom;Svr;pbs_mom;Inappropriate ioctl for device (25) in recursive (r)rmdir, recursive remove of job transient tmpdir /data/9428.whiterabbit.alicenext failed

This didn't happen with the old transient-tmpdir patch :-(
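
The situation can be reproduced outside pbs_mom in a throwaway directory. The sketch below only illustrates the dangling link: a test that follows the link (as stat does) reports it missing once the target is gone, while a test on the link itself (as lstat does) still sees it. That remtree trips over exactly this is an assumption based on the "remtree, stat" log line, not something checked against the source.

```shell
#!/bin/sh
# Recreate the layout from the P.S. in a scratch directory.
demo=$(mktemp -d)
mkdir "$demo/a_directory"
ln -s a_directory "$demo/link_to_dir"

# Remove the target first, as an alphabetical walk would.
rmdir "$demo/a_directory"

# -e follows the link and now fails; -L looks at the link itself.
if [ -e "$demo/link_to_dir" ]; then follows=exists; else follows=missing; fi
if [ -L "$demo/link_to_dir" ]; then link=present; else link=absent; fi
echo "following the link: $follows; the link itself: $link"

# The dangling link can still be unlinked, after which the
# now-empty directory can be removed.
rm "$demo/link_to_dir"
rmdir "$demo"
```

So the link is removable as long as it is examined without being followed.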

