[torqueusers] PBS_NODEFILE munged during file staging?

Garrick Staples garrick at usc.edu
Thu Jan 5 18:05:37 MST 2006


On Thu, Jan 05, 2006 at 02:34:02PM -0800, Michael Gutteridge alleged:
> Hi
> 
> Something is screwing up PBS_NODEFILE during file stage-in. I'm using
> 2.0.0p3, didn't see mention in p4's readme or the list, apologies if I'm
> duplicating...
> 
> Anyway, to reproduce, submit a job with a file stage-in request, viz:
> 
> $ qsub -l nodes=2 -W stagein=localfile@host:remotefile -I
> 
> When the interactive session opens:
> 
> node$ env |grep PBS_NODEFILE
> node$ ls /var/spool/torque/aux
> node$
> 
> No environment variable and no file neither...
> 
> Interactive/non-interactive doesn't seem to make a difference- I've got
> word expansion enabled, but I get the same behavior whether there's a
> variable in the stagein request or not.
> 
> I've turned up logging to 7 on the MOM- it doesn't show any errors. The
> MOM does seem to enumerate the nodes in the sisterhood properly...
> 
> If I remove the stagein request, MOM behaves properly, creating both
> file and environment variable.

I've not seen this before and I can't reproduce it over here.  I know
that part of the code pretty well and I can't think of any connections
between PBS_NODEFILE and stagein.

MOM makes a nodefile if the neednodes resource is set on the job.  Does
'qstat -f' show "Resource_List.neednodes" for the job?  Be sure to run
qstat as a server manager (pbs_server doesn't let non-managers see
neednodes.)
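
Something along these lines would do the check.  It is only a rough
sketch: has_neednodes is a made-up helper name, not anything TORQUE
ships, and JOBID is a placeholder for your job's id.  It simply looks
for a Resource_List.neednodes line in whatever 'qstat -f' prints.

```shell
# Hypothetical helper (not part of TORQUE): reads 'qstat -f' output on
# stdin and reports whether a Resource_List.neednodes line is present.
has_neednodes() {
    if grep -q 'Resource_List\.neednodes'; then
        echo "neednodes is set"
    else
        echo "neednodes is NOT set"
        return 1
    fi
}

# Usage, run as a server manager (JOBID is a placeholder):
#   qstat -f "$JOBID" | has_neednodes
```

If it reports that neednodes is not set while you are running as a
manager, that would explain the missing nodefile.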

A bug in wordexp would be the prime suspect here.  Does it happen when
you rebuild without wordexp?

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California