[torqueusers] PBS_NODEFILE munged during file staging?
Garrick Staples
garrick at usc.edu
Thu Jan 5 18:05:37 MST 2006
On Thu, Jan 05, 2006 at 02:34:02PM -0800, Michael Gutteridge alleged:
> Hi
>
> Something is screwing up PBS_NODEFILE during file stage-in. I'm using
> 2.0.0p3, didn't see mention in p4's readme or the list, apologies if I'm
> duplicating...
>
> Anyway, to reproduce, submit a job with a file stage-in request, vis:
>
> $ qsub -l nodes=2 -W stagein=localfile@host:remotefile -I
>
> When the interactive session opens:
>
> node$ env |grep PBS_NODEFILE
> node$ ls /var/spool/torque/aux
> node$
>
> No environment variable and no file neither...
>
> Interactive/non-interactive doesn't seem to make a difference- I've got
> word expansion enabled, but I get the same behavior whether there's a
> variable in the stagein request or not.
>
> I've turned up logging to 7 on the MOM- it doesn't show any errors. The
> MOM does seem to enumerate the nodes in the sisterhood properly...
>
> If I remove the stagein request, MOM behaves properly, creating both
> file and environment variable.

I've not seen this before and I can't reproduce it over here. I know
that part of the code pretty well and I can't think of any connections
between PBS_NODEFILE and stagein.

MOM makes a nodefile if the neednodes resource is set on the job. Does
'qstat -f' show "Resource_List.neednodes" for the job? Be sure to run
qstat as a server manager (pbs_server doesn't let non-managers see
neednodes.)
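
A rough way to run that check from the shell, as a sketch only: the
helper name has_neednodes and the job id in the usage comment are made
up for illustration, and the function just looks for the attribute name
in whatever 'qstat -f' prints.

```shell
#!/bin/sh
# Hypothetical helper: reads `qstat -f` output on stdin and reports
# whether the job carries a Resource_List.neednodes attribute.
has_neednodes() {
    if grep -q 'Resource_List\.neednodes'; then
        echo "neednodes is set"
    else
        echo "neednodes is missing"
        return 1
    fi
}

# Usage (run as a server manager; the job id below is a placeholder):
#   qstat -f 1234.server | has_neednodes
```

If it reports "neednodes is missing" even when run as a manager, that
would explain MOM not writing the nodefile.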

A bug in wordexp would be the prime suspect here. Does it happen when
you rebuild without wordexp?
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California