[torqueusers] Bizarre $PBS_O_WORKDIR bug
Chris Samuel
csamuel at vpac.org
Mon Sep 3 23:22:03 MDT 2007
OK - this had me completely stumped for a while trying to work out why
certain of my test jobs would fail completely.
In Torque 2.1.8 it appears that the $PBS_O_WORKDIR variable gets
corrupted if its basename is identical to the name of the node the
job runs on.
So, for instance, being in this directory:
/home/csamuel/NAMD/ApoA1/Tango/tango021
causes this to happen:
[csamuel@tango tango021]$ qsub -l nodes=tango021 -I
qsub: waiting for job 570.tango-m.vpac.org to start
qsub: job 570.tango-m.vpac.org ready
[csamuel@tango021 ~]$ echo $PBS_O_WORKDIR
/home/csamuel/NAMD/ApoA1/Tango/tango021UEUE=run_1_day;NODES=tango021
[csamuel@tango021 ~]$
qsub: job 570.tango-m.vpac.org completed
Whereas if I submit to another node it is OK:
[csamuel@tango tango021]$ qsub -l nodes=tango020 -I
qsub: waiting for job 571.tango-m.vpac.org to start
qsub: job 571.tango-m.vpac.org ready
[csamuel@tango020 ~]$ echo $PBS_O_WORKDIR
/home/csamuel/NAMD/ApoA1/Tango/tango021
[csamuel@tango020 ~]$
qsub: job 571.tango-m.vpac.org completed
Now if I cd into ../tango020 the situation reverses and tango021 is
now OK:
[csamuel@tango tango020]$ qsub -l nodes=tango021 -I
qsub: waiting for job 572.tango-m.vpac.org to start
qsub: job 572.tango-m.vpac.org ready
[csamuel@tango021 ~]$ echo $PBS_O_WORKDIR
/home/csamuel/NAMD/ApoA1/Tango/tango020
[csamuel@tango021 ~]$
qsub: job 572.tango-m.vpac.org completed
whereas tango020 now fails:
[csamuel@tango tango020]$ qsub -l nodes=tango020 -I
qsub: waiting for job 573.tango-m.vpac.org to start
qsub: job 573.tango-m.vpac.org ready
[csamuel@tango020 ~]$ echo $PBS_O_WORKDIR
/home/csamuel/NAMD/ApoA1/Tango/tango020UEUE=run_1_day;NODES=tango020
[csamuel@tango020 ~]$
qsub: job 573.tango-m.vpac.org completed
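In every failing case above, the corrupted value is the real path with
extra text appended straight after the node name. Until the cause is
found, a job script could guard against that with something like the
rough sketch below. The function name sanitize_workdir is made up for
illustration and is not part of Torque, and the sketch assumes the
corruption always takes the form seen in these transcripts.

```shell
# Hypothetical helper, not part of Torque: print a usable directory from
# a possibly corrupted $PBS_O_WORKDIR value.
# Assumption: corruption only appends text after a final path component
# that equals the node name, as in the transcripts above.
sanitize_workdir() {
    dir=$1      # value of $PBS_O_WORKDIR
    node=$2     # short node name, e.g. tango021

    # If the directory exists as given, trust it and change nothing.
    if [ -d "$dir" ]; then
        printf '%s\n' "$dir"
        return 0
    fi

    base=${dir##*/}     # last path component
    case "$base" in
        "$node"?*)
            # Last component is the node name plus trailing text:
            # keep the parent path and the bare node name.
            printf '%s\n' "${dir%/*}/$node"
            ;;
        *)
            # Not the pattern seen in this report; pass it through.
            printf '%s\n' "$dir"
            ;;
    esac
}
```

In a job script it might be used as
cd "$(sanitize_workdir "$PBS_O_WORKDIR" "$(hostname -s)")".
Note that it would also truncate a genuine but missing directory whose
name merely starts with the node name, so it is a stopgap, not a fix.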
Bizarre, eh!
Anyone got any clues, please?
cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency