[torquedev] Bizarre $PBS_O_WORKDIR bug

Chris Samuel csamuel at vpac.org
Mon Sep 3 23:22:03 MDT 2007


OK - this had me completely stumped for a while, trying to work out why
some of my test jobs were failing outright.

In Torque 2.1.8 it appears that the $PBS_O_WORKDIR variable gets
corrupted if its basename is identical to the name of the node the job
runs on.

So, for instance, being in this directory:

/home/csamuel/NAMD/ApoA1/Tango/tango021

causes this to happen:

[csamuel@tango tango021]$ qsub -l nodes=tango021 -I
qsub: waiting for job 570.tango-m.vpac.org to start
qsub: job 570.tango-m.vpac.org ready

[csamuel@tango021 ~]$ echo $PBS_O_WORKDIR
/home/csamuel/NAMD/ApoA1/Tango/tango021UEUE=run_1_day;NODES=tango021
[csamuel@tango021 ~]$
qsub: job 570.tango-m.vpac.org completed

Whereas if I submit to another node it is OK:

[csamuel@tango tango021]$ qsub -l nodes=tango020 -I
qsub: waiting for job 571.tango-m.vpac.org to start
qsub: job 571.tango-m.vpac.org ready

[csamuel@tango020 ~]$ echo $PBS_O_WORKDIR
/home/csamuel/NAMD/ApoA1/Tango/tango021
[csamuel@tango020 ~]$
qsub: job 571.tango-m.vpac.org completed

Now if I cd into ../tango020 the situation reverses and tango021 is 
now OK:

[csamuel@tango tango020]$ qsub -l nodes=tango021 -I
qsub: waiting for job 572.tango-m.vpac.org to start
qsub: job 572.tango-m.vpac.org ready

[csamuel@tango021 ~]$ echo $PBS_O_WORKDIR
/home/csamuel/NAMD/ApoA1/Tango/tango020
[csamuel@tango021 ~]$
qsub: job 572.tango-m.vpac.org completed

whereas tango020 now fails:

[csamuel@tango tango020]$ qsub -l nodes=tango020 -I
qsub: waiting for job 573.tango-m.vpac.org to start
qsub: job 573.tango-m.vpac.org ready

[csamuel@tango020 ~]$ echo $PBS_O_WORKDIR
/home/csamuel/NAMD/ApoA1/Tango/tango020UEUE=run_1_day;NODES=tango020
[csamuel@tango020 ~]$
qsub: job 573.tango-m.vpac.org completed
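
For anyone who wants a job script to notice this symptom, here is a
rough sketch of a check. It is an illustration only: workdir_suspect is
a made-up helper, not part of Torque, and it assumes that the short
host name from uname -n matches the node name Torque uses. It flags a
$PBS_O_WORKDIR whose last path component is the node name followed by
extra characters, so a genuine directory such as tango0211 on tango021
would be flagged too.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque): succeeds when the last
# component of the given path is the node name plus trailing characters,
# which is the pattern seen in the corrupted values above.
workdir_suspect() {
    dir=$1
    node=$2
    base=${dir##*/}          # strip everything up to the last slash
    case $base in
        "$node"?*) return 0 ;;   # node name followed by at least one char
        *)         return 1 ;;
    esac
}

# Assumption: the short host name equals the Torque node name.
node=$(uname -n)
node=${node%%.*}

if workdir_suspect "${PBS_O_WORKDIR:-}" "$node"; then
    echo "PBS_O_WORKDIR looks corrupted: $PBS_O_WORKDIR" >&2
fi
```

This only detects the problem; it does not recover the intended
directory.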

Bizarre, eh!

Has anyone got any clues, please?

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency