[torqueusers] NO sessID reported

Garrick Staples garrick at usc.edu
Tue Jan 24 17:28:55 MST 2006


On Mon, Dec 19, 2005 at 02:12:57PM +0300, Walid alleged:
> Dear All,
> 
> Trying torque 2.0.0P0 or later versions, I am having a problem where no sessid
> is reported if I specify the number of nodes on the qsub

I figured out what is going on here.  It turns out to be a pretty deep
problem affecting not only the sessid, but also the altid, outpath, and
errpath.  sessid just happens to be the most visible.

If the job launch completes within $jobstartblocktime, then all 4 of those
job attributes are correctly sent by MOM to the server.  If the launch takes
longer (and I set mine to 0), then they are likely never set on the server.

During the job launch, those attributes are flagged as "modified."  In
MOM, those 4 attributes are in a static list, and each one is sent to
pbs_server if it is flagged "modified" (clearing the modify flag in the
process).

Unfortunately, MOM also clears the modify flag when it saves the job
state to disk.  This tends to happen right after a job launch when
pbs_server sends a ModifyJob request to reset the "nodes" attribute.

So you get a race condition that tends to finish with a job save
clearing the modify flags before those 4 attributes are sent to the
server.
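
To make the ordering problem concrete, here is a minimal Python sketch of
the race.  It is not the TORQUE source (which is C); the names MODIFIED,
Job, save_job, and send_modified_to_server are made up for illustration,
and the attribute values are placeholders.  It only models the idea that
one flag is being cleared by two different consumers.

```python
MODIFIED = 0x1  # hypothetical bit standing in for MOM's "modified" flag

# The four attributes MOM wants to report after a launch.
WATCHED = ("sessid", "altid", "outpath", "errpath")


class Job:
    def __init__(self):
        self.values = {}
        self.flags = {name: 0 for name in WATCHED}

    def set_attr(self, name, value):
        # Setting an attribute marks it as modified.
        self.values[name] = value
        self.flags[name] |= MODIFIED


def launch(job):
    # Placeholder values; a real launch would fill these in.
    for name in WATCHED:
        job.set_attr(name, "value-of-" + name)


def send_modified_to_server(job):
    """Collect modified attributes and clear their modify flag."""
    sent = {}
    for name in WATCHED:
        if job.flags[name] & MODIFIED:
            sent[name] = job.values[name]
            job.flags[name] &= ~MODIFIED
    return sent


def save_job(job):
    """Saving the job to disk also clears the modify flag."""
    for name in WATCHED:
        job.flags[name] &= ~MODIFIED


def send_before_save():
    job = Job()
    launch(job)
    return send_modified_to_server(job)


def save_before_send():
    job = Job()
    launch(job)
    save_job(job)  # e.g. triggered by a ModifyJob request from the server
    return send_modified_to_server(job)
```

In this sketch send_before_save() reports all 4 attributes, while
save_before_send() reports none, because the save already cleared the
only flag the sender looks at.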

I'm checking in a fix that defines a new "send me to server" attribute
flag that cleans up this entire process.  It works reliably now.
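
The idea behind the fix can be sketched the same way, again in Python
rather than TORQUE's C and with invented names (SEND_TO_SERVER, Job,
save_job, send_to_server are assumptions, not the identifiers in the
actual checkin).  The point is only that the sender gets its own flag,
so a job save can no longer consume it.

```python
MODIFIED = 0x1        # illustrative "modified" bit, cleared on job save
SEND_TO_SERVER = 0x2  # hypothetical name for the new "send me to server" bit

WATCHED = ("sessid", "altid", "outpath", "errpath")


class Job:
    def __init__(self):
        self.values = {}
        self.flags = {name: 0 for name in WATCHED}

    def set_attr(self, name, value):
        # Mark the attribute both as modified and as pending for the server.
        self.values[name] = value
        self.flags[name] |= MODIFIED | SEND_TO_SERVER


def save_job(job):
    """A save clears only MODIFIED and leaves SEND_TO_SERVER alone."""
    for name in WATCHED:
        job.flags[name] &= ~MODIFIED


def send_to_server(job):
    """Collect pending attributes and clear only their SEND_TO_SERVER bit."""
    sent = {}
    for name in WATCHED:
        if job.flags[name] & SEND_TO_SERVER:
            sent[name] = job.values[name]
            job.flags[name] &= ~SEND_TO_SERVER
    return sent


def launch_save_send():
    job = Job()
    for name in WATCHED:
        job.set_attr(name, "value-of-" + name)  # placeholder values
    save_job(job)               # the save happens first, as in the race
    first = send_to_server(job)
    second = send_to_server(job)  # nothing left pending the second time
    return first, second
```

Here launch_save_send() still delivers all 4 attributes on the first
send even though the save ran first, and the second send is empty.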

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California