[torqueusers] NO sessID reported

Dave Jackson jacksond at clusterresources.com
Tue Jan 24 17:37:37 MST 2006


Garrick,

  Would you be able to make the change to create the pbs host file based
on exechost rather than neednodes?  I wanted to get this into p6 but am
running out of time.  I've got some tight deadlines on multi-cluster
co-allocation and 'learning' grid job start estimation, and these are
burning all of my time through when we hoped to release.  If it's a pain,
we can wait and I can work on it later.

Thanks,
Dave 

On Tue, 2006-01-24 at 16:28 -0800, Garrick Staples wrote:
> On Mon, Dec 19, 2005 at 02:12:57PM +0300, Walid alleged:
> > Dear All,
> > 
> > Trying torque 2.0.0P0 or later versions, I am having a problem where no
> > sessid is reported if I specify the number of nodes on the qsub
> 
> I figured out what is going on here.  It turns out to be a pretty deep
> problem affecting not only the sessid, but also the altid, outpath, and
> errpath.  sessid just happens to be the most visible.
> 
> If the job launch happens within $jobstartblocktime, then all 4 of those
> job attributes are correctly sent by MOM to server.  If the launch takes
> longer (and I set mine to 0), then they are likely not set.
> 
> During the job launch, those attributes are flagged as "modified."  In
> MOM, those 4 attributes are in a static list to be sent to pbs_server if
> they are "modified" (clearing the modify flag in the process.)
> 
> Unfortunately, MOM also clears the modify flag when it saves the job
> state to disk.  This tends to happen right after a job launch when
> pbs_server sends a ModifyJob request to reset the "nodes" attribute.
> 
> So you get a race condition that tends to finish with a job save
> clearing the modify flags before those 4 attributes are sent to the
> server.
> 
> I'm checking in a fix that defines a new "send me to server" attribute
> flag that cleans up this entire process.  It works reliably now.
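
The race Garrick describes can be sketched as a toy model. This is not
TORQUE code: the class, the function names, and the MODIFY and SEND flag
names are hypothetical, and the ordering (job save before the send) is
forced by hand to show the losing side of the race. It only illustrates
why one flag shared by "save to disk" and "send to server" loses the
update, and why a separate flag, as in the fix he mentions, would not.

```python
# Toy model of the race described above. All names are illustrative
# assumptions, not TORQUE's actual identifiers.
MODIFY = 0x1  # attribute changed since the last save or send
SEND = 0x2    # hypothetical dedicated "send me to server" flag

# The four attributes named in the message.
REPORTED = ("sessid", "altid", "outpath", "errpath")


class Job:
    def __init__(self):
        self.flags = {name: 0 for name in REPORTED}

    def launch(self, use_send_flag):
        # Job launch marks the four attributes as modified and, in the
        # fixed variant, also as pending delivery to the server.
        for name in REPORTED:
            self.flags[name] |= MODIFY
            if use_send_flag:
                self.flags[name] |= SEND

    def save_to_disk(self):
        # Saving job state clears MODIFY on every attribute.
        for name in self.flags:
            self.flags[name] &= ~MODIFY

    def send_to_server(self, use_send_flag):
        # Collect the attributes that would be reported, then clear
        # the flag that gates them.
        gate = SEND if use_send_flag else MODIFY
        sent = [n for n in REPORTED if self.flags[n] & gate]
        for n in sent:
            self.flags[n] &= ~gate
        return sent


def run(use_send_flag):
    job = Job()
    job.launch(use_send_flag)
    job.save_to_disk()  # the job save happens before the send
    return job.send_to_server(use_send_flag)


if __name__ == "__main__":
    print("shared flag:   ", run(use_send_flag=False))
    print("dedicated flag:", run(use_send_flag=True))
```

With the shared flag the save leaves nothing to send; with the dedicated
flag all four attributes still reach the server, whatever order the save
and the send happen in.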
> 


