[torqueusers] NO sessID reported
jacksond at clusterresources.com
Tue Jan 24 17:37:37 MST 2006
Would you be able to make the change to create the pbs host file based
on exechost rather than neednodes? I wanted to get this into p6 but am
running out of time. I've got some tight deadlines on multi-cluster
co-allocation and 'learning' grid job start estimation, and this is
consuming all of my time until our planned release. If it's a pain, we
can wait and I can work on it later.
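Here is a minimal C sketch of what building the host file from exec_host could look like. It assumes the usual `host/cpu+host/cpu` layout of exec_host (for example `node01/1+node01/0+node02/0`) and writes one hostname per line. The function name `exechost_to_hostfile` is hypothetical and does not come from the TORQUE source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not TORQUE code): convert an exec_host-style string
 * such as "node01/1+node01/0+node02/0" into host file contents, one
 * hostname per line, with the "/cpu" suffix dropped.
 * Returns the number of host lines written, or -1 if out is too small. */
int exechost_to_hostfile(const char *exec_host, char *out, size_t outsize)
{
    size_t used = 0;
    int lines = 0;
    const char *p = exec_host;

    if (outsize == 0)
        return -1;
    out[0] = '\0';

    while (*p != '\0') {
        /* Each entry runs up to the next '+' or the end of the string. */
        size_t entry_len = strcspn(p, "+");
        /* The hostname stops at the '/' inside the entry, if there is one. */
        size_t host_len = strcspn(p, "/+");

        if (host_len > 0) {
            /* Room for the hostname, a newline and the terminating NUL. */
            if (used + host_len + 2 > outsize)
                return -1;
            memcpy(out + used, p, host_len);
            used += host_len;
            out[used++] = '\n';
            out[used] = '\0';
            lines++;
        }

        p += entry_len;
        if (*p == '+')
            p++; /* skip the separator */
    }
    return lines;
}
```

Called on the example string above, it fills the buffer with `node01\nnode01\nnode02\n` and returns 3. A host that holds several CPUs therefore appears once per CPU.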
On Tue, 2006-01-24 at 16:28 -0800, Garrick Staples wrote:
> On Mon, Dec 19, 2005 at 02:12:57PM +0300, Walid alleged:
> > Dear All,
> > Trying torque 2.0.0P0 or later versions, I am having a problem: no sessid
> > is reported if I specify the number of nodes on the qsub command line.
> I figured out what is going on here. It turns out to be a pretty deep
> problem affecting not only the sessid, but also the altid, outpath, and
> errpath. sessid just happens to be the most visible.
> If the job launch happens within $jobstartblocktime, then all 4 of those
> job attributes are correctly sent by MOM to server. If the launch takes
> longer (and I set mine to 0), then they are likely not set.
> During the job launch, those attributes are flagged as "modified." In
> MOM, those 4 attributes are in a static list to be sent to pbs_server if
> they are "modified" (clearing the modify flag in the process.)
> Unfortunately, MOM also clears the modify flag when it saves the job
> state to disk. This tends to happen right after a job launch when
> pbs_server sends a ModifyJob request to reset the "nodes" attribute.
> So you get a race condition that tends to finish with a job save
> clearing the modify flags before those 4 attributes are sent to
> pbs_server.
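The race can be reproduced with a small, hypothetical model. The C sketch below is not TORQUE code: `job_model`, `FLAG_MODIFIED` and the three functions are illustrative names. It shows why the outcome depends on ordering. If MOM sends the modified attributes before the job is saved, all four reach the server. If the save triggered by the ModifyJob request runs first, nothing is sent.

```c
#include <stdbool.h>

/* Simplified, hypothetical model of the MOM-side bookkeeping described
 * above; the names do not come from the TORQUE source. */
enum { ATTR_SESSID, ATTR_ALTID, ATTR_OUTPATH, ATTR_ERRPATH, ATTR_COUNT };

#define FLAG_MODIFIED 0x1u

struct job_model {
    unsigned flags[ATTR_COUNT];
    bool sent_to_server[ATTR_COUNT]; /* what pbs_server actually received */
};

/* Job launch sets the four attributes and marks them modified. */
void launch_job(struct job_model *j)
{
    for (int i = 0; i < ATTR_COUNT; i++)
        j->flags[i] |= FLAG_MODIFIED;
}

/* Saving job state to disk clears the modify flag as a side effect. */
void save_job(struct job_model *j)
{
    for (int i = 0; i < ATTR_COUNT; i++)
        j->flags[i] &= ~FLAG_MODIFIED;
}

/* Send every attribute still flagged as modified, clearing the flag. */
int send_modified_to_server(struct job_model *j)
{
    int sent = 0;
    for (int i = 0; i < ATTR_COUNT; i++) {
        if (j->flags[i] & FLAG_MODIFIED) {
            j->sent_to_server[i] = true;
            j->flags[i] &= ~FLAG_MODIFIED;
            sent++;
        }
    }
    return sent;
}
```

Calling `launch_job`, then `send_modified_to_server` reports all 4 attributes; calling `launch_job`, then `save_job`, then `send_modified_to_server` reports none, which matches the missing sessid.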
> I'm checking in a fix that defines a new "send me to server" attribute
> flag that cleans up this entire process. It works reliably now.
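The idea behind the fix can be sketched the same way, again with hypothetical names rather than TORQUE's actual flag. Reporting to pbs_server gets its own flag, which saving the job to disk leaves untouched, so the relative order of the save and the send no longer matters.

```c
/* Hypothetical sketch of a separate "send me to server" flag; the names
 * are illustrative and do not come from the TORQUE source. */
enum { ATTR_SESSID, ATTR_ALTID, ATTR_OUTPATH, ATTR_ERRPATH, ATTR_COUNT };

#define FLAG_MODIFIED    0x1u /* dirty since the last save to disk */
#define FLAG_SEND_SERVER 0x2u /* still needs to be reported to pbs_server */

struct job_model {
    unsigned flags[ATTR_COUNT];
};

/* Job launch marks the attributes both dirty and pending for the server. */
void launch_job(struct job_model *j)
{
    for (int i = 0; i < ATTR_COUNT; i++)
        j->flags[i] |= FLAG_MODIFIED | FLAG_SEND_SERVER;
}

/* Saving to disk clears only the modify flag. */
void save_job(struct job_model *j)
{
    for (int i = 0; i < ATTR_COUNT; i++)
        j->flags[i] &= ~FLAG_MODIFIED;
}

/* Send every attribute still pending for the server, then clear that flag. */
int send_pending_to_server(struct job_model *j)
{
    int sent = 0;
    for (int i = 0; i < ATTR_COUNT; i++) {
        if (j->flags[i] & FLAG_SEND_SERVER) {
            j->flags[i] &= ~FLAG_SEND_SERVER;
            sent++;
        }
    }
    return sent;
}
```

With this split, `launch_job`, `save_job`, `send_pending_to_server` still reports all 4 attributes, and a second send reports none because each attribute is sent exactly once.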
> torqueusers mailing list
> torqueusers at supercluster.org