[torqueusers] PBS_NODEFILE munged during file staging?

Dave Jackson jacksond at clusterresources.com
Fri Jan 6 17:37:00 MST 2006


Michael,

  It looks like this is where CRI picks up the line.  Please submit a report
to moab-support at clusterresources.com to get this automatically
registered into the trouble ticket system.  We'll get started on this
immediately.

Dave

On Fri, 2006-01-06 at 15:47 -0800, Michael Gutteridge wrote:
> It looks like, indeed, wordexp isn't the issue.  I did update to the
> latest p5 snapshot, installed the new server and mom.  Same behaviour.
> 
> But I just happened to notice- Resource_List.neednodes /seems/ to
> disappear about the same time the job changes state from Q -> R.  So it
> could be the scheduler.  I'm using Moab 4.2.2p4.
> 
> I just tried a quick test- I shut down moab and ran the same job using
> qrun- this time Resource_List.neednodes is maintained.  So I'm going to
> point the finger at Moab after all.  Don't know why it'd do that, but
> I'll dig into that now...
> 
> Thanks for the help- sorry for the bogus report...
> 
> M
> 
> 
> On Fri, 2006-01-06 at 12:07 -0800, Garrick Staples wrote:
> > On Thu, Jan 05, 2006 at 05:05:37PM -0800, Garrick Staples alleged:
> > > I've not seen this before and I can't reproduce it over here.  I know
> > > that part of the code pretty well and I can't think of any connections
> > > between PBS_NODEFILE and stagein.
> > > 
> > > MOM makes a nodefile if the neednodes resource is set on the job.  Does
> > > 'qstat -f' show "Resource_List.neednodes" for the job?  Be sure to run
> > > qstat as a server manager (pbs_server doesn't let non-managers see
> > > neednodes.)
> > 
> > Which scheduler are you using?  I wonder if for some odd reason the
> > scheduler is clearing neednodes when stagein is used.
> > 
> > (Yes, I'm still struggling to find some sort of connection)
> > 
> >  
> > > A bug in wordexp would be the prime suspect here.  Does it happen when
> > > you rebuild without wordexp?
> > 
> > And of course, if it works without wordexp, I'd ask you to try again
> > with a p5 snapshot.  There haven't been any changes to neednodes,
> > nodefile, or a job's env, but the wordexp code got a facelift to better
> > support globbing.
> > 
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers
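
The check discussed in this thread (does 'qstat -f' still show
Resource_List.neednodes once the job goes from Q to R?) can be scripted.
The following is only a rough sketch, not part of TORQUE or Moab.  The
function names and the sample job text are made up for illustration, and
it assumes 'qstat -f' prints "Job Id:" headers, indented "name = value"
lines, and tab-indented continuation lines.

```python
def parse_qstat_full(text):
    """Parse 'qstat -f' style text into {job_id: {attribute: value}}.

    Assumed layout: a "Job Id:" header per job, indented "name = value"
    lines, and long values wrapped onto lines that start with a tab.
    """
    jobs = {}
    current = None
    last_key = None
    for raw in text.splitlines():
        if raw.startswith("Job Id:"):
            current = jobs.setdefault(raw.split(":", 1)[1].strip(), {})
            last_key = None
        elif current is None or not raw.strip():
            continue
        elif raw.startswith("\t") and last_key is not None:
            # Continuation of a wrapped value: append to the previous attribute.
            current[last_key] += raw.strip()
        elif " = " in raw:
            key, value = raw.strip().split(" = ", 1)
            current[key] = value
            last_key = key
    return jobs


def has_neednodes(attrs):
    """True if the job's attributes include Resource_List.neednodes."""
    return any(k.lower() == "resource_list.neednodes" for k in attrs)


# Made-up sample text (not real output): one queued job that still has
# neednodes and one running job where it is missing.
SAMPLE = """Job Id: 101.example
    Job_Name = stage_test
    job_state = Q
    Resource_List.neednodes = 2
    Resource_List.nodes = 2

Job Id: 102.example
    Job_Name = stage_test
    job_state = R
    Resource_List.nodes = 2
"""

if __name__ == "__main__":
    for job_id, attrs in parse_qstat_full(SAMPLE).items():
        state = attrs.get("job_state", "?")
        print(job_id, state, "neednodes present:", has_neednodes(attrs))
```

To use it on a live system, capture the text of 'qstat -f' as a server
manager (non-managers are not shown neednodes) and pass it to
parse_qstat_full() in place of SAMPLE.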


