[torqueusers] Re: Epilogue script

Glen Beane glen.beane+torque at gmail.com
Tue Aug 29 09:52:17 MDT 2006


http://svn.osc.edu/repos/pbstools/trunk/sbin/reaver would likely catch
a lot more than checking processes' environments, but it won't clean
up stray processes on a node as long as that user still has other
legitimate processes running there.
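
To make that limitation concrete, here is a minimal sketch of a
per-user rule of this kind.  It is not taken from reaver; the function
name, its inputs, and the exempt list are all hypothetical.

```python
def stray_pids(processes, users_with_jobs, exempt=("root",)):
    """Return PIDs owned by users who have no job on this node.

    processes       -- iterable of (pid, user) pairs seen on the node
    users_with_jobs -- set of users with a job currently assigned here
    exempt          -- accounts never touched (assumed system users)
    """
    return sorted(
        pid
        for pid, user in processes
        if user not in users_with_jobs and user not in exempt
    )


# alice has a legitimate job on the node, so *all* of her processes are
# spared, including pid 11, which might be a leftover from an old job.
procs = [(10, "alice"), (11, "alice"), (20, "bob"), (1, "root")]
print(stray_pids(procs, {"alice"}))
```

Under this rule only bob's process is reported; a leftover belonging to
alice survives until she has no jobs left on the node.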


I think we need to give a little more thought to the cleanup
method we want to include.
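
For comparison, the environment-search approach discussed in this
thread could look roughly like the following on Linux.  This is only a
sketch under stated assumptions: it relies on /proc/<pid>/environ, so
it is Linux-only, it matches on the PBS_JOBID variable that TORQUE
sets for jobs, and the helper names and the proc_root parameter are
made up for illustration.  It only lists candidate PIDs and kills
nothing.

```python
import os


def read_environ(pid, proc_root="/proc"):
    """Return the environment of `pid` as a dict, or {} if unreadable."""
    path = os.path.join(proc_root, str(pid), "environ")
    try:
        with open(path, "rb") as f:
            raw = f.read()
    except OSError:
        # Process exited, or we lack permission to read its environment.
        return {}
    env = {}
    # Entries are NUL-separated "KEY=value" byte strings.
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep:
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def find_job_pids(job_id, uid, proc_root="/proc"):
    """List PIDs owned by `uid` whose environment has PBS_JOBID == job_id."""
    pids = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as "self"
        try:
            owner = os.stat(os.path.join(proc_root, name)).st_uid
        except OSError:
            continue  # process went away while we were scanning
        if owner != uid:
            continue
        if read_environ(name, proc_root).get("PBS_JOBID") == job_id:
            pids.append(int(name))
    return sorted(pids)
```

Something like find_job_pids("1234.server", 1001) would be called from
an epilogue or from pbs_mom itself (the job id and uid here are
placeholders).  A process that does not carry PBS_JOBID in its
environment will not be found this way, which is the concern raised in
the quoted message below.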



On 8/29/06, Glen Beane <glen.beane+torque at gmail.com> wrote:
> I do not believe there is a cross-platform way to search the env of a
> process.  I am about to download the darwin ps source to check how
> they do it for ps -e on OS X / darwin, since the usual Linux method of
> searching /proc obviously will not work.
>
> My only concern is that the processes most likely to be left hanging
> around are ones spawned outside of TM, which are also the most likely
> to *not* include the PBS ID in their ENV.  So when something actually
> needs to be cleaned up, there is a good chance this method won't work
> anyway.
>
>
>
>
> On 8/28/06, Garrick Staples <garrick at clusterresources.com> wrote:
> > Is there a cross-platform way to search the env of processes?  It seems
> > like this will have to be implemented separately for each MOM arch.
> >
> > On Mon, Aug 28, 2006 at 05:07:11PM -0600, Dave Jackson alleged:
> > > Glen,
> > >
> > > >   I believe there is the possibility of negative side effects but the
> > > likelihood of this is immensely small.  A user would need to
> > > inadvertently set a specific environment variable to a specific value to
> > > have an issue.  This does not happen in the real world and if it does,
> > > this feature is configurable and is off by default.
> > >
> > >   I also believe there are exceptional cases in which it would not work.
> > > But these are not the majority.  I think we have a capability which
> > > would easily and immediately benefit many sites.  While this capability
> > > does not cover 100% of cases, it definitely makes things better for
> > > most.  Weighing pros and cons, I think this feature is clearly worth it.
> > >
> > > Dave
> > >
> > > On Mon, 2006-08-28 at 18:49 -0400, Glen Beane wrote:
> > > > I think I agree with Garrick on this one.
> > > >
> > > > On 8/28/06, Garrick Staples <garrick at clusterresources.com> wrote:
> > > > > I'm really uncomfortable with pbs_mom killing off processes that aren't
> > > > > under its control.  Even though looking for a jobid env var seems like a
> > > > > reasonable assumption, I'm sure it will break someone somewhere.
> > > > >
> > > > > This sounds like a site-specific assumption that is easily, and sanely,
> > > > > handled in epilogue.
> > > > >
> > > > > Perhaps this just belongs in the Wiki.
> > > > >
> > > > >
> > > > > On Mon, Aug 28, 2006 at 11:43:15AM -0400, Andrew Keen alleged:
> > > > > > Dave,
> > > > > >
> > > > > > This feature would be very useful to us as we often have this problem
> > > > > > (although not as often since we've migrated to using OSU's mpiexec
> > > > > > instead of mpirun).
> > > > > >
> > > > > > -Andy
> > > > > >
> > > > > > torqueusers-request at supercluster.org wrote:
> > > > > > >
> > > > > > >   1. Re: Epilogue script (Dave Jackson)
> > > > > > >   2. Re: Epilogue script (Diego M. Vadell)
> > > > > > >
> > > > > > >
> > > > > > >----------------------------------------------------------------------
> > > > > > >
> > > > > > >Message: 1
> > > > > > >Date: Fri, 25 Aug 2006 13:13:49 -0600
> > > > > > >From: Dave Jackson <jacksond at clusterresources.com>
> > > > > > >Subject: Re: [torqueusers] Epilogue script
> > > > > > >To: "Diego M. Vadell" <dvadell at linuxclusters.com.ar>
> > > > > > >Cc: torquedev at supercluster.org, torqueusers at supercluster.org
> > > > > > >Message-ID: <1156533229.10669.77.camel at koa.icluster.org>
> > > > > > >Content-Type: text/plain
> > > > > > >
> > > > > > >Diego,
> > > > > > >
> > > > > > >  What would be the negatives of enabling this feature in a much more
> > > > > > >integrated manner?  ie, both mother superior and sister moms have a
> > > > > > >config option 'cleanup_procs = true' which if true will search the
> > > > > > > >process tree for processes owned by user X with a matching job id in
> > > > > > >the environment.  pbs_mom could then terminate all of these processes
> > > > > > >directly.  This would make this feature much easier for most sites to
> > > > > > >activate.  No epilog/prolog creation, no compiling, simply set a
> > > > > > >parameter.  And as you mention, it would work in both dedicated and
> > > > > > >shared node operation.
> > > > > > >
> > > > > > >  Thoughts?
> > > > > > >
> > > > > > >Dave
> > > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > torqueusers mailing list
> > > > > > torqueusers at supercluster.org
> > > > > > http://www.supercluster.org/mailman/listinfo/torqueusers
> > > > >
> > >
> >
>

