[torqueusers] epilogue and node access policy

Sakhile Masoka sakhile.harvey at gmail.com
Tue Sep 10 10:48:57 MDT 2013


Thanks Seth

This command works; I hadn't seen it before:

/usr/sbin/lsof | /bin/grep "${JOBID}" | /bin/awk '{print $2}' | /bin/sort -u
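For anyone adapting this, here is a rough sketch of how the pipeline might sit inside an epilogue. It rests on some assumptions: the job ID shows up in the name of a file each job process holds open (which is why grepping lsof output works at all), $1 is the job ID and $2 the job owner as in the snippets in this thread, and the function names pids_for_job and cleanup_job are made up for illustration. Restricting lsof to the owner with -u is my addition, intended to keep root-owned daemons out of the list.

```shell
#!/bin/sh
# Sketch only: signal the processes that lsof ties to a single job.
# Bare command names are used here; an epilogue with a minimal PATH may
# need absolute paths such as /usr/sbin/lsof.

# pids_for_job JOBID
# Reads lsof-style output on stdin and prints the unique PIDs (column 2)
# from lines that mention JOBID. grep -F treats the ID as a fixed string.
pids_for_job() {
    grep -F -- "$1" | awk '{print $2}' | sort -u
}

# cleanup_job JOBID OWNER
# Makes a single attempt to terminate the job's leftover processes.
cleanup_job() {
    jobid=$1
    owner=$2
    for pid in $(lsof -u "$owner" 2>/dev/null | pids_for_job "$jobid"); do
        # Skip anything that is not a plain number, and skip this script.
        case $pid in
            ''|*[!0-9]*) continue ;;
        esac
        [ "$pid" = "$$" ] && continue
        kill -TERM "$pid" 2>/dev/null
    done
    return 0
}
```

In the epilogue itself this would be called as cleanup_job "$1" "$2". It sends one TERM per process and does not retry, in line with Seth's advice to keep the epilogue to a single cleanup attempt. Child processes that hold no file containing the job ID will not be found this way.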


On Tue, Sep 10, 2013 at 6:32 PM, Seth T Graham <sether at fnal.gov> wrote:

>
> On Sep 10, 2013, at 11:18 AM, Sakhile Masoka <sakhile.harvey at gmail.com>
>  wrote:
>
> > I have this command in my epilogue:
> >
> > user_procs=`/bin/ps -e -o pid= -o user= | /bin/grep -e "$2" | \
> > while read pid owner.....
> >
> > The issue is that if one user is running multiple jobs on one node, my
> > epilogue will kill all of them.
>
> lsof coupled with a grep gets around this issue. It will only give you
> process ids that belong to a specific job.
>
> > I need a way to link the JOBID ($1) to the processes on the node. But even
> > with that, processes can start other processes, etc., which will make
> > tracking difficult…
>
> If you've really got a problem with processes spinning up that fast, I
> don't think the epilogue is where you want to fix it. I don't know if my
> opinion is a common one, but I think the epilogue should only make a single
> attempt to clean up. The more magic you try to cram into the
> prologue/epilogue the more fringe cases you're going to create, causing
> jobs to crash (crashed jobs create a heap more tickets than a node being
> set offline for a while). Policing users should be something you do with
> cron or a monitoring service. Education can help too.
>