[torqueusers] epilogue and node access policy

Seth T Graham sether at fnal.gov
Tue Sep 10 10:32:09 MDT 2013


On Sep 10, 2013, at 11:18 AM, Sakhile Masoka <sakhile.harvey at gmail.com>
 wrote:

> I have this command in my epilogue
> 
> user_procs=`/bin/ps -e -o pid= -o user= | /bin/grep -e "$2" | \
> while read pid owner.....
> 
> The issue is that if one user is running multiple jobs on one node, my epilogue will kill all of them.

lsof coupled with grep gets around this issue. It will give you only the process IDs that belong to a specific job.
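
As a rough sketch of that idea (not a tested epilogue): it assumes the job's processes hold open a file whose path contains the job ID, for example the job's spooled stdout/stderr files. That path layout is an assumption, so check what lsof actually shows on your nodes. The function name pids_for_job is made up for illustration.

```shell
#!/bin/sh
# pids_for_job JOBID (hypothetical helper name)
# Reads default-format `lsof` output on stdin and prints the unique PIDs
# of processes with an open file whose line mentions JOBID.
# Assumption: PID is the second column of default lsof output.
# Caveat: this is a plain substring match, so job ID "2.head" would also
# match files belonging to "12.head"; tighten the pattern if that matters.
pids_for_job() {
    grep -F -- "$1" | awk '{print $2}' | sort -u
}

# Possible use inside an epilogue, where $1 is the job ID:
#   for pid in $(lsof 2>/dev/null | pids_for_job "$1"); do
#       kill "$pid" 2>/dev/null
#   done
```

Because the match is on open files rather than on the user name, a second job from the same user on the same node is left alone, as long as it does not have the first job's files open.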

> I need a way to link job IDs ($1) to the processes on the node. But even with that, processes can start other processes, etc., which will make tracking difficult…

If you've really got a problem with processes spinning up that fast, I don't think the epilogue is where you want to fix it. I don't know if my opinion is a common one, but I think the epilogue should make only a single attempt to clean up. The more magic you try to cram into the prologue/epilogue, the more fringe cases you're going to create, causing jobs to crash (and crashed jobs create a heap more tickets than a node being set offline for a while). Policing users should be something you do with cron or a monitoring service. Education can help too.


