[torqueusers] Semaphores limit per job/user in torque?

Mark Moore mmoore at ucar.edu
Tue Sep 24 11:26:53 MDT 2013


On 09/24/2013 09:29 AM, Andrew Savchenko wrote:
> Hello,
>
> On Mon, 23 Sep 2013 10:27:08 -0600 Mark Moore wrote:
>> Hmmmm....the problem is that the semaphores continue to take up memory
>> long after the job is finished, dead, killed, etc. Then the next job
>> comes along, creates more semaphores, and continues to take up space.
>> Fast forward through a few more jobs and the IPC buffer space becomes
>> exhausted.
>
> Exactly.
>
>> We hit this with the Intel compiler license checkout, of all things.
>> Go figure.
>>
>> This really has to be addressed at the system level. Expecting user
>> code to solve it isn't practical: users have no idea that semaphores
>> are being left hanging around, and if a job crashes, no cleanup can
>> be expected to run.
>>
>> I finally wrote a short epilogue script (willing to share) to clean
>> things out after each job completes. We haven't had a problem since.
>
> Putting the script in the epilogue instead of a cron job is a good
> idea, and I'll do that, but the epilogue still doesn't solve all of
> the problems.
>
> Consider a user who runs multiple jobs on the same node. IPC
> semaphores are not tied to any PID, so it is not safe to remove a
> user's semaphores in the epilogue while other jobs belonging to that
> user are still running on the node.

Our cluster is not configured for shared_node, so we don't have this
problem.
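
In case it's useful, here is roughly what our epilogue does. This is a
minimal sketch rather than the exact script, and it assumes the Linux
util-linux ipcs/ipcrm output format and that $2 is the job owner, per
the standard torque epilogue argument list:

#!/bin/sh
# torque epilogue: on a non-shared node, every semaphore set owned
# by the job user is stale once the job ends, so remove them all.
user="$2"                  # job execution user name
[ -n "$user" ] || exit 0

# 'ipcs -s' data columns: key semid owner perms nsems
for semid in $(ipcs -s | awk -v u="$user" '$3 == u {print $2}'); do
    ipcrm -s "$semid"
done

exit 0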

'ipcs -p' will return LSPID (last PID to send to this semaphore) and
LRPID (last PID to receive from this semaphore). A simple check of
these against the active process table should get what's needed. Yes,
a race condition could exist if the system quickly re-assigns a PID
to a new job.
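
On a shared node you could extend the sketch above with that PID
check. A hypothetical, untested variant that replaces the removal
loop: the data rows of 'ipcs -s -i' end with the PID of the last
process to operate on each semaphore, so skip any set that a live
process has touched.

for semid in $(ipcs -s | awk -v u="$user" '$3 == u {print $2}'); do
    busy=0
    for pid in $(ipcs -s -i "$semid" | awk '/^[0-9]/ {print $NF}'); do
        # kill -0 only tests existence; the PID-reuse race remains.
        if [ "$pid" -gt 0 ] 2>/dev/null && kill -0 "$pid" 2>/dev/null
        then
            busy=1
            break
        fi
    done
    [ "$busy" -eq 0 ] && ipcrm -s "$semid"
done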

> And if a user runs jobs on this node continuously, we will never be
> able to remove the stale semaphores with the epilogue. How have you
> addressed this issue in your epilogue?
>
> That's why I asked about IPC namespace isolation: if torque could use
> it per job, stale semaphores would be gone along with the isolated
> namespace once the job finishes. LXC works this way, and since torque
> is capable of using cpusets, I was hoping it could use namespaces for
> job isolation too. It looks like this feature is not here yet.

Interesting, I hadn't thought of that. My concern would be the
overhead introduced by building a container for each job.
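
That said, an IPC namespace by itself doesn't require a full
container. As a quick illustration (not something torque does today),
util-linux unshare(1) can run a command in a private IPC namespace,
and the kernel destroys the namespace, and every semaphore in it, when
its last process exits:

# 'ipcmk -S 1' creates a one-semaphore set; it is visible inside
# the private namespace but not outside, and vanishes with it.
$ sudo unshare --ipc sh -c 'ipcmk -S 1; ipcs -s'
$ ipcs -s    # the leaked set does not appear here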

>
> On second thought, in the epilogue script each semaphore's change
> time could be checked against the user's job start times: remove
> every semaphore whose last-changed time is earlier than the start
> time of the user's earliest running job. Again, this will not fix the
> entire problem (consider one long-running job plus a lot of short,
> semaphore-leaking jobs on the same node by the same user), but it
> will mitigate the issue.
>
> Best regards,
> Andrew Savchenko
>
>
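
Regarding that change-time heuristic, a rough and untested sketch,
assuming GNU date and the Linux 'ipcs -s -t' layout (semid, owner,
then two ctime-style timestamps of five fields each):

#!/bin/sh
# Remove the user's semaphore sets whose last-changed time predates
# the start of the user's oldest still-running process on this node.
user="$2"
[ -n "$user" ] || exit 0

# Epoch of the user's oldest live process (empty if none remain).
oldest=$(ps -u "$user" -o lstart= 2>/dev/null |
         while read -r l; do date -d "$l" +%s; done | sort -n | head -n1)

# Fields 8-12 of an 'ipcs -s -t' data row are the last-changed time.
ipcs -s -t | awk -v u="$user" 'NF == 12 && $2 == u {
        print $1, $8, $9, $10, $11, $12 }' |
while read -r semid dow mon day time year; do
    changed=$(date -d "$dow $mon $day $time $year" +%s) || continue
    if [ -z "$oldest" ] || [ "$changed" -lt "$oldest" ]; then
        ipcrm -s "$semid"
    fi
done

exit 0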


-- 
Regards,

	Mark
	--0-
----------------------------------------------------------------------
Mark Moore
UCAR/NCAR/CGD					mmoore at ucar.edu
1850 Table Mesa Drive				(W) 303 497-1338
Boulder, CO 80305				(F) 303 497-1324

