[torqueusers] Semaphores limit per job/user in torque?

Andrew Savchenko bircoph at gmail.com
Tue Sep 24 09:29:42 MDT 2013


On Mon, 23 Sep 2013 10:27:08 -0600 Mark Moore wrote:
> Hmmmm....the problem is that the semaphores continue to take up memory
> long after the job is finished, dead, killed, etc. Then the next job
> comes along, creates more semaphores, and continues to take up space.
> Fast forward through a few more jobs and the IPC buffer space becomes
> exhausted.


> We hit this with the Intel license compiler checkout of all things.
> Go figure.
> This really has to be addressed at the system level. Expecting user code
> to solve this really isn't practical: users have no idea that semaphores
> are being left hanging around, and if a job crashes there is no expectation
> of clean up that can occur.
> I finally wrote a short epilogue script (willing to share) to clean
> things out after each job completes. We haven't had a problem, since.

Putting the script in the epilogue instead of a cron job is a good
idea, and I'll do this, but the epilogue still doesn't solve every
problem.

Consider that a user may run multiple jobs on the same node. IPC
semaphores are not tied to any pid, so it is not safe to remove a
user's semaphores in the epilogue while other jobs of that user are
still running on the node. And if the user keeps running jobs on this
node continuously, the epilogue will never get a chance to remove the
stale semaphores. How have you addressed this issue in your epilogue?
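
To make the concern concrete, here is a minimal sketch of the kind of
guard an epilogue would need before wiping a user's semaphores. It
assumes that any remaining process owned by the user on the node
belongs to another running job (an ssh session would also count, so
this is only a heuristic). The function name and the proc_root
parameter are hypothetical and not part of torque.

```python
import os


def user_has_processes(uid, proc_root="/proc"):
    """Return True if any process directory under proc_root is owned by uid.

    Assumption: the owner of /proc/<pid> reflects the user running that
    process, which holds for ordinary (dumpable) processes on Linux.
    """
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-pid entries such as "self" or "sys"
        try:
            if os.stat(os.path.join(proc_root, entry)).st_uid == uid:
                return True
        except OSError:
            continue  # the process exited while we were scanning
    return False
```

An epilogue running as root could call user_has_processes(job_uid) and
skip the cleanup when it returns True, which is exactly the case where
stale semaphores would keep accumulating.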

That's why I asked about IPC namespace isolation: if torque could use
it per job, the stale semaphores would disappear together with the
isolated namespace once the job finishes. LXC works this way, and
since torque can already use cpusets, I was hoping it could use
namespaces for job isolation too. It looks like this feature is not
here.
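
For illustration, this is roughly what per-job IPC isolation could look
like outside of torque, using the unshare(1) tool from util-linux. This
is a sketch under assumptions: the wrapper functions are hypothetical,
torque is not known to offer a hook for this, creating the namespace
needs root (CAP_SYS_ADMIN), and dropping privileges back to the job
owner is left out.

```python
import subprocess


def ipc_isolated_command(job_argv):
    """Prefix a command so that it starts in a fresh IPC namespace."""
    # "--ipc" asks unshare(1) for a new System V IPC namespace;
    # "--" ends option parsing so the job's own flags are left alone.
    return ["unshare", "--ipc", "--"] + list(job_argv)


def run_isolated(job_argv):
    """Run the job in its own IPC namespace and return its exit code.

    When the last process in the namespace exits, the kernel destroys
    the namespace together with every semaphore created inside it.
    """
    return subprocess.run(ipc_isolated_command(job_argv), check=False).returncode
```

With such a wrapper, semaphores leaked by one job are invisible to the
user's other jobs and vanish when the job ends, so no epilogue cleanup
is needed.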

On second thought, the epilogue script could compare each semaphore's
change time against the start times of the user's jobs and remove
every semaphore whose last-changed time is earlier than the start of
the user's earliest running job. Again, this will not fix the entire
problem (consider one long-running job and many short,
semaphore-leaking jobs by the same user on the same node), but it
will mitigate the issue.
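
A minimal sketch of this timestamp check, assuming the column layout
of /proc/sysvipc/sem on Linux (key, semid, perms, nsems, uid, gid,
cuid, cgid, otime, ctime). The function names are hypothetical, and the
earliest job start time (seconds since the epoch) has to be obtained
from the batch system separately. As a slightly more conservative
variant, it treats a semaphore as stale only if both its last operation
time and its last change time predate that start.

```python
import subprocess


def stale_semids(proc_text, uid, earliest_job_start):
    """Return semids owned by uid that were last touched before
    earliest_job_start (epoch seconds)."""
    lines = proc_text.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()  # column names from the first line
    stale = []
    for line in lines[1:]:
        fields = dict(zip(header, line.split()))
        if int(fields["uid"]) != uid:
            continue  # never touch other users' semaphores
        last_touched = max(int(fields["otime"]), int(fields["ctime"]))
        if last_touched < earliest_job_start:
            stale.append(int(fields["semid"]))
    return stale


def remove_semaphores(semids, dry_run=True):
    """Remove semaphore sets with ipcrm(1); only print by default."""
    for semid in semids:
        cmd = ["ipcrm", "-s", str(semid)]
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=False)
```

In an epilogue, one would read /proc/sysvipc/sem into a string, pass it
to stale_semids() together with the job owner's uid and the start time
of that user's earliest job still running on the node, and hand the
result to remove_semaphores().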

Best regards,
Andrew Savchenko