[torquedev] fork_to_user

Chris Samuel csamuel at vpac.org
Tue Apr 15 18:06:33 MDT 2008


----- "David Singleton" <David.Singleton at anu.edu.au> wrote:

Hello David!

> In trying to cleanup fork_to_user() in our version of OpenPBS, I
> noticed some dodginess in the use of the error return codes.  In
> checking what Torque does, it appears that it is dodgy in a different
> (but essentially the same) way.  The issue is possibly not serious
> but can lead to MOM exiting or a MOM child persisting and "playing
> MOM".

Ahh, that could explain something we saw this morning..

> Below is a summary of the return codes from fork_to_user() to
> req_cpyfile(). I haven't checked the other use of fork_to_user().
> 
> return ##  can lead to MOM exiting if she has trouble with a user
>             home dir

We had a process unable to copy files back to a user's home
directory and all of a sudden there was no pbs_mom on the node,
no crash logged, nowt.. :-(

> return ### can lead to a child "playing MOM" if a malloc or setuid
>             fails (unlikely, I know)

I've seen some people report oddities with that on the torqueusers
list recently, so again that sounds like it's happening in the wild.

> More than happy for someone to point out my misunderstanding - then
> I may not have to fix anything :-).

No, I reckon you're onto something there I'm afraid!
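
For what it's worth, here is a minimal sketch of the sort of contract
that would rule out a child "playing MOM". It is not Torque's
fork_to_user(): the names fork_child_sketch(), child_setup() and
reap_child() are made up for illustration, and the simulate_failure
flag just stands in for a failed malloc or setuid. The idea is that
the helper only ever returns in the parent, and the child always
finishes with _exit() whatever goes wrong.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the child-side setup (setuid, chdir to the
 * user's home directory, malloc, ...). Returns 0 on success, -1 on failure. */
static int child_setup(int simulate_failure)
{
    return simulate_failure ? -1 : 0;
}

/*
 * Hypothetical sketch, NOT Torque's fork_to_user().
 * Contract: this function only ever returns in the parent.
 *   > 0 : pid of the child
 *   -1  : fork() failed, no child exists
 * The child never returns to the caller; it always ends in _exit().
 */
static pid_t fork_child_sketch(int simulate_failure)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;      /* parent: fork failed, caller reports the error */
    if (pid > 0)
        return pid;     /* parent: carry on being the daemon */

    /* Child from here on. */
    if (child_setup(simulate_failure) != 0)
        _exit(1);       /* fail here, never return into the daemon's code path */

    /* ... the file copy would happen here ... */
    _exit(0);
}

/* Parent-side helper: wait for the child and return its exit status,
 * or -1 if it could not be reaped or did not exit normally. */
static int reap_child(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With that shape the caller never has to guess from a return code
whether it is the parent or the child, and the parent side treats a
-1 as a failed request rather than a reason to exit.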

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

