csamuel at vpac.org
Tue Apr 15 18:06:33 MDT 2008
----- "David Singleton" <David.Singleton at anu.edu.au> wrote:
> In trying to cleanup fork_to_user() in our version of OpenPBS, I
> noticed some dodginess in the use of the error return codes. In
> checking what Torque does, it appears that it is dodgy in a different
> (but essentially the same) way. The issue is possibly not serious
> but can lead to MOM exiting or a MOM child persisting and "playing
Ahh, that could explain something we saw this morning..
> Below is a summary of the return codes from fork_to_user() to
> req_cpyfile(). I haven't checked the other use of fork_to_user().
> return ## can lead to MOM exiting if she has trouble with a user
> home dir
We had a process unable to copy files back to a user's home
directory and all of a sudden there is no pbs_mom on the node,
no crash logged, nowt.. :-(
> return ### can lead to a child "playing MOM" if a malloc or setuid
> fails (unlikely, I know)
I've seen some people report oddities with that on the torqueusers
list recently, so again that sounds like it's happening in the wild.
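
For anyone following along, here is a minimal sketch of the general pattern at issue. It is not Torque's or OpenPBS's actual code: run_in_child(), CHILD_SETUP_FAILED and the setup/work helpers are hypothetical names made up for illustration. The idea is that once fork() has succeeded, the child reports any failure through _exit() and never returns an error code to the caller, because a return would send the child back into the daemon's own code path.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical exit status for a child whose post-fork setup failed. */
#define CHILD_SETUP_FAILED 42

/* Stand-ins for post-fork steps that can fail (setuid, malloc, chdir...). */
static int setup_ok(void)    { return 0; }
static int setup_fails(void) { return -1; }
static int work_ok(void)     { return 0; }

/*
 * Fork, run setup() and then work() in the child.
 * Only the parent ever returns from this function:
 *   -1  fork or wait failed, or the child died abnormally
 *   >=0 the child's exit status
 */
int run_in_child(int (*setup)(void), int (*work)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* still the parent, no child made */

    if (pid == 0) {
        /* Child: every path ends in _exit(), never in a return. */
        if (setup() != 0)
            _exit(CHILD_SETUP_FAILED);
        _exit(work() == 0 ? 0 : 1);
    }

    /* Parent: collect the child's status, retrying if interrupted. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this shape, run_in_child(setup_fails, work_ok) hands CHILD_SETUP_FAILED back to the parent, which can then fail the one request and carry on running rather than exiting itself.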
> More than happy for someone to point out my misunderstanding - then
> I may not have to fix anything :-).
No, I reckon you're onto something there I'm afraid!
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency