[torquedev] Re: [torqueusers] Release of TORQUE 2.3.3
Josh Butikofer
josh at clusterresources.com
Thu Aug 21 11:10:29 MDT 2008
Joshua Bernstein wrote:
>
>
> Josh Butikofer wrote:
>> Good question.
>>
>> There is a subtle difference. The command pbsdsh uses tm_spawn() to
>> launch the new process. The new pbs_track command uses a new TM
>> function called tm_adopt(). This function allows a MOM to start
>> watching a process started by something other than the MOM. Any
>> process in the system can be added to the MOM's session table and it
>> will then send signals to that session, track resource usage, etc. as
>> if the job was "born" from the pbs_mom in the normal tm_spawn() way.
>
> That's very slick. Though I imagine the pbs_mom is only able to track the
> resource usage AFTER tm_adopt() is called. So, I assume a proper
> implementation would call tm_adopt() right after a fork()/exec(), though
> since the fork()/spawn()/adopt() sequence isn't atomic the usage isn't
> perfect, but it's very likely close enough.
Yes, tm_adopt() must be called after the process you want to track is running. You would usually
do a fork(), then call tm_adopt(), then exec(). By default, however, the new pbs_track command
calls tm_adopt() and then exec(). This is because with SGI MPT, the array services must start
the process that eventually becomes the job.
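
The fork(), adopt, exec() ordering described above can be sketched roughly as follows. This is only
an illustration: adopt_process() is a hypothetical stand-in that just logs, not the real tm_adopt()
(whose exact arguments should be taken from TORQUE's tm.h), and run_adopted() and the job ID string
are made-up names for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* HYPOTHETICAL stand-in for the real tm_adopt() call. It only logs the
 * request; a real program would ask the local pbs_mom to adopt the PID. */
static int adopt_process(const char *job_id, pid_t pid)
{
    fprintf(stderr, "adopt pid %ld into job %s\n", (long)pid, job_id);
    return 0;
}

/* Fork, have the child request adoption of its own PID, then exec the
 * target program. Returns the child's exit status, or -1 on failure. */
int run_adopted(const char *job_id, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */

    if (pid == 0) {
        /* Child: adopt first so the exec'd program is already tracked. */
        if (adopt_process(job_id, getpid()) != 0)
            _exit(126);                 /* adoption refused */
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed */
    }

    /* Parent: wait for the child and report how it exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child branch alone (adopt the current PID, then exec) corresponds to the adopt-then-exec
sequence described for pbs_track. Anything the child does before the adopt call is not tracked,
which is the small accounting gap mentioned earlier in the thread.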
>> This is useful for some MPI libraries....
>
> How about this case. Perhaps there is a case where a process is running,
> and all of a sudden you want to include that process's usage in the
> job's usage. Consider an application where there are daemons left running
> on the compute nodes that perform some resource-intensive task on behalf
> of the job. (This is common with some commercial life science applications.)
> Then when a job starts pbs_mom would fork the process, and through say a
> prologue, tm_adopt() is called on the daemon's PID. This way the daemon's
> CPU time is accurately accounted for in the job. This leads to a few
> questions:
>
> 1) Can a PID be adopted by more than one job at the same time?
Yeah ... this is theoretically possible. I haven't tested it though. :)
> 2) Is there something like tm_orphan() to un-adopt a PID?
No, there is not.