[torqueusers] Release of TORQUE 2.3.3
Joshua Bernstein
jbernstein at penguincomputing.com
Fri Aug 15 15:42:39 MDT 2008
Josh Butikofer wrote:
> Good question.
>
> There is a subtle difference. The command pbsdsh uses tm_spawn() to
> launch the new process. The new pbs_track command uses a new TM function
> called tm_adopt(). This function allows a MOM to start watching a
> process started by something other than the MOM. Any process in the
> system can be added to the MOM's session table and it will then send
> signals to that session, track resource usage, etc. as if the job was
> "born" from the pbs_mom in the normal tm_spawn() way.
That's very slick. Though I imagine the pbs_mom is only able to track the
resource usage AFTER tm_adopt() is called. So, I assume a proper
implementation would call tm_adopt() right after a fork()/exec(), though
since the fork()/exec()/tm_adopt() sequence isn't atomic, the usage isn't
perfect, but it's very likely close enough.
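To make the window I'm describing concrete, here is a minimal sketch in Python. It does not call TORQUE at all: adopt_pid() is a hypothetical stand-in for tm_adopt(), and the job id string is made up. It only shows that the child is already running for a short time before the adopt call can be made.

```python
import subprocess
import sys
import time


def adopt_pid(pid, job_id):
    """Hypothetical stand-in for tm_adopt(); it only records the request.

    A real implementation would ask the local pbs_mom to start tracking
    the PID. That call is deliberately not shown here.
    """
    return {"pid": pid, "job_id": job_id, "adopted_at": time.monotonic()}


def spawn_and_adopt(argv, job_id):
    """Start a child, then adopt it as soon as its PID is known."""
    started_at = time.monotonic()
    child = subprocess.Popen(argv)          # the fork()/exec() step
    record = adopt_pid(child.pid, job_id)   # earliest point we can adopt
    # Anything the child consumed in this interval is not accounted for.
    record["untracked_seconds"] = record["adopted_at"] - started_at
    return child, record


if __name__ == "__main__":
    # "123.example" is an invented job id, used for illustration only.
    child, record = spawn_and_adopt([sys.executable, "-c", "pass"], "123.example")
    child.wait()
    print("pid:", record["pid"])
    print("untracked window (s):", record["untracked_seconds"])
```

The untracked window is normally tiny compared to the runtime of the job, which is why I'd call the accounting close enough.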
> This is useful for some MPI libraries....
How about this case: perhaps there is a situation where a process is
already running, and all of a sudden you want to include that process's
usage in the job's usage. Consider an application where there are daemons
left running on the compute nodes that perform some resource-intensive
task on behalf of the job. (This is common with some commercial life
science applications.) Then when a job starts, pbs_mom would fork the
process, and through, say, a prologue, tm_adopt() is called on the
daemon's PID. This way the daemon's CPU time is accurately accounted for
in the job. This leads to a few questions:
1) Can a PID be adopted by more than one job at the same time?
2) Is there something like tm_orphan() to un-adopt a PID?
...of course I could have just looked at the code ;-)
-Joshua Bernstein
Software Engineer
Penguin Computing