[torqueusers] Release of TORQUE 2.3.3

Joshua Bernstein jbernstein at penguincomputing.com
Fri Aug 15 15:42:39 MDT 2008



Josh Butikofer wrote:
> Good question.
> 
> There is a subtle difference. The command pbsdsh uses tm_spawn() to 
> launch the new process. The new pbs_track command uses a new TM function 
> called tm_adopt(). This function allows a MOM to start watching a 
> process started by something other than the MOM. Any process in the 
> system can be added to the MOM's session table and it will then send 
> signals to that session, track resource usage, etc. as if the job was 
> "born" from the pbs_mom in the normal tm_spawn() way.

That's very slick. Though I imagine the pbs_mom is only able to track
the resource usage AFTER tm_adopt() is called. So I assume a proper
implementation would call tm_adopt() right after a fork()/exec(). Since
the fork()/exec()/tm_adopt() sequence isn't atomic, the recorded usage
won't be perfect, but it's very likely close enough.
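
To make the ordering concrete, here is a rough sketch in Python rather than against the real TM library. The `adopt` callback is hypothetical and only stands in for wherever a tm_adopt() call would go; the real function's arguments are not shown here. The pipe "gate" is my own addition, not something TORQUE does as far as I know: it holds the child back from exec() until the parent has registered the PID, which is one way to narrow the window in which usage goes untracked.

```python
import os
import sys


def spawn_and_adopt(argv, adopt):
    """Fork, register the child via adopt(pid), then let the child exec argv.

    `adopt` is a hypothetical callback standing in for a tm_adopt() call.
    Returns (child_pid, exit_code); a negative exit_code means the child
    was killed by that signal number.
    """
    gate_r, gate_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: block until the parent closes its end of the pipe,
        # so that exec() only happens after adoption.
        try:
            os.close(gate_w)
            os.read(gate_r, 1)  # returns b"" at EOF
            os.close(gate_r)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if something above failed

    # Parent: adopt first, then open the gate.
    os.close(gate_r)
    try:
        adopt(pid)
    finally:
        os.close(gate_w)

    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return pid, os.WEXITSTATUS(status)
    return pid, -os.WTERMSIG(status)


if __name__ == "__main__":
    # Stand-in "adoption" that just reports the PID it was handed.
    child, code = spawn_and_adopt(
        [sys.executable, "-c", "pass"],
        lambda p: print("would adopt pid", p),
    )
    print("child", child, "exited with", code)
```

Without the gate (adopting after the child has already called exec()), whatever the child does before the adoption call lands is simply missed, which is the "close enough" case described above.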

> This is useful for some MPI libraries....

How about this case: a process is already running, and all of a sudden
you want to include that process's usage in the job's usage. Consider an
application where there are daemons left running on the compute nodes
that perform some resource-intensive task on behalf of the job. (This is
common with some commercial life science applications.) Then, when a job
starts, pbs_mom would fork the process and, through say a prologue,
tm_adopt() would be called on the daemon's PID. This way the daemons'
CPU time is accurately accounted for in the job. This leads to a few
questions:

1) Can a PID be adopted by more than one job at the same time?

2) Is there something like tm_orphan() to un-adopt a PID?

...of course, I could have just looked at the code ;-)

-Joshua Bernstein
Software Engineer
Penguin Computing

