[torqueusers] No use time for parallel jobs

Troy Baer troy at osc.edu
Thu Feb 2 07:48:51 MST 2006


On Thu, 2006-02-02 at 15:11 +0200, Adrien Leygue wrote:
> I am using Torque-2.0.0p5 on a small cluster, with the default C
> scheduler. For scheduling, I use the fair-share option.
> 
> For parallel jobs it seems that Torque does not record any CPU time.
> For example, here is a sample output of qstat:
> 
> Job id              Name             User             Time Use S Queue
> ------------------- ---------------- ---------------- -------- - -----
> 243.glass           min_test         user1            89:15:41 R main
> 258.glass           ...cript_S_New_R user2            00:00:00 R para
> 300.glass           SB_0.1_Var       user3            18:36:30 R long
> 302.glass           SB_0.01_Var_test user3            18:34:55 R main
> 303.glass           ...cript_S_New_R user2            00:00:00 R para
> 
> All jobs that require more than one processor have a "Time Use"
> of 0, even though some of them have been running for days!

This depends almost entirely on how your parallel jobs are started.  If
they're started through the TM API (e.g. using OSC's mpiexec or LAM's TM
interface), CPU time should be accounted for correctly.  If they're
started using rsh or ssh, the remote processes are not children of the
pbs_mom daemons on the compute nodes, so the moms have no parent-child
relationship they can use to monitor the resources those processes
consume.
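
To make the parent-child point concrete, here is a toy model in Python.
It is only a sketch and not how pbs_mom is actually implemented: the
process table, the pids and the function name cpu_seconds_under are all
made up for illustration.  It sums CPU time over a process and its
descendants, so a rank started under the remote shell daemon never shows
up in the total seen from the stand-in for pbs_mom.

```python
# Toy model only: the process table and pids below are hypothetical.
from collections import defaultdict


def cpu_seconds_under(root_pid, procs):
    """Sum CPU seconds of root_pid and all of its descendants.

    procs maps pid -> (parent_pid, cpu_seconds).
    """
    # Build a parent -> children index from the flat process table.
    children = defaultdict(list)
    for pid, (ppid, _cpu) in procs.items():
        children[ppid].append(pid)

    # Walk the tree rooted at root_pid and add up CPU time.
    total = 0
    stack = [root_pid]
    while stack:
        pid = stack.pop()
        total += procs[pid][1]
        stack.extend(children[pid])
    return total


# Hypothetical compute node: pid 100 stands in for pbs_mom,
# pid 200 for the rsh/ssh daemon.
procs = {
    1: (0, 0),         # init
    100: (1, 0),       # stand-in for pbs_mom
    110: (100, 3600),  # rank started through TM: descendant of the mom
    200: (1, 0),       # stand-in for rshd/sshd
    210: (200, 3600),  # rank started through rsh/ssh: not under the mom
}

# Only the TM-started rank is counted from the mom's point of view.
print(cpu_seconds_under(100, procs))
```

In this model both ranks have used an hour of CPU, but only the one
under pid 100 is visible from the mom's side; the other hangs off
pid 200 instead.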

	--Troy
-- 
Troy Baer                       troy at osc.edu
Science & Technology Support    http://www.osc.edu/hpc/
Ohio Supercomputer Center       614-292-9701


