[torqueusers] cput statistic not correct for some jobs

Martin Schafföner martin.schaffoener at e-technik.uni-magdeburg.de
Wed Jan 24 10:38:50 MST 2007


I've noticed some odd behavior, possibly a bug, in TORQUE regarding the cput 
statistic of some jobs. If a job spawns a number of tasks through mpiexec/TM 
and those tasks keep running for a "long" time, the statistics reported by 
qstat seem to be correct. However, some jobs launch many tasks 
consecutively (also through mpiexec). Having one job with, say, 8 processors 
launch hundreds or thousands of tasks through TM, each running for only a 
few seconds or minutes, is probably not the intended usage pattern, 
but hey :-)

Going through the logs of the moms involved, I noticed that cput_sum(), for 
example, is called regularly, apparently once per poll interval, and also 
when a task finishes (from scan_for_terminated()), and that it does collect 
significant CPU time. However, qstat reports only minimal cput usage even 
though the job's nodes have been busy the whole time.
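
To make the symptom concrete, here is a toy model in Python. It is not taken 
from the TORQUE source: the names Task, cput_live_only and cput_cumulative 
are made up for illustration, and the idea that only still-running tasks get 
counted is just one guess at how such a gap could arise, not something I 
have confirmed. It simply compares the two totals for a job that runs many 
short tasks one after another.

```python
# Toy model, NOT TORQUE code. All names here are hypothetical.
from dataclasses import dataclass


@dataclass
class Task:
    cpu_seconds: float   # CPU time the task has consumed so far
    alive: bool = True   # False once the task has terminated


def cput_live_only(tasks):
    """Sum CPU time over tasks that are still running (assumed accounting)."""
    return sum(t.cpu_seconds for t in tasks if t.alive)


def cput_cumulative(tasks):
    """Sum CPU time over every task the job has ever run."""
    return sum(t.cpu_seconds for t in tasks)


def simulate(n_tasks=1000, seconds_each=5.0):
    """Model n_tasks short tasks run consecutively; only the last is alive."""
    tasks = [Task(seconds_each, alive=False) for _ in range(n_tasks - 1)]
    tasks.append(Task(seconds_each, alive=True))
    return cput_live_only(tasks), cput_cumulative(tasks)


if __name__ == "__main__":
    live, total = simulate()
    print(f"live-only: {live:.0f}s  cumulative: {total:.0f}s")
```

If the reported value behaved like the live-only total, a job made of many 
short tasks would show very little cput while a job with a few long-running 
tasks would look fine, which is the pattern described above.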

Can anybody explain this behavior?

Martin Schafföner

Cognitive Systems Group, Institute of Electronics, Signal Processing and 
Communication Technologies, Department of Electrical Engineering, 
Otto-von-Guericke University Magdeburg
Phone: +49 391 6720063
