[torqueusers] Wrong cput value

Kevin Murphy murphy at genome.chop.edu
Wed Jul 23 08:03:46 MDT 2008


Brock Palen wrote:
> Were these jobs running different code?
> Some codes (HFSS comes to mind) fork the real process, and somehow
> Torque loses track of it, so cput will be almost zero.
> Another possibility: if you're running parallel code, the user may not
> be using a TM-enabled mpirun.
>
The jobs use identical code, which happens to be a Perl wrapper around a 
command-line Java program, invoked via system().  So you're suggesting 
that Torque might, under rare circumstances (because of some bug?), fail 
to account for the CPU time of child processes such as the 
Perl-forked shell and the shell-forked Java process...  Hmmm.  So in 
general, if a job invokes anything (?) that might fork, the cput value 
should be treated with suspicion.  Too bad.
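
One way to cross-check the scheduler's figure would be for the wrapper to record its children's CPU time itself. Below is a minimal sketch, written in Python rather than Perl and assuming a Unix host. The helper name run_and_measure is hypothetical and not part of Torque. It only counts children that have exited and been waited for, so a process that detaches from its parent would be missed here too.

```python
import resource
import subprocess
import sys


def run_and_measure(cmd):
    """Run cmd, wait for it, and return (exit_code, child_cpu_seconds).

    The CPU figure is user + system time of waited-for child processes,
    taken as the change in getrusage(RUSAGE_CHILDREN) around the call.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    proc = subprocess.run(cmd)  # blocks until the child exits
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return proc.returncode, cpu


if __name__ == "__main__":
    # Stand-in workload; a real wrapper would launch the Java program here.
    code, cpu = run_and_measure(
        [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
    )
    print(f"exit={code} child_cpu={cpu:.2f}s")
```

Logging this number next to the job ID would give an independent value to compare with resources_used.cput afterwards.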
>
> On Jul 22, 2008, at 2:39 PM, Kevin Murphy wrote:
>> I recently ran tracejob to compare runtime versus data-size 
>> statistics on 563 jobs, and three of them had impossibly low 
>> resources_used.cput values.
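
A check of this kind can be sketched in Python. It assumes the cput and walltime values have already been extracted from the tracejob output as HH:MM:SS strings. The function names, the job IDs, the sample values, and the 1% threshold are all illustrative assumptions, not data from the jobs discussed here.

```python
def hms_to_seconds(value):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def suspicious_jobs(jobs, min_ratio=0.01):
    """Return IDs of jobs whose cput is below min_ratio of their walltime.

    jobs maps a job ID to a (cput, walltime) pair of 'HH:MM:SS' strings.
    Jobs with zero walltime are skipped because the ratio is undefined.
    """
    flagged = []
    for job_id, (cput, walltime) in jobs.items():
        wall = hms_to_seconds(walltime)
        if wall > 0 and hms_to_seconds(cput) / wall < min_ratio:
            flagged.append(job_id)
    return flagged


# Made-up sample records: (cput, walltime)
jobs = {
    "job-a": ("01:58:30", "02:00:00"),
    "job-b": ("00:00:02", "02:05:10"),
}
print(suspicious_jobs(jobs))  # ['job-b']
```

The threshold only makes sense for CPU-bound jobs; jobs that mostly wait on I/O can legitimately show a low ratio.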


