[torqueusers] Wrong cput value

Brock Palen brockp at umich.edu
Wed Jul 23 08:09:07 MDT 2008


It's not a bug; it happens consistently.  Some codes create processes
that are not children of the MOM.  If a process is not a child of the
MOM, PBS can't keep track of it.
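
A rough illustration of that point, as a sketch only: this is not Torque's
code and says nothing about how pbs_mom gathers cput internally.  It uses
plain POSIX accounting (getrusage with RUSAGE_CHILDREN) as an analogy.  A
parent is only credited with the CPU time of descendants that were waited
for along the whole chain, so work done in a detached grandchild never
shows up.  The function names below are made up for the example.

```python
import os
import resource
import time


def burn_cpu(seconds):
    """Spin until this process has consumed about `seconds` of CPU time."""
    start = time.process_time()
    while time.process_time() - start < seconds:
        pass


def children_cpu():
    """CPU seconds (user + system) credited to this process's waited-for children."""
    ru = resource.getrusage(resource.RUSAGE_CHILDREN)
    return ru.ru_utime + ru.ru_stime


def run_tracked(seconds):
    """Fork a child that does the work, then wait for it (time is accounted)."""
    pid = os.fork()
    if pid == 0:
        burn_cpu(seconds)
        os._exit(0)
    os.waitpid(pid, 0)


def run_detached(seconds):
    """Double fork: the grandchild does the work, but nobody in our tree waits for it."""
    pid = os.fork()
    if pid == 0:
        os.setsid()              # leave the parent's session
        if os.fork() == 0:
            burn_cpu(seconds)    # the "real" work happens here
            os._exit(0)
        os._exit(0)              # intermediate exits at once, orphaning the grandchild
    os.waitpid(pid, 0)           # we only ever reap the short-lived intermediate


if __name__ == "__main__":
    t0 = children_cpu()
    run_tracked(0.3)
    t1 = children_cpu()
    run_detached(0.3)
    t2 = children_cpu()
    print("tracked child added  %.3f CPU seconds" % (t1 - t0))
    print("detached child added %.3f CPU seconds" % (t2 - t1))
```

The second figure should come out near zero even though the same amount of
work was started, which is the same kind of symptom as a near-zero cput.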

I think there is a different problem here, something else that causes
PBS to lose track.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985



On Jul 23, 2008, at 10:03 AM, Kevin Murphy wrote:
> Brock Palen wrote:
>> Were these jobs running different code?
>> Some code (HFSS comes to mind)
>> forks the real process and somehow Torque loses track of it, so
>> cput will be almost zero.
>> Another possibility: if you're running parallel code, the user may
>> not be using a TM-enabled mpirun.
>>
> The jobs use identical code, which happens to be a Perl wrapper  
> around a command-line java program, invoked via system().  So  
> you're suggesting that Torque might under rare circumstances  
> (because of some bug?) fail to account for the CPU time of the  
> child processes such as the perl-forked shell and shell-forked java  
> process ....  Hmmm.   So in general if a job invokes anything (?)  
> which might fork, the cput value should be treated with suspicion.   
> Too bad.
>>
>> On Jul 22, 2008, at 2:39 PM, Kevin Murphy wrote:
>>> I recently ran tracejob to compare runtime versus data-size  
>>> statistics on 563 jobs, and three of them had impossibly low  
>>> resources_used.cput values.
>