[torqueusers] Wrong cput value
Brock Palen
brockp at umich.edu
Wed Jul 23 08:09:07 MDT 2008
It's not a bug; it happens consistently. Some codes create processes
that are not children of the mom. If a process is not a child of the
mom, PBS can't keep track of it.
I think there is a different problem here, with something else causing
PBS to lose track.
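
To illustrate how a process can end up outside what a batch system
watches, here is a minimal Python sketch. It assumes the accounting
follows a job's processes by session or parentage, which is an
assumption here and not something confirmed in this thread. The helper
name session_of_detached_child is made up for the example. The child
calls setsid() and lands in a new session, which is one common way
wrappers and launchers detach the real work.

```python
import os

def session_of_detached_child():
    """Fork a child that calls setsid() and return the child's session ID."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: start a new session, leaving the parent's session behind.
        os.close(read_fd)
        os.setsid()
        os.write(write_fd, str(os.getsid(0)).encode())
        os.close(write_fd)
        os._exit(0)
    # Parent: read the child's new session ID, then reap the child.
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return int(data)

parent_sid = os.getsid(0)
child_sid = session_of_detached_child()
print("parent session:", parent_sid)
print("child session: ", child_sid)
```

The two session IDs differ, so anything that sums CPU time only over
the parent's session would not see the child's usage.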
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985
On Jul 23, 2008, at 10:03 AM, Kevin Murphy wrote:
> Brock Palen wrote:
>> Were these jobs running different code?
>> Some code (hfss comes to mind)
>> forks the real process and somehow Torque loses track of it. So
>> cput will be almost zero.
>> Another possibility: if you're using parallel code, the user is not
>> using a TM-enabled mpirun.
>>
> The jobs use identical code, which happens to be a Perl wrapper
> around a command-line java program, invoked via system(). So
> you're suggesting that Torque might under rare circumstances
> (because of some bug?) fail to account for the CPU time of the
> child processes such as the perl-forked shell and shell-forked java
> process .... Hmmm. So in general if a job invokes anything (?)
> which might fork, the cput value should be treated with suspicion.
> Too bad.
>>
>> On Jul 22, 2008, at 2:39 PM, Kevin Murphy wrote:
>>> I recently ran tracejob to compare runtime versus data-size
>>> statistics on 563 jobs, and three of them had impossibly low
>>> resources_used.cput values.
>
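
For the tracejob comparison quoted above, a rough Python sketch of
flagging jobs whose cput is implausibly small next to their walltime.
The sample lines are invented for illustration and are simplified, not
real tracejob output; only the resources_used.cput and
resources_used.walltime field names come from Torque. The thresholds
min_ratio and min_walltime are arbitrary assumptions.

```python
import re

# Matches e.g. "resources_used.cput=01:58:40" (HH:MM:SS).
FIELD = re.compile(r"resources_used\.(cput|walltime)=(\d+):(\d{2}):(\d{2})")

def used_seconds(line):
    """Return the cput and walltime found in a line, converted to seconds."""
    return {name: int(h) * 3600 + int(m) * 60 + int(s)
            for name, h, m, s in FIELD.findall(line)}

def looks_suspicious(line, min_ratio=0.05, min_walltime=60):
    """True when cput is a tiny fraction of walltime (arbitrary thresholds)."""
    used = used_seconds(line)
    if "cput" not in used or "walltime" not in used:
        return False
    if used["walltime"] < min_walltime:
        # Very short jobs are too noisy to judge.
        return False
    return used["cput"] < min_ratio * used["walltime"]

# Hypothetical, simplified records for two jobs.
sample = [
    "job=101.example resources_used.cput=01:58:40 resources_used.walltime=02:00:10",
    "job=102.example resources_used.cput=00:00:02 resources_used.walltime=01:45:00",
]
flagged = [line.split()[0] for line in sample if looks_suspicious(line)]
print(flagged)
```

A low ratio only marks a job for a closer look. A job that spends most
of its time waiting on I/O would be flagged too.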