[torqueusers] reported cpu time during running parallel jobs in torque 2.1.3...
Garrick Staples
garrick at clusterresources.com
Wed Oct 18 13:39:17 MDT 2006
On Wed, Oct 18, 2006 at 12:26:18PM -0600, Garrick Staples alleged:
> On Wed, Oct 18, 2006 at 05:40:40PM +0100, David Golden alleged:
> > Well, perhaps in some sort of karmic revenge after the on-list discussion of
> > cput time accounting a while back, I just tried upgrading to torque 2.1.3, and it
> > seems something strange is going on with _recent_ torque:
> >
> > The resources_used.cput number ultimately reported in
> > e.g. /var/spool/pbs/server_priv/accounting/ for
> > parallel jobs still seems accurate enough
> >
> > However, qstat -f is underreporting, even when the job is in the "C" state, maybe
> > as if it's only reporting the cput of the processes on the job's mother superior
> > node - and I think the issue might also be mangling our maui stats...
>
> That's peculiar.
>
> Looking...
It seems that sister MOMs aren't sending regular cput updates; the
update only happens at the very end.
Plus there is some sort of race condition preventing the final
resource update (the one that gets into the accounting record) from
reaching the stat output.
Still looking...
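For anyone who wants to check whether their own jobs show the same gap, here is a rough Python sketch that pulls resources_used.cput out of an accounting record and out of qstat -f output and compares the two. The helper name cput_seconds and both sample strings (job id, host, user, times) are made up for illustration and are not from a real cluster; it also assumes cput is printed as HH:MM:SS in both places.

```python
import re


def cput_seconds(text):
    """Return resources_used.cput from text in seconds, or None if absent.

    Hypothetical helper: accepts both the accounting form
    (resources_used.cput=HH:MM:SS) and the qstat -f form
    (resources_used.cput = HH:MM:SS).
    """
    match = re.search(r"resources_used\.cput\s*=\s*(\d+):(\d{2}):(\d{2})", text)
    if match is None:
        return None
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


# Made-up sample data standing in for a line from
# server_priv/accounting/ and for the output of qstat -f.
accounting_record = (
    "10/18/2006 13:00:00;E;1234.headnode;user=someuser "
    "resources_used.cput=08:00:00 resources_used.walltime=02:00:10"
)
qstat_output = """Job Id: 1234.headnode
    job_state = C
    resources_used.cput = 02:00:00
    resources_used.walltime = 02:00:10
"""

accounted = cput_seconds(accounting_record)
reported = cput_seconds(qstat_output)
shortfall = accounted - reported
print(f"accounting: {accounted}s, qstat -f: {reported}s, shortfall: {shortfall}s")
```

A large positive shortfall on a multi-node job, with little or none on single-node jobs, would be consistent with only the mother superior's cput reaching the stat output.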