[torqueusers] reported cpu time during running parallel jobs in torque 2.1.3...

Garrick Staples garrick at clusterresources.com
Wed Oct 18 13:39:17 MDT 2006


On Wed, Oct 18, 2006 at 12:26:18PM -0600, Garrick Staples alleged:
> On Wed, Oct 18, 2006 at 05:40:40PM +0100, David Golden alleged:
> > Well, perhaps in some sort of karmic revenge after on-list discussion of
> > cput time accounting a while back, I just tried upgrading to torque 2.1.3,
> > and it seems something strange is going on with _recent_ torque:
> > 
> > The resources_used.cput number ultimately reported in
> > e.g. /var/spool/pbs/server_priv/accounting/ for
> > parallel jobs still seems accurate enough.
> > 
> > However, qstat -f is underreporting, even when the job is in "C" state,
> > maybe as if it's only reporting the cput of the processes on the job's
> > mother superior node - and I think the issue might also be mangling our
> > maui stats...
> 
> That's peculiar.
> 
> Looking...

It seems that sister MOMs aren't sending regular updates of cput; that
only happens at the very end.

Plus there is some sort of race condition preventing the final
resources update (the one that gets into the accounting record) from
getting to the stat output.
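
For anyone wanting to check whether a job is affected, here is a rough
sketch (not part of TORQUE) that pulls resources_used.cput out of an
accounting record and out of qstat -f output and compares the two. The
helper names and the sample job data are made up for illustration; only
the "resources_used.cput" attribute and its HH:MM:SS form are taken from
what TORQUE reports.

```python
import re

def hms_to_seconds(hms):
    """Convert an HH:MM:SS string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# Matches "resources_used.cput=01:20:00" (accounting record style)
# as well as "resources_used.cput = 01:20:00" (qstat -f style).
CPUT_RE = re.compile(r"resources_used\.cput\s*=\s*(\d+:\d{2}:\d{2})")

def extract_cput(text):
    """Return cput in seconds from the first match in text, or None."""
    match = CPUT_RE.search(text)
    return hms_to_seconds(match.group(1)) if match else None

# Hypothetical sample data, invented for illustration only.
accounting_line = (
    "10/18/2006 13:00:00;E;1234.example;user=alice "
    "resources_used.cput=01:20:00 resources_used.walltime=00:21:00"
)
qstat_output = """Job Id: 1234.example
    job_state = C
    resources_used.cput = 00:20:05
    resources_used.walltime = 00:21:00
"""

acct = extract_cput(accounting_line)
stat = extract_cput(qstat_output)
print("accounting cput (s):", acct)
print("qstat -f cput (s):  ", stat)
if acct is not None and stat is not None and stat < acct:
    print("qstat -f is underreporting by", acct - stat, "seconds")
```

In practice the two strings would come from the job's "E" record in the
accounting file and from the output of qstat -f for the same job id.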

Still looking...


