[torqueusers] reported cpu time during running parallel jobs in
garrick at clusterresources.com
Wed Oct 18 19:35:32 MDT 2006
On Wed, Oct 18, 2006 at 01:39:17PM -0600, Garrick Staples alleged:
> On Wed, Oct 18, 2006 at 12:26:18PM -0600, Garrick Staples alleged:
> > On Wed, Oct 18, 2006 at 05:40:40PM +0100, David Golden alleged:
> > > Well, perhaps in some sort of karmic revenge after the on-list discussion of
> > > cput time accounting a while back, I just tried upgrading to torque 2.1.3, and it
> > > seems something strange is going on with _recent_ torque:
> > >
> > > The resources_used.cput number ultimately reported in
> > > e.g. /var/spool/pbs/server_priv/accounting/ for
> > > parallel jobs still seems accurate enough
> > >
> > > However, qstat -f is underreporting, even when the job is in "C" state, as
> > > if it's only reporting the cput of the processes on the job's mother
> > > superior node, and I think the issue might also be mangling our maui stats...
> > That's peculiar.
> > Looking...
> It seems that sister MOMs aren't sending regular updates of cput; it
> only happens at the very end.
> Plus there is some sort of a race condition preventing the final
> resources update (that gets into the accounting record) from getting to
> the stat output.
> Still looking...
I think this fixes both problems. Initial tests are good, but I want to
bang at it some more.
--- src/resmom/mom_main.c (revision 1053)
+++ src/resmom/mom_main.c (working copy)
@@ -6799,14 +6799,14 @@
if (pjob->ji_qs.ji_substate != JOB_SUBSTATE_RUNNING)
- if ((pjob->ji_qs.ji_svrflags & JOB_SVFLG_HERE) == 0)
/* update information for my tasks */
+ if ((pjob->ji_qs.ji_svrflags & JOB_SVFLG_HERE) == 0)
/* has all job processes vanished undetected ? */
/* double check by sig0 to session pid for each task */
--- src/server/req_jobobit.c (revision 1053)
+++ src/server/req_jobobit.c (working copy)
@@ -1626,6 +1626,13 @@
patlist = (svrattrl *)GET_NEXT(preq->rq_ind.rq_jobobit.rq_attr);
+ /* Encode the final resources_used into the job (useful for keep_completed) */
+ ATR_DFLAG_MGWR | ATR_DFLAG_SvWR,
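
To check whether a given job shows the symptom described above, one can compare the resources_used.cput value from the accounting record with the one printed by qstat -f. The following is a minimal sketch, not part of the patch, assuming both values are in the usual HH:MM:SS form; the helper names cput_to_seconds and cput_shortfall are hypothetical and do not exist in torque.

```c
#include <stdio.h>

/* Hypothetical helper (not a torque function): convert a cput string
 * of the assumed form "HH:MM:SS" to seconds. Hours may exceed two digits.
 * Returns -1 if the string does not match that form. */
long cput_to_seconds(const char *s)
{
    long h, m, sec;
    char trailing;

    /* The extra %c makes sscanf return 4 if anything follows the
     * seconds field, so trailing junk is rejected. */
    if (sscanf(s, "%ld:%ld:%ld%c", &h, &m, &sec, &trailing) != 3)
        return -1;
    if (h < 0 || m < 0 || m > 59 || sec < 0 || sec > 59)
        return -1;
    return h * 3600 + m * 60 + sec;
}

/* Hypothetical helper: store in *out the number of cput seconds that
 * appear in the accounting record but not in the qstat -f output.
 * Returns 0 on success, -1 if either string is malformed. */
int cput_shortfall(const char *accounting, const char *qstat, long *out)
{
    long a = cput_to_seconds(accounting);
    long q = cput_to_seconds(qstat);

    if (a < 0 || q < 0)
        return -1;
    *out = a - q;
    return 0;
}
```

For a parallel job, a large positive shortfall on a completed job would be consistent with qstat -f only counting the mother superior's processes, though other causes are possible.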