[torqueusers] 1.1.0p6 cpu time counter fails with very long jobs?

Garrick Staples garrick at usc.edu
Tue Feb 1 02:07:45 MST 2005


On Tue, Feb 01, 2005 at 10:43:38AM +0200, Mikko Huhtala alleged:
> Garrick Staples writes:
>  > Do you have anything in your mom logs at that time?
> 
> We've had a bit of a rearrangement of admin responsibilities and I do
> not have direct access to the log right now. I'll try to find out.

I've been looking through the code that I imagine is most likely to be
responsible, but nothing is jumping out at me.  Any log messages might help to
point me in the right direction.

 
> I also realized that the jobs were started on p5 before the cluster
> was updated to p6-snap.1105139538, so I guess it is possible that
> something might have happened to the cpu time counters at the time of
> the on-the-fly upgrade.

I wouldn't think so.  But if it did, then that's a problem.  Of course, I'm now
looking at 1.2.0-0b0-snap.1107038000, so maybe something has changed in that
code since then (but I don't think so).

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California