[torqueusers] 1.1.0p6 cpu time counter fails with very long jobs?
garrick at usc.edu
Tue Feb 1 02:07:45 MST 2005
On Tue, Feb 01, 2005 at 10:43:38AM +0200, Mikko Huhtala alleged:
> Garrick Staples writes:
> > Do you have anything in your mom logs at that time?
> We've had a bit of a rearrangement of admin responsibilities and I do
> not have direct access to the log right now. I'll try to find out.
I've been looking through the code that I imagine is most likely to be
responsible, but nothing is jumping out at me. Any log messages might help to
point me in the right direction.
> I also realized that the jobs were started on p5 before the cluster
> was updated to p6-snap.1105139538, so I guess it is possible that
> something might have happened to the cpu time counters at the time of
> the on-the-fly upgrade.
I wouldn't think so, but if it did, then that's a problem. Note that I'm now
looking at 1.2.0-0b0-snap.1107038000, so that code may have changed since p6
(though I don't think it has).
Garrick Staples, Linux/HPCC Administrator
University of Southern California