[Mauiusers] checkpointing node 'job name' is correct behavior?
garrick at usc.edu
Mon Jul 2 22:50:41 MDT 2007
On Tue, Jul 03, 2007 at 11:56:46AM +0900, Heiga ZEN (Byung Ha CHUN) alleged:
> Garrick Staples wrote (2007/07/03 4:51):
> >>07/02 12:54:26 INFO: checkpointing node 'p4-6'
> >>07/02 12:54:26 INFO: checkpointing node 'p4-7'
> >>07/02 12:54:26 INFO: checkpointing node 'pd4-13'
> >>07/02 12:54:26 INFO: checkpointing node '5958.jasmine'
> >>07/02 12:54:26 INFO: checkpointing node '5959.jasmine'
> >>07/02 12:54:26 INFO: checkpointing node '6044.jasmine'
> >This looks like an old bug in the pbs client libraries that was fixed
> >years ago. Maui would issue a pbs_statnode() call, the data read had a
> >particular timeout, and the data would still be on the wire for the next
> >call to pbs_statjob().
> OK, I see.
> >You didn't say the version, but I assume an old version of TORQUE.
> >Update your TORQUE and rebuild Maui after installing the updated TORQUE
> >(updating Maui is not required for this particular bug).
> Hmm, I'm using TORQUE 2.1.7 (not so old, is it?).
> Anyway, I'll update TORQUE and check this phenomenon.
Hrm, 2.1.7 would certainly have that bug fixed. Was Maui rebuilt after
2.1.7 was installed?
The key here is that 2.0.x only had static libraries, so all other
programs had to be rebuilt each time TORQUE/OpenPBS was updated. TORQUE
2.1.x has shared libraries, which only require Maui to be restarted.
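
One rough way to tell which case applies is to look at what the maui
binary is dynamically linked against. The sketch below is only an
illustration: it classifies the text printed by `ldd`, and it assumes
the TORQUE shared library shows up under the name `libtorque.so`. The
function name `torque_linkage` and the binary path in the usage note
are made up for this example; adjust both assumptions to your install.

```python
import re
import subprocess
import sys


def torque_linkage(ldd_output: str) -> str:
    """Classify `ldd` output for a binary such as maui.

    Returns:
      "shared"           - a libtorque.so entry is listed (assumed library name)
      "not-dynamic"      - ldd reports the file is not a dynamic executable
      "static-or-absent" - dynamic binary, but no libtorque.so entry, which
                           would be consistent with the PBS client code having
                           been linked in statically at build time
    """
    if "not a dynamic executable" in ldd_output:
        return "not-dynamic"
    for line in ldd_output.splitlines():
        # Typical ldd line: "<tab>libtorque.so.0 => /usr/lib/libtorque.so.0 (0x...)"
        if re.match(r"\s*libtorque\.so(\.\d+)*\s+=>", line):
            return "shared"
    return "static-or-absent"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Run ldd on the binary given on the command line and print the verdict.
    result = subprocess.run(["ldd", sys.argv[1]], capture_output=True, text=True)
    print(torque_linkage(result.stdout + result.stderr))
```

Saved under a name of your choosing and run with the path to your maui
binary as the only argument (for example /usr/local/maui/sbin/maui,
which is a guess at the location), a result other than "shared"
suggests that the binary was built against static libraries and would
need a rebuild rather than just a restart.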
Please avoid sending me Word or PowerPoint attachments.
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0