[Mauiusers] checkpointing node 'job name' is correct behavior?

Garrick Staples garrick at usc.edu
Mon Jul 2 22:50:41 MDT 2007


On Tue, Jul 03, 2007 at 11:56:46AM +0900, Heiga ZEN (Byung Ha CHUN) alleged:
> Hi,
> 
> Garrick Staples wrote (2007/07/03 4:51):
> 
> >>07/02 12:54:26 INFO:     checkpointing node 'p4-6'
> >>07/02 12:54:26 INFO:     checkpointing node 'p4-7'
> >>...
> >>07/02 12:54:26 INFO:     checkpointing node 'pd4-13'
> >>07/02 12:54:26 INFO:     checkpointing node '5958.jasmine'
> >>07/02 12:54:26 INFO:     checkpointing node '5959.jasmine'
> >>...
> >>07/02 12:54:26 INFO:     checkpointing node '6044.jasmine'
> >
> >This looks like an old bug in the pbs client libraries that was fixed
> >years ago.  Maui would issue a pbs_statnode() call, the data read had a
> >particular timeout, and the data would still be on the wire for the next
> >call to pbs_statjob().
> 
> OK, I see.
> 
> >You didn't say the version, but I assume an old version of TORQUE.
> >Update your TORQUE and rebuild Maui after installing the updated TORQUE
> >(updating Maui is not required for this particular bug).
> 
> Hmm, I'm using TORQUE 2.1.7 (not so old, is it?).
> Anyway, I'll update TORQUE and check this phenomenon.

Hrm, 2.1.7 would certainly have that bug fixed.  Was Maui rebuilt after
2.1.7 was installed?
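
To check whether the symptom is still present after a rebuild or restart, the Maui log can be scanned for "checkpointing node" entries whose name looks like a job ID rather than a host. The following is a minimal sketch, not part of Maui or TORQUE. The function name is hypothetical, and the job-ID pattern (digits, a dot, then a server name) is an assumption based only on the entries quoted above.

```python
import re
from typing import List

# Matches the log entries quoted above and captures the name between the quotes.
CHECKPOINT_RE = re.compile(r"checkpointing node '([^']+)'")

# Assumption: a job ID looks like "<digits>.<server>", as in '5958.jasmine'.
JOB_ID_RE = re.compile(r"^\d+\.[A-Za-z][\w.-]*$")


def misreported_nodes(log_text: str) -> List[str]:
    """Return 'node' names from checkpoint entries that look like job IDs."""
    names = CHECKPOINT_RE.findall(log_text)
    return [name for name in names if JOB_ID_RE.match(name)]


# Sample built from the entries quoted in this thread.
sample_log = """\
07/02 12:54:26 INFO:     checkpointing node 'p4-6'
07/02 12:54:26 INFO:     checkpointing node 'pd4-13'
07/02 12:54:26 INFO:     checkpointing node '5958.jasmine'
07/02 12:54:26 INFO:     checkpointing node '6044.jasmine'
"""

if __name__ == "__main__":
    for name in misreported_nodes(sample_log):
        print("job ID reported as node:", name)
```

An empty result on a fresh log would suggest that job data is no longer being read as node data.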

The key here is that 2.0.x only had static libraries, so all other
programs had to be rebuilt each time TORQUE/OpenPBS was updated.  TORQUE
2.1.x has shared libraries, so Maui only needs to be restarted after an
update, provided it was built against the shared libraries.
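
One way to tell which case applies is to look at the output of ldd for the maui binary. The sketch below only parses that output. The library name prefix "libtorque.so", the helper names, and the sample text are assumptions for illustration, so adjust them to match the actual installation.

```python
import re
import subprocess
from typing import List

# Assumption: the TORQUE client shared library is named libtorque.so.<N>.
TORQUE_LIB_RE = re.compile(r"^libtorque\.so")


def dynamic_torque_libs(ldd_output: str) -> List[str]:
    """Return TORQUE shared libraries listed in the given ldd output."""
    libs = []
    for line in ldd_output.splitlines():
        fields = line.split()
        # A typical ldd line starts with the library name, e.g.
        # "libfoo.so.1 => /path/libfoo.so.1 (0x...)".
        if fields and TORQUE_LIB_RE.match(fields[0]):
            libs.append(fields[0])
    return libs


def ldd(binary_path: str) -> str:
    """Run ldd on a binary and return its standard output."""
    result = subprocess.run(["ldd", binary_path], capture_output=True, text=True)
    return result.stdout


# Hypothetical ldd output for a maui binary linked against the shared library.
sample_output = """\
    libtorque.so.0 => /usr/local/lib/libtorque.so.0 (0x00002b0000000000)
    libc.so.6 => /lib64/libc.so.6 (0x00002b0000200000)
"""

if __name__ == "__main__":
    print(dynamic_torque_libs(sample_output))
```

If the list is empty, the binary most likely carries a statically linked copy of the client library and would need to be rebuilt to pick up a fix. If the library is listed, a restart should be enough. On a real system the input would come from something like ldd("/path/to/maui"), with the path filled in for the local install.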
