[Mauiusers] checkpointing node 'job name' is correct behavior?

Garrick Staples garrick at usc.edu
Mon Jul 2 13:51:58 MDT 2007


On Mon, Jul 02, 2007 at 01:55:08PM +0900, Heiga ZEN (Byung Ha CHUN) alleged:
> Hi all,
> 
> I'm using Maui with Torque.
> I checked my Maui log file and found some strange(?) entries:
> 
> 
> 07/02 12:54:26 INFO:     checkpointing node 'p4-6'
> 07/02 12:54:26 INFO:     checkpointing node 'p4-7'
> ...
> 07/02 12:54:26 INFO:     checkpointing node 'pd4-13'
> 07/02 12:54:26 INFO:     checkpointing node '5958.jasmine'
> 07/02 12:54:26 INFO:     checkpointing node '5959.jasmine'
> ...
> 07/02 12:54:26 INFO:     checkpointing node '6044.jasmine'

This looks like an old bug in the pbs client libraries that was fixed
years ago.  Maui would issue a pbs_statnode() call, the read of the
reply would time out, and the unread data would still be on the wire
for the next call to pbs_statjob().
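
To make that failure mode concrete, here is a toy Python sketch. It is
not TORQUE's code: ToyConnection, SERVER_DATA and the read_times_out
flag are all made up for illustration, and the real client library is
C.  It only shows how one abandoned read on a shared connection can
shift every later reply by one, so that a node query ends up holding
job names.

```python
from collections import deque

# Hypothetical canned server replies, using names from the quoted log.
SERVER_DATA = {
    "statnode": ["p4-6", "p4-7"],
    "statjob": ["5958.jasmine", "5959.jasmine"],
}


class ToyConnection:
    """Toy stand-in for one client/server connection (an assumption,
    not the real pbs client library)."""

    def __init__(self):
        # Replies the server has sent that the client has not read yet.
        self.wire = deque()

    def call(self, kind, read_times_out=False):
        # The server always answers; its reply lands on the wire.
        self.wire.append(SERVER_DATA[kind])
        if read_times_out:
            # The client gives up, but the reply stays on the wire.
            return None
        # The client reads whatever is first on the wire, which may be
        # a stale reply left over from an earlier call.
        return self.wire.popleft()


conn = ToyConnection()
first = conn.call("statnode", read_times_out=True)  # read abandoned
jobs = conn.call("statjob")    # actually receives the stale node reply
nodes = conn.call("statnode")  # actually receives the job reply

print(first, jobs, nodes)
```

In this sketch the last "node" query returns job IDs, which resembles
the 'checkpointing node' lines above that carry job names.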

You didn't say which version you are running, but I assume it is an old
version of TORQUE.  Update TORQUE, then rebuild Maui against the updated
TORQUE (updating Maui itself is not required for this particular bug).
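
To check whether the symptom is gone after the rebuild, something like
the following Python sketch could scan the Maui log.  It assumes job
IDs always look like <number>.<server>, as in the lines quoted above,
and that node names never do; suspicious_nodes is a hypothetical helper,
not part of Maui or TORQUE.

```python
import re

# Matches the log lines quoted in this thread.
NODE_LINE = re.compile(r"checkpointing node '([^']+)'")
# Assumption: a job ID is digits, a dot, then the server name.
JOB_ID = re.compile(r"^\d+\.\S+$")


def suspicious_nodes(lines):
    """Return 'node' names from the log that look like job IDs."""
    found = []
    for line in lines:
        match = NODE_LINE.search(line)
        if match and JOB_ID.match(match.group(1)):
            found.append(match.group(1))
    return found


sample = [
    "07/02 12:54:26 INFO:     checkpointing node 'p4-6'",
    "07/02 12:54:26 INFO:     checkpointing node 'pd4-13'",
    "07/02 12:54:26 INFO:     checkpointing node '5958.jasmine'",
]
print(suspicious_nodes(sample))
```

An empty result on a fresh log would suggest the stale-reply problem is
no longer occurring, under the naming assumption above.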


-- 
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0