[Mauiusers] checkpointing node 'job name' is correct behavior?
garrick at usc.edu
Mon Jul 2 13:51:58 MDT 2007
On Mon, Jul 02, 2007 at 01:55:08PM +0900, Heiga ZEN (Byung Ha CHUN) alleged:
> Hi all,
> I'm using Maui with Torque.
> I checked my Maui log file and found some strange(?) entries:
> 07/02 12:54:26 INFO: checkpointing node 'p4-6'
> 07/02 12:54:26 INFO: checkpointing node 'p4-7'
> 07/02 12:54:26 INFO: checkpointing node 'pd4-13'
> 07/02 12:54:26 INFO: checkpointing node '5958.jasmine'
> 07/02 12:54:26 INFO: checkpointing node '5959.jasmine'
> 07/02 12:54:26 INFO: checkpointing node '6044.jasmine'
This looks like an old bug in the pbs client libraries that was fixed
years ago. Maui would issue a pbs_statnode() call, the read of the reply
would time out, and the unread data would still be on the wire for the
next call to pbs_statjob().
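
To make that failure mode concrete, here is a toy Python model of a
shared connection where one timed-out read leaves a reply unread. It is
only a sketch, not TORQUE or Maui code: the Wire class, the stat()
helper, and the timed_out flag are hypothetical names invented for the
illustration, and only the node and job names come from the log above.

```python
from collections import deque


class Wire:
    """Toy stand-in for one shared client/server connection (hypothetical)."""

    # Canned replies; the names are taken from the log excerpt.
    REPLIES = {
        "statnode": ["p4-6", "p4-7", "pd4-13"],
        "statjob": ["5958.jasmine", "5959.jasmine", "6044.jasmine"],
    }

    def __init__(self):
        # Replies the "server" has sent that the client has not read yet.
        self.pending = deque()

    def send(self, request):
        # Every request gets a reply, which then waits on the wire.
        self.pending.append(self.REPLIES[request])

    def read(self, timed_out=False):
        # A timed-out read returns nothing and leaves the reply unread.
        if timed_out:
            return None
        # Otherwise the client takes whatever reply is oldest on the wire.
        return self.pending.popleft()


def stat(wire, request, timed_out=False):
    """Send a request, then read the next reply on the wire."""
    wire.send(request)
    return wire.read(timed_out)


wire = Wire()
first = stat(wire, "statnode", timed_out=True)  # reply stays on the wire
jobs = stat(wire, "statjob")    # actually receives the stale node reply
nodes = stat(wire, "statnode")  # actually receives the stale job reply

print("job query got: ", jobs)
print("node query got:", nodes)
```

In this model, after the one timed-out read every later reply is off by
one, so the node query ends up with job IDs such as '5958.jasmine',
which is the pattern in the log above.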
You didn't say which version you are running, but I assume it is an old
version of TORQUE.
Update your TORQUE and rebuild Maui after installing the updated TORQUE
(updating Maui is not required for this particular bug).
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California
Please avoid sending me Word or PowerPoint attachments.
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0