[torqueusers] BLCR Checkpoint strange error

Bhavya Malhotra Addepalli bhavya28writeme at gmail.com
Tue Mar 15 13:26:30 MDT 2011


Hello All,
I am testing BLCR with TORQUE. It works to an extent when I follow the
instructions.

qhold jobid creates a checkpoint file in
/var/spool/torque/checkpoint/jobid.CK/ckpt.182025.somerandomnumber

When I do a qrls $JOBID, the job goes to the W state because it fails to
retrieve the checkpoint file. After checking the logs, it appears that the
.somerandomnumber suffix with which the checkpoint file is created is
different from the one in the file name that the MOM is trying to copy.
I could not find any information on how the random number is selected, or
why it changes between the time the file is created and the time the job is
restarted.
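
In case it helps anyone reproduce the comparison, here is a minimal sketch of how the two names can be checked against each other. It assumes the checkpoint directory layout shown above and that the expected file name is copied by hand from the MOM log. The function name compare_checkpoint_names is hypothetical and is not part of TORQUE or BLCR.

```python
import os


def compare_checkpoint_names(ckpt_dir, expected_name):
    """Check whether the file the MOM asks for exists in the checkpoint dir.

    ckpt_dir:      the job's checkpoint directory (assumed layout:
                   /var/spool/torque/checkpoint/<jobid>.CK)
    expected_name: the ckpt.* file name copied by hand from the MOM log

    Returns (found, actual), where `found` is True if expected_name is
    present and `actual` is the sorted list of ckpt.* files on disk.
    """
    actual = sorted(
        name for name in os.listdir(ckpt_dir) if name.startswith("ckpt.")
    )
    return expected_name in actual, actual
```

If found comes back False, the names in actual show which suffix was really written at qhold time, which can then be set next to the name in the log.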

Any clues?

Bhavs