[torqueusers] Checkpointing and restart with torque 2.4 with BLCR

Alexander Oltu Alexander.Oltu at uni.no
Tue Mar 30 01:20:20 MDT 2010


Hi Rajiv,

> Mar 30 10:42:12 gcluster checkpoint_script: Invoked:
> /var/spool/torque/mom_priv/blcr_checkpoint_script 24472
> 0.gcluster.grid guser02 guser02
> /var/spool/torque/checkpoint/0.gcluster.grid.CKckpt.0.gcluster.grid.1269925932
> 15 -
> Mar 30 10:42:12 gcluster checkpoint_script: Usage:
> /var/spool/torque/mom_priv/blcr_checkpoint_script
> Mar 30 10:42:12 gcluster pbs_mom: LOG_ERROR::blcr_checkpoint_job,
> checkpoint script returned value 255

From the logs, it looks like the script invocation is wrong. Are you using
the scripts from the examples?
http://www.clusterresources.com/torquedocs21/2.6jobcheckpoint.shtml

You can try to execute as root:
/var/spool/torque/mom_priv/blcr_checkpoint_script 24472 0.gcluster.grid guser02 guser02 \
 /var/spool/torque/checkpoint/0.gcluster.grid.CKckpt.0.gcluster.grid.1269925932 15 - 
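When running it by hand it helps to see the exit status, since pbs_mom only logged "returned value 255". A minimal sketch, assuming a POSIX shell; run_and_report is a hypothetical helper name of mine, not something shipped with Torque or BLCR:

```shell
# Hypothetical helper (not part of Torque or BLCR): run a command,
# then print its exit status so it can be compared with what pbs_mom logged.
run_and_report() {
    "$@"
    status=$?
    echo "exit status: $status"
    return "$status"
}

# Example, as root on the MOM node (arguments copied from the log above):
# run_and_report /var/spool/torque/mom_priv/blcr_checkpoint_script 24472 \
#     0.gcluster.grid guser02 guser02 \
#     /var/spool/torque/checkpoint/0.gcluster.grid.CKckpt.0.gcluster.grid.1269925932 15 -
```

If the manual run also prints the Usage line and reports 255, the problem is in the script or its arguments rather than in how pbs_mom calls it.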

> Pls help me to solve this issue.. Or can u share the  exact document
> u ve followed to configure  the check pointing facility

We got BLCR from Cray. For the Torque configuration I followed the link above
and Cray's XT System Management guide.

Best Regards,
Alex.

