[torqueusers] Checkpointing and restart with torque 2.4 with BLCR
Alexander Oltu
Alexander.Oltu at uni.no
Tue Mar 30 01:20:20 MDT 2010
Hi Rajiv,
> Mar 30 10:42:12 gcluster checkpoint_script: Invoked:
> /var/spool/torque/mom_priv/blcr_checkpoint_script 24472
> 0.gcluster.grid guser02 guser02
> /var/spool/torque/checkpoint/0.gcluster.grid.CKckpt.0.gcluster.grid.1269925932
> 15 -
> Mar 30 10:42:12 gcluster checkpoint_script: Usage:
> /var/spool/torque/mom_priv/blcr_checkpoint_script
> Mar 30 10:42:12 gcluster pbs_mom: LOG_ERROR::blcr_checkpoint_job,
> checkpoint script returned value 255
From the logs, it looks like the script invocation is wrong: the script
prints its usage message and exits with 255. Are you using the scripts
from the examples?
http://www.clusterresources.com/torquedocs21/2.6jobcheckpoint.shtml
You can try running the script by hand as root, with the same arguments as in your log:
/var/spool/torque/mom_priv/blcr_checkpoint_script 24472 0.gcluster.grid guser02 guser02 \
/var/spool/torque/checkpoint/0.gcluster.grid.CKckpt.0.gcluster.grid.1269925932 15 -
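If it helps, here is a minimal sketch of a wrapper that prints how many arguments were passed and the exit status of the command, so the result can be compared with the "returned value 255" line in the pbs_mom log. The function name run_and_report is hypothetical and is not part of Torque or BLCR; it only assumes a POSIX shell.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque or BLCR):
# run a command with its arguments, then report the argument
# count and the exit status the command returned.
run_and_report() {
    cmd=$1
    shift
    # Number of arguments passed to the command itself.
    echo "argument count: $#"
    "$cmd" "$@"
    rc=$?
    echo "exit status: $rc"
    # Propagate the command's exit status to the caller.
    return "$rc"
}
```

To use it, put the function in front of the command above, for example: run_and_report /var/spool/torque/mom_priv/blcr_checkpoint_script 24472 0.gcluster.grid guser02 guser02 /var/spool/torque/checkpoint/0.gcluster.grid.CKckpt.0.gcluster.grid.1269925932 15 -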
> Pls help me to solve this issue.. Or can u share the exact document
> u ve followed to configure the check pointing facility
We got BLCR from Cray. For the Torque configuration, I followed the link above and
Cray's XT System Management guide.
Best Regards,
Alex.