[torqueusers] torque/blcr integration

Al Taufer ataufer at adaptivecomputing.com
Tue Sep 21 12:26:59 MDT 2010


I do not know how up to date the scripts on the web page are, but two scripts are included with the distribution, in the torque/contrib/blcr directory, and they should be up to date. They are checkpoint_script and restart_script; I would try using those.
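
A minimal sketch of putting the contrib scripts in place could look like the following. The source tree location, the mom_priv path, and the function name install_blcr_scripts are assumptions for illustration only. The $checkpoint_script and $restart_script config directives are the ones described on the checkpoint documentation page; check them against the documentation for your TORQUE version.

```shell
#!/bin/sh
# Sketch only: copy the contrib BLCR scripts into mom_priv and reference
# them from the pbs_mom config. Both paths are assumptions; adjust them
# to match your installation.
TORQUE_SRC="${TORQUE_SRC:-/usr/local/src/torque}"
MOM_PRIV="${MOM_PRIV:-/var/spool/torque/mom_priv}"

# Hypothetical helper, not part of TORQUE.
install_blcr_scripts() {
    for name in checkpoint_script restart_script; do
        src="$TORQUE_SRC/contrib/blcr/$name"
        dst="$MOM_PRIV/blcr_$name"

        # Copy the distributed script and make it executable.
        cp "$src" "$dst" || return 1
        chmod 755 "$dst" || return 1

        # Append the matching directive unless the config already has it.
        grep -qF "\$$name " "$MOM_PRIV/config" 2>/dev/null ||
            printf '$%s %s\n' "$name" "$dst" >> "$MOM_PRIV/config"
    done
}
```

Calling install_blcr_scripts as root on a compute node would copy both scripts and add one config line for each. pbs_mom would then need to re-read its config, for example by being restarted.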

Al Taufer
Adaptive Computing

----- Original Message -----
> Hi,
> 
> I'm following the instructions on
> http://www.clusterresources.com/products/torque/docs/2.6jobcheckpoint.shtml
> Torque is compiled with --enable-blcr, version 2.4.10, I'm aware that
> the doc is for 2.5.x, I did not easily find the doc for 2.4.x.
> 
> Attached are my
> mom_priv/{config,epilogue,blcr_checkpoint_script,blcr_restart_script}.
> It's essentially the scripts from the doc, but the script in the doc
> needs correction (or it would not run).
> blcr_checkpoint_script was edited to declare the variable $depth and add
> a missing comma -- the aim was to fix the syntax (I didn't spend much
> time on the scripts).
> [ It would be nice to see the code on the web page fixed. ]
> 
> I submitted my test job with "qsub -c enabled test.job", then issued
> qhold jobid. It did not checkpoint the job; under qstat -f, there's an
> output line for that job:
> comment = "Usage:
> /usr/local/torque/current/var/spool/torque/mom_priv/blcr_checkpoint_script"
> 
> Mom logs say that the blcr_checkpoint_script exited with code 255,
> which is consistent with running the script without parameters.
> 
> I take it that pbs_mom did not issue the blcr_checkpoint_script
> command with all the required parameters.
> 
> Any comments, helpful hints, or outright help will be most welcome.
> 
> Thanks,
> Robin
> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers

