[torqueusers] torque/blcr integration

Al Taufer ataufer at adaptivecomputing.com
Tue Sep 21 14:15:09 MDT 2010


The 2 scripts from the contrib/blcr directory may need to be modified for your installation. Both need to set PATH correctly so they can find the executables they call. Please verify that the PATH they set is correct for your system.
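
One way to check this is a small standalone script, sketched here under some assumptions: the PATH value below is a placeholder (copy in whatever your scripts actually set), and the tool list only names the BLCR utilities cr_checkpoint and cr_restart; if your copies of the scripts call anything else, add those names too. The helper name find_missing is made up for this example.

```python
import os
import shutil


def find_missing(tools, path):
    """Return the names in `tools` that are not executable anywhere on `path`."""
    return [t for t in tools if shutil.which(t, path=path) is None]


if __name__ == "__main__":
    # Placeholder value: replace with the PATH your checkpoint/restart scripts set.
    script_path = "/usr/local/bin:/usr/bin:/bin"
    # BLCR command-line utilities; extend with anything else your scripts invoke.
    tools = ["cr_checkpoint", "cr_restart"]

    missing = find_missing(tools, script_path)
    if missing:
        print("Not found on the script PATH: " + ", ".join(missing))
    else:
        print("All listed executables were found.")
```

Run it on the compute node as the same user pbs_mom runs as, since that is the environment the scripts will actually see.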

Al
----- Original Message -----
> Thanks, I just tried it with the ones in the contrib dir.
> I'm getting the same error; the return code is what I would expect if
> not enough parameters were passed.
> 
> ===
> comment = Checkpoint script failed with return value of 255
> ===
> 
> 
> Robin
> 
> On Sep 21, 2010, at 2:26 PM, Al Taufer wrote:
> 
> > I do not know how up to date the scripts on the web page are, but
> > there are 2 scripts included with the distribution. They are in the
> > torque/contrib/blcr directory and should be up to date. They are
> > checkpoint_script and restart_script; I would try using these.
> >
> > Al Taufer
> > Adaptive Computing
> >
> > ----- Original Message -----
> >> Hi,
> >>
> >> I'm following the instructions on
> >> http://www.clusterresources.com/products/torque/docs/2.6jobcheckpoint.shtml
> >> Torque is compiled with --enable-blcr, version 2.4.10. I'm aware that
> >> the doc is for 2.5.x; I did not easily find the doc for 2.4.x.
> >>
> >> Attached are my
> >> mom_priv/{config,epilogue,blcr_checkpoint_script,blcr_restart_script}.
> >> It's essentially the scripts from the doc, but the script in the doc
> >> needs correction (or it would not run).
> >> blcr_checkpoint_script was edited to declare the variable $depth and
> >> add a missing comma. The aim was just to fix the syntax (I didn't
> >> spend much time on the scripts).
> >> [ It would be nice to see the code on the webpage fixed. ]
> >>
> >> I submitted my test job with "qsub -c enabled test.job", then issued
> >> qhold jobid. It did not checkpoint the job. Under qstat -f, there's
> >> an output line for that job:
> >> comment = "Usage:
> >> /usr/local/torque/current/var/spool/torque/mom_priv/blcr_checkpoint_script"
> >>
> >> Mom logs say that the blcr_checkpoint_script exited with code 255,
> >> which is consistent with running the script without parameters.
> >>
> >> I take it that pbs_mom did not invoke blcr_checkpoint_script
> >> with all the required parameters.
> >>
> >> Any comments, helpful hints, or outright help will be most welcome.
> >>
> >> Thanks,
> >> Robin
> >>
> >> _______________________________________________
> >> torqueusers mailing list
> >> torqueusers at supercluster.org
> >> http://www.supercluster.org/mailman/listinfo/torqueusers
> 

