[torqueusers] Job Checkpoint and Restart with torque 2.4.6 & BLCR
Rajiv Rajaian
rajiv.care at gmail.com
Fri Mar 26 05:59:35 MDT 2010
Hi all
I've installed torque 2.4.6 and enabled BLCR with the following options
while installing:
./configure --disable-gui --with-server-home=/var/spool/PBS
--with-default-server=gcluster.grid --enable-unixsockets=no --enable-blcr
--disable-gcc-warnings
Also, my mom_priv/config looks like this:
/var/spool/PBS/mom_priv/config
$checkpoint_script /var/spool/PBS/mom_priv/blcr_checkpoint_script
$restart_script /var/spool/PBS/mom_priv/blcr_restart_script
$checkpoint_run_exe /usr/local/bin/cr_run
$pbsserver gcluster.grid
$loglevel 7
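A quick sanity check for a config like this is to confirm that every path it names exists and is executable on the compute node. The following is only a minimal sketch: check_mom_paths is a hypothetical helper name (not a Torque command), and it assumes the config uses the simple "directive path" layout shown above.

```shell
#!/bin/sh
# check_mom_paths is a hypothetical helper, not part of Torque or BLCR.
# It reads a pbs_mom config file and reports whether the path given for each
# checkpoint-related directive exists and is executable.
check_mom_paths() {
    config="$1"
    status=0
    # "|| [ -n "$key" ]" keeps a final line that lacks a trailing newline.
    while read -r key path rest || [ -n "$key" ]; do
        case "$key" in
            '$checkpoint_script'|'$restart_script'|'$checkpoint_run_exe')
                if [ -x "$path" ]; then
                    echo "ok   $key $path"
                else
                    echo "FAIL $key $path (missing or not executable)"
                    status=1
                fi
                ;;
        esac
    done < "$config"
    return "$status"
}

# Example, using the path from this post:
# check_mom_paths /var/spool/PBS/mom_priv/config
```

The function returns non-zero if any listed path fails the check. It only tests for presence and the execute bit, so a script that exists but does not compile will still pass.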
I've created the blcr_checkpoint_script and blcr_restart_script scripts too.
While submitting a job I'm getting the following error. Please help me
solve it. Is there anything else that needs to be configured for this?
[guser02@gcluster ~]$ qsub -c enabled,periodic,shutdown,interval=1 test.sh
1.gcluster.grid
[guser02@gcluster ~]$ qhold 1
[guser02@gcluster ~]$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1.gcluster                test.sh          guser02                0 R workq
[guser02@gcluster ~]$ qstat -f
Job Id: 1.gcluster.grid
Job_Name = test.sh
Job_Owner = guser02@gcluster.grid
job_state = R
queue = workq
server = gcluster.grid
Checkpoint = enabled,periodic,shutdown,interval=1
ctime = Fri Mar 26 17:20:03 2010
Error_Path = gcluster.grid:/home/guser02/test.sh.e1
exec_host = gcluster.grid/0
Hold_Types = n
Join_Path = n
Keep_Files = n
Mail_Points = a
mtime = Fri Mar 26 17:20:05 2010
Output_Path = gcluster.grid:/home/guser02/test.sh.o1
Priority = 0
qtime = Fri Mar 26 17:20:03 2010
Rerunable = True
Resource_List.nodect = 1
Resource_List.nodes = 1
session_id = 19993
Variable_List = PBS_O_HOME=/home/guser02,PBS_O_LOGNAME=guser02,
PBS_O_PATH=/usr/local/firefox/:/opt/mpich-1.2.6/bin:/usr/local/jdk1.5
.0_03/bin/:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/local
/tomcat-5.0.27/bin:/usr/local/ant-1.6.4/bin:/usr/local/globus-4.0.3/bi
n:/usr/local/globus-4.0.3/sbin:/bin:/usr/local/maui/bin:/usr/local/gw/
bin:/usr/local/rrdtool/bin:/opt/ganglia/bin:/usr/local/sbin:/usr/local
/bin:/usr/local/pdftk-1.41/pdftk:/home/guser02/bin,
PBS_O_MAIL=/var/spool/mail/guser02,PBS_O_SHELL=/bin/bash,
PBS_O_HOST=gcluster.grid,PBS_SERVER=gcluster.grid,
PBS_O_WORKDIR=/home/guser02,PBS_O_QUEUE=workq
comment = Scalar found where operator expected at /var/spool/PBS/mom_priv/blcr_checkpoint_script line 31, near "$signalNum $depth"
(Missing operator before $depth?)
syntax error at /var/spool/PBS/mom_priv/blcr_checkpoint_script line 31, near "$signalNum $depth"
Global symbol "$depth" requires explicit package name at /var/spool/PBS/mom_priv/blcr_checkpoint_script line 31.
Execution of /var/spool/PBS/mom_priv/blcr_checkpoint_script aborted due to compilation errors.
etime = Fri Mar 26 17:20:03 2010
submit_args = -c enabled,periodic,shutdown,interval=1 test.sh
start_time = Fri Mar 26 17:20:03 2010
start_count = 1
fault_tolerant = False
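The text in the comment field is Perl's own compile-time output, so the same failure should be reproducible outside Torque by syntax-checking the script directly with perl -c, which compiles the file without running it. A minimal sketch, assuming perl is on the PATH; check_perl_syntax is a hypothetical helper name:

```shell
#!/bin/sh
# check_perl_syntax is a hypothetical helper, not part of Torque or BLCR.
# It compiles a Perl script without executing it and passes through
# perl's exit status (non-zero when compilation fails).
check_perl_syntax() {
    script="$1"
    if [ ! -r "$script" ]; then
        echo "cannot read: $script" >&2
        return 2
    fi
    # perl -c reports "<file> syntax OK" on success, or the compile errors.
    perl -c "$script"
}

# Example, using the path from the error message:
# check_perl_syntax /var/spool/PBS/mom_priv/blcr_checkpoint_script
```

If this reports the same "Scalar found where operator expected" message for line 31, the problem is in the script itself (something is missing between $signalNum and $depth on that line) rather than in the Torque or BLCR configuration.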
Regards
Rajiv R
Project Associate,
CARE,MIT,
Anna University, Chennai