[torqueusers] Job Checkpoint and Restart with torque 2.4.6 & BLCR

Rajiv Rajaian rajiv.care at gmail.com
Fri Mar 26 05:59:35 MDT 2010


Hi all,
I've installed TORQUE 2.4.6 and enabled BLCR support with the following
configure options:

./configure --disable-gui --with-server-home=/var/spool/PBS \
    --with-default-server=gcluster.grid --enable-unixsockets=no --enable-blcr \
    --disable-gcc-warnings

Also, my mom_priv/config looks like this:

/var/spool/PBS/mom_priv/config
$checkpoint_script  /var/spool/PBS/mom_priv/blcr_checkpoint_script
$restart_script  /var/spool/PBS/mom_priv/blcr_restart_script
$checkpoint_run_exe /usr/local/bin/cr_run
$pbsserver gcluster.grid
$loglevel 7
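
As a quick sanity check on the config above, a sketch like the following could confirm that every script or executable it names actually exists and is executable. The function name check_mom_paths is hypothetical (it is not part of TORQUE), and the parsing assumes the simple "$directive path" layout shown above.

```shell
# Hypothetical helper, not part of TORQUE: report whether the paths named by
# the checkpoint-related directives in a pbs_mom config file are executable.
check_mom_paths() {
    config=$1
    [ -r "$config" ] || { echo "cannot read $config" >&2; return 2; }
    rc=0
    # Only the three directives used in the config above are inspected.
    # Assumes one "$directive path" pair per line and no spaces in paths.
    for path in $(awk '$1 == "$checkpoint_script" ||
                       $1 == "$restart_script" ||
                       $1 == "$checkpoint_run_exe" { print $2 }' "$config"); do
        if [ -x "$path" ]; then
            echo "ok: $path"
        else
            echo "missing or not executable: $path"
            rc=1
        fi
    done
    return $rc
}
```

It would be called as: check_mom_paths /var/spool/PBS/mom_priv/config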


I've also created the blcr_checkpoint_script and blcr_restart_script scripts.

After submitting a job, I get the following error (it appears in the job's
comment field in the qstat -f output). Please help me solve it. Is there
anything else that needs to be configured for this?

[guser02@gcluster ~]$ qsub -c enabled,periodic,shutdown,interval=1 test.sh
1.gcluster.grid

[guser02@gcluster ~]$ qhold 1

[guser02@gcluster ~]$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1.gcluster                test.sh          guser02                0 R workq

[guser02@gcluster ~]$ qstat -f
Job Id: 1.gcluster.grid
    Job_Name = test.sh
    Job_Owner = guser02@gcluster.grid
    job_state = R
    queue = workq
    server = gcluster.grid
    Checkpoint = enabled,periodic,shutdown,interval=1
    ctime = Fri Mar 26 17:20:03 2010
    Error_Path = gcluster.grid:/home/guser02/test.sh.e1
    exec_host = gcluster.grid/0
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = a
    mtime = Fri Mar 26 17:20:05 2010
    Output_Path = gcluster.grid:/home/guser02/test.sh.o1
    Priority = 0
    qtime = Fri Mar 26 17:20:03 2010
    Rerunable = True
    Resource_List.nodect = 1
    Resource_List.nodes = 1
    session_id = 19993
    Variable_List = PBS_O_HOME=/home/guser02,PBS_O_LOGNAME=guser02,
        PBS_O_PATH=/usr/local/firefox/:/opt/mpich-1.2.6/bin:/usr/local/jdk1.5
        .0_03/bin/:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/local
        /tomcat-5.0.27/bin:/usr/local/ant-1.6.4/bin:/usr/local/globus-4.0.3/bi
        n:/usr/local/globus-4.0.3/sbin:/bin:/usr/local/maui/bin:/usr/local/gw/
        bin:/usr/local/rrdtool/bin:/opt/ganglia/bin:/usr/local/sbin:/usr/local
        /bin:/usr/local/pdftk-1.41/pdftk:/home/guser02/bin,
        PBS_O_MAIL=/var/spool/mail/guser02,PBS_O_SHELL=/bin/bash,
        PBS_O_HOST=gcluster.grid,PBS_SERVER=gcluster.grid,
        PBS_O_WORKDIR=/home/guser02,PBS_O_QUEUE=workq
    comment = Scalar found where operator expected at
        /var/spool/PBS/mom_priv/blcr_checkpoint_script line 31,
         near "$signalNum $depth"
        (Missing operator before $depth?)
        syntax error at /var/spool/PBS/mom_priv/blcr_checkpoint_script line 31,
         near "$signalNum $depth"
        Global symbol "$depth" requires explicit package name at
        /var/spool/PBS/mom_priv/blcr_checkpoint_script line 31.
        Execution of /var/spool/PBS/mom_priv/blcr_checkpoint_script aborted
        due to compilation errors.
    etime = Fri Mar 26 17:20:03 2010
    submit_args = -c enabled,periodic,shutdown,interval=1 test.sh
    start_time = Fri Mar 26 17:20:03 2010
    start_count = 1
    fault_tolerant = False
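
The comment field above is a Perl compile error, so the checkpoint script stops before it ever reaches BLCR. "Scalar found where operator expected" generally means two scalars sit next to each other with nothing between them, for instance a missing comma in a variable list. A minimal sketch for reproducing the message outside TORQUE, assuming perl is on the PATH; check_perl_syntax is a hypothetical helper name, not a TORQUE or BLCR tool:

```shell
# Hypothetical helper: compile-check Perl scripts with "perl -c", which parses
# each file and reports syntax errors without running the script's main body.
check_perl_syntax() {
    command -v perl >/dev/null 2>&1 || { echo "perl not found" >&2; return 2; }
    failed=0
    for script in "$@"; do
        # perl prints either "<file> syntax OK" or the compile errors to stderr.
        perl -c "$script" || failed=1
    done
    return $failed
}
```

For the setup described here it would be called as: check_perl_syntax /var/spool/PBS/mom_priv/blcr_checkpoint_script /var/spool/PBS/mom_priv/blcr_restart_script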


Regards
Rajiv R
Project Associate,
CARE, MIT,
Anna University, Chennai