[torqueusers] Enabling BLCR on Torque roll (problems with "qhold")

Ricardo Alves rdq.alves at gmail.com
Wed Mar 16 10:08:32 MDT 2011


Thank you for the help.
I was able to solve my problem with the pbs_mom error, and I replaced the BLCR scripts from the online tutorial with the ones available in the Torque source code (I'm using Torque 3.0.0, by the way).

Unfortunately I am having another problem. 
I am able to run the checkpoint commands (qhold and qchkpt) without getting any error message, but neither command creates a checkpoint file for the job. In fact, the qhold command does not even stop the running job.
The jobs are submitted with checkpointing enabled and a path for the checkpoint file.

This is the tracejob output for the job after a qhold command:

03/16/2011 15:58:39  S    Holds u set at request of <some user>@cluster.PAC
03/16/2011 15:58:39  S    Job Modified at request of root at compute-0-1.local
03/16/2011 15:58:39  S    Holds uos released at request of root at compute-0-1.local

Does anybody have any idea why the checkpoint commands are not generating a checkpoint file, and why qhold does not stop a running job?

Thank you.


