[torqueusers] Enabling BLCR on Torque roll (problems with "qhold")
Ricardo Alves
rdq.alves at gmail.com
Wed Mar 16 10:08:32 MDT 2011
Thank you for the help.
I was able to solve the pbs_mom error, and I replaced the BLCR scripts from the online tutorial with the ones shipped in the Torque source code (I'm using Torque 3.0.0, by the way).
Unfortunately, I now have another problem.
I can run the checkpoint commands (qhold and qchkpt) without getting any error message, but neither command creates a checkpoint file for the job. In fact, qhold does not even stop the running job.
The jobs are submitted with checkpointing enabled and a path for the checkpoint file.
This is the tracejob output after running qhold on the job:
03/16/2011 15:58:39 S Holds u set at request of <some user>@cluster.PAC
03/16/2011 15:58:39 S Job Modified at request of root@compute-0-1.local
03/16/2011 15:58:39 S Holds uos released at request of root@compute-0-1.local
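To make the hold sequence in a trace like this easier to read, here is a minimal sketch that pulls the hold set/release events out of tracejob lines and tracks which hold types (u = user, o = other, s = system) remain active. The line format is an assumption inferred only from the three lines in this post, and the function names `hold_events` and `active_holds` are hypothetical helpers, not part of Torque.

```python
import re
from datetime import datetime

# Assumed tracejob line shape, inferred only from the lines in this post:
# "MM/DD/YYYY HH:MM:SS <source letter> <message>"
TRACE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+(?P<src>[A-Z])\s+(?P<msg>.*)$"
)
# Hold messages as they appear above, e.g. "Holds u set ..." / "Holds uos released ..."
HOLD_RE = re.compile(r"^Holds (?P<types>[a-z]+) (?P<action>set|released)\b")


def hold_events(lines):
    """Return (timestamp, action, hold_types) for every hold set/release line."""
    events = []
    for line in lines:
        m = TRACE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a tracejob entry
        h = HOLD_RE.match(m.group("msg"))
        if h:
            ts = datetime.strptime(m.group("ts"), "%m/%d/%Y %H:%M:%S")
            events.append((ts, h.group("action"), h.group("types")))
    return events


def active_holds(events):
    """Replay set/release events in order and return the hold letters still set."""
    active = set()
    for _, action, types in events:
        if action == "set":
            active |= set(types)
        else:
            active -= set(types)
    return active


TRACE_SAMPLE = """\
03/16/2011 15:58:39 S Holds u set at request of <some user>@cluster.PAC
03/16/2011 15:58:39 S Job Modified at request of root@compute-0-1.local
03/16/2011 15:58:39 S Holds uos released at request of root@compute-0-1.local
""".splitlines()

if __name__ == "__main__":
    ev = hold_events(TRACE_SAMPLE)
    for ts, action, types in ev:
        print(ts.isoformat(), action, types)
    print("active holds:", sorted(active_holds(ev)))
```

On the trace above, this reports the user hold being set and then all of u, o and s being released within the same second, so no hold remains on the job.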
Does anybody have any idea why the checkpoint commands do not generate a checkpoint file, and why qhold does not stop a running job?
Thank you.