[torqueusers] Checkpoint test fails: bug or misconfiguration?
Igor Volovichev
salamanca at bk.ru
Thu Mar 19 06:07:40 MDT 2009
Hi all,
I use Torque-2.3.6 compiled with BLCR 0.8.0 enabled.
When testing as described in
http://www.clusterresources.com/wiki/doku.php?id=torque:2.6_job_checkpoint_and_restart
test #6 fails. The problem: whichever checkpoint file I choose (using qalter -W
checkpoint_name=...), only the last one is used, and that is the filename passed
to restart_script. However, "qstat -f" shows the change correctly. I can see the attribute change in the file
/var/spool/torque/server_priv/jobs/xxxxxx.JB on the host where the server is running,
but there is no change in /var/spool/torque/mom_priv/jobs/xxxxx.JB on the host where pbs_mom is running. Is this the correct behavior?
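
The comparison above could be repeated roughly as follows, run separately on the server host and on the pbs_mom host. This is only a sketch: jb_has_name is a hypothetical helper, not part of Torque, and it assumes the .JB file holds the checkpoint_name value as plain bytes, which is an assumption rather than a documented property of the file format. It also relies on grep supporting -a (GNU and BSD grep do).

```shell
#!/bin/sh
# jb_has_name FILE NAME  (hypothetical helper, not a Torque command)
# Prints "present" if the string NAME occurs anywhere in FILE, else "absent".
# Assumption: the job file stores the checkpoint name as plain bytes.
jb_has_name() {
    file=$1
    name=$2
    # -a: treat binary data as text, -q: no output, -F: match a fixed string
    if grep -aqF -- "$name" "$file" 2>/dev/null; then
        echo "present"
    else
        echo "absent"
    fi
}
```

For example, after qalter -W checkpoint_name=NAME, calling jb_has_name with the job's .JB path under /var/spool/torque/server_priv/jobs/ on the server host and under /var/spool/torque/mom_priv/jobs/ on the pbs_mom host would show whether the new name reached each file. An unreadable or missing file is also reported as "absent".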
Could someone advise a solution to this problem? Thanks.
WBR,
Igor