[torqueusers] Checkpoint test fails: bug or misconfiguration?

Igor Volovichev salamanca at bk.ru
Thu Mar 19 06:07:40 MDT 2009


Hi, all

I am using Torque 2.3.6 compiled with BLCR 0.8.0 support enabled.

When I run the tests described in
http://www.clusterresources.com/wiki/doku.php?id=torque:2.6_job_checkpoint_and_restart
test #6 fails. The problem is that whichever checkpoint file I choose (using qalter -W
checkpoint_name=...), only the last one is used, and that is the filename passed
to restart_script. However, "qstat -f" shows the change correctly. I see the attribute change in the file
/var/spool/torque/server_priv/jobs/xxxxxx.JB on the host running the server,
but there is no change in /var/spool/torque/mom_priv/jobs/xxxxx.JB on the host where pbs_mom is running. Is this the correct behavior?

Could someone advise a solution to this problem? Thanks.

WBR,
Igor


