[torqueusers] Setting up checkpointing
nt_mahmood at yahoo.com
Thu Jan 26 02:14:55 MST 2012
If you are using a Debian-based operating system, you will have a hard time getting BLCR to work.
BLCR is primarily designed for Red Hat-based operating systems.
// Naderan *Mahmood;
From: Lloyd Brown <lloyd_brown at byu.edu>
To: Torque Users Mailing List <torqueusers at supercluster.org>
Sent: Thursday, January 19, 2012 2:09 AM
Subject: [torqueusers] Setting up checkpointing
Can anyone enlighten me on the current state of BLCR-style checkpointing
in Torque? I've been trying to get it to work, and so far, I see that
it's invoking my checkpoint script, that script calls cr_checkpoint, and
the checkpoint files/directories are created, but something is calling
the mom_checkpoint_delete_files function, which in turn calls
delete_blcr_files, and the checkpoints get deleted.
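
For readers following the flow described above (the checkpoint script calling cr_checkpoint), here is a minimal sketch of how such a script might assemble the cr_checkpoint command line. It is not the poster's actual script. The function name, its parameters, and the example paths are hypothetical. The only BLCR options used are --signal, --file and --tree; how Torque passes the session ID and checkpoint directory to the script is not shown and should be checked against the Torque documentation.

```python
import os
import shlex


def build_cr_checkpoint_cmd(session_id, checkpoint_dir, checkpoint_name, signal_num=0):
    """Build the argv list for checkpointing a process tree with cr_checkpoint.

    All parameter names are hypothetical; they stand for values that the
    batch system is assumed to hand to the checkpoint script.
    """
    cmd = ["cr_checkpoint"]
    if signal_num:
        # Signal delivered to the processes after the checkpoint is taken.
        cmd += ["--signal", str(signal_num)]
    # Write the context file into the job's checkpoint directory.
    cmd += ["--file", os.path.join(checkpoint_dir, checkpoint_name)]
    # Checkpoint the given process and all of its descendants.
    cmd += ["--tree", str(session_id)]
    return cmd


if __name__ == "__main__":
    # Example values only; a real script would receive these from the caller.
    argv = build_cr_checkpoint_cmd(1234, "/tmp/ckpt", "ckpt.1")
    print(shlex.join(argv))
```

Printing the command before running it (for example via subprocess.run) makes it easier to see which files the script is expected to create, and so whether they were written and later removed.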
Also, when I do a "qhold" on my job to try to initiate the checkpoint,
is it really supposed to terminate my job? Perhaps that's related, e.g.,
the job is ending, so the files get cleaned up.
Basically, does anyone have it working, and can give me advice?
Fulton Supercomputing Lab
Brigham Young University