[torqueusers] Setting up checkpointing

Mahmood Naderan nt_mahmood at yahoo.com
Thu Jan 26 02:14:55 MST 2012


If you are using a Debian-based operating system, you will have a hard time getting BLCR to work.
BLCR is primarily designed for Red Hat-based operating systems.

 
// Naderan *Mahmood;


________________________________
 From: Lloyd Brown <lloyd_brown at byu.edu>
To: Torque Users Mailing List <torqueusers at supercluster.org> 
Sent: Thursday, January 19, 2012 2:09 AM
Subject: [torqueusers] Setting up checkpointing
 
Can anyone enlighten me on the current state of BLCR-style checkpointing
in Torque?  I've been trying to get it to work, and so far, I see that
it's invoking my checkpoint script, that script calls cr_checkpoint, and
the checkpoint files/directories are created, but something is calling
the mom_checkpoint_delete_files function, which in turn calls
delete_blcr_files, and the checkpoints get deleted.
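[Editor's sketch, not part of the original message.] To make the flow described above concrete (Torque invokes a checkpoint script, and the script calls cr_checkpoint), here is a minimal Python sketch that only builds the cr_checkpoint command line. The function name, the session id, and the file path are hypothetical; the --tree, --file and --signal flags are the ones I understand BLCR's cr_checkpoint to accept, so check them against the man page for your installed version. The arguments Torque actually passes to a checkpoint script are not shown here.

```python
import shlex


def build_cr_checkpoint_argv(session_pid, checkpoint_path, signal_num=None):
    """Build (but do not run) an argv list for BLCR's cr_checkpoint.

    session_pid     -- PID at the root of the process tree to checkpoint
                       (hypothetical value supplied by the caller)
    checkpoint_path -- file the checkpoint image should be written to
    signal_num      -- optional signal to deliver after the checkpoint
    """
    argv = ["cr_checkpoint"]
    if signal_num is not None:
        # Assumption: --signal takes the signal number as its argument.
        argv += ["--signal", str(signal_num)]
    # Checkpoint the whole process tree rooted at session_pid.
    argv += ["--tree", str(session_pid), "--file", checkpoint_path]
    return argv


if __name__ == "__main__":
    # Print the command instead of executing it, so this runs without BLCR.
    print(shlex.join(build_cr_checkpoint_argv(12345, "/tmp/ckpt.12345")))
```

Printing the command rather than executing it keeps the sketch usable on a machine without BLCR; a real script would hand the list to subprocess.run and check the exit status before reporting success back to the MOM.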

Also, when I do a "qhold" on my job to try to initiate the checkpoint,
is it really supposed to terminate my job?  Perhaps that's related, e.g.,
the job is ending, so the files get cleaned up.

Basically, does anyone have it working, and can give me advice?

Thanks,

-- 
Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu