[torqueusers] question about creating checkpoint

Mahmood Naderan nt_mahmood at yahoo.com
Wed Mar 9 00:52:14 MST 2011


>I do not know if you can checkpoint the job.  The checkpointing is done by
>the script listed in your mom config file as $checkpoint_script.  This script,
>if not modified, will be using the cr_checkpoint command installed by BLCR.
>The man page for this command lists several ways that the job can be started
>so it is checkpointable.  If the -c had been used on the qsub then Torque would
>have run the job using the cr_run command from BLCR, but since -c was not
>specified Torque just ran the job normally.  The only way that the
>$checkpoint_script and/or cr_checkpoint might work is if the job was
>originally linked with one of the libraries listed in the man page for
>cr_checkpoint.

 
Thanks. However, I have had many problems with BLCR, and I doubt that it works
on Ubuntu Server based systems.

// Naderan *Mahmood;



----- Original Message ----
From: Al Taufer <ataufer at adaptivecomputing.com>
To: Torque Users Mailing List <torqueusers at supercluster.org>
Sent: Mon, March 7, 2011 9:08:21 PM
Subject: Re: [torqueusers] question about creating checkpoint

----- Original Message -----
> Hi,
> Is it possible to manually create a checkpoint even though "-c" was not
> used in the submission? A user forgot to use "-c" when he ran "qsub",
> and now we want to create a checkpoint and shut down the server.
> 

I do not know if you can checkpoint the job.  Checkpointing is done by the
script listed in your mom config file as $checkpoint_script.  This script, if
not modified, uses the cr_checkpoint command installed by BLCR.  The man page
for this command lists several ways that a job can be started so that it is
checkpointable.  If -c had been used on the qsub, Torque would have run the
job using the cr_run command from BLCR, but since -c was not specified, Torque
just ran the job normally.  The only way that the $checkpoint_script and/or
cr_checkpoint might work is if the job was originally linked with one of the
libraries listed in the man page for cr_checkpoint.
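
One rough way to check whether an already running job has a BLCR library
loaded is to look at the files mapped into its process.  The following is only
a sketch, not part of Torque or BLCR.  It assumes a Linux node where
/proc/<pid>/maps is readable, and it assumes the BLCR libraries are named
libcr.so, libcr_run.so or libcr_omit.so; the cr_checkpoint man page on your
system is the authority on the actual names.  The function names are made up
for this illustration.

```python
import re

# Assumption: BLCR runtime libraries are named libcr.so, libcr_run.so or
# libcr_omit.so, optionally followed by version numbers. Verify these names
# against your own BLCR installation.
BLCR_LIB_PATTERN = re.compile(r"/libcr(_run|_omit)?\.so(\.\d+)*$")


def maps_mention_blcr(maps_text):
    """Return True if any file listed in /proc/<pid>/maps text looks like a BLCR library."""
    for line in maps_text.splitlines():
        # A maps line has five fixed fields, then an optional pathname.
        fields = line.split(None, 5)
        if len(fields) == 6 and BLCR_LIB_PATTERN.search(fields[5].strip()):
            return True
    return False


def process_has_blcr(pid):
    """Read /proc/<pid>/maps for a live process and apply the check above."""
    with open(f"/proc/{pid}/maps") as maps_file:
        return maps_mention_blcr(maps_file.read())
```

Run on the compute node as root or as the job owner, process_has_blcr(pid)
with the PID of the job's main process returns False when no such library is
mapped, in which case cr_checkpoint is unlikely to work on that job.  A True
result only shows that a library is loaded; it does not guarantee that the
checkpoint will succeed.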

Al Taufer

> 
> Thanks for any feedback.
> // Naderan *Mahmood;
> 
> 
> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers