[torquedev] torque+blcr+openmpi

Danny Sternkopf dsternkopf at hpce.nec.com
Tue Jun 29 03:25:54 MDT 2010


Eric,

thanks for your answer.

Please find my scripts attached. They are quite simple at the moment.
They use ompi-checkpoint and ompi-restart for OpenMPI apps, and for now
they have to. For multi-node jobs Torque starts a pbs_demux process to
capture the stdout/stderr of all MOMs, and that process is not
checkpointable, probably because it is not started with cr_run or under a
similar BLCR environment. Therefore cr_checkpoint --tree doesn't work
for these kinds of jobs.
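
To illustrate the split between the two tools, here is a minimal sketch
(not the attached scripts) of how a checkpoint script could choose its
command. The function name ckpt_command and its arguments are hypothetical.
It only prints the command it would run, and it assumes the script already
knows whether the job is an MPI job and which PID to target.

```shell
#!/bin/sh
# Sketch only: choose a checkpoint command for a job.
# ckpt_command and its argument order are hypothetical, for illustration.
#
# $1 = job owner, $2 = PID to checkpoint (mpirun or the job shell),
# $3 = "mpi" or "serial", $4 = context file (serial case only)
ckpt_command() {
    user=$1
    pid=$2
    kind=$3
    ctx=${4:-}
    case "$kind" in
        mpi)
            # ompi-checkpoint does not let root checkpoint user jobs,
            # so the command is run as the job owner.
            # --term terminates the MPI job after the checkpoint.
            printf "su %s -c 'ompi-checkpoint --term %s'\n" "$user" "$pid"
            ;;
        serial)
            # Single-node job without MPI: checkpoint the process tree
            # into one BLCR context file.
            printf "cr_checkpoint --tree -f %s %s\n" "$ctx" "$pid"
            ;;
        *)
            echo "unknown job kind: $kind" >&2
            return 1
            ;;
    esac
}
```

Printing the command instead of executing it keeps the sketch safe to try
on a machine without BLCR or OpenMPI installed.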

I do not have deep knowledge of what exactly ompi-checkpoint does, but I
imagine that it brings the MPI app into a proper state, ready for
checkpointing. Therefore I am not sure whether you could successfully
checkpoint/restart the MPI app without knowing the app's
requirements. For example, communication is one of the major challenges
and plays an important role in that context.

In general one would expect that the batch system sends a signal to MPI,
which then brings the app into a proper state for checkpointing, and then
the batch system just checkpoints all the involved processes using a
general format. This could be one approach.

However, that is why ompi-checkpoint/ompi-restart exist: they take care
of this, and they can of course make use of the BLCR tools.

As I said, the checkpointing works for me because ompi-checkpoint does a
good job: it checkpoints and then terminates the MPI-related processes.
Torque takes care of pbs_demux and the batch job script.

The restart does not work. I don't see the blcr_restart_script being
called, so I guess Torque has a problem finding the expected checkpoint
file.
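
If the cause is that ompi-checkpoint writes a snapshot directory while
Torque expects a single context file, a restart script would at least need
to tell the two cases apart. The following is a rough sketch under that
assumption. restart_tool is a hypothetical helper, not part of Torque, BLCR
or OpenMPI, and it only reports which tool would apply.

```shell
#!/bin/sh
# Sketch only: pick a restart tool from the shape of the checkpoint.
# restart_tool is a hypothetical helper name.
#
# $1 = path Torque hands to the restart script
restart_tool() {
    ckpt=$1
    if [ -d "$ckpt" ]; then
        # Directory: ompi-checkpoint style snapshot
        # (per-process checkpoint files plus metadata).
        echo ompi-restart
    elif [ -f "$ckpt" ]; then
        # Regular file: a single BLCR context file, as Torque expects.
        echo cr_restart
    else
        echo "no checkpoint found at $ckpt" >&2
        return 1
    fi
}
```

This does not explain why the restart script is not called at all, it only
shows how one script could serve both MPI and non-MPI jobs once it is.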

The following article gives useful information about the current 
Torque/BLCR integration:
  http://www.clusterresources.com/pipermail/torquedev/2010-May/002054.html

Regards,

Danny

On 6/28/2010 8:08 PM, Eric Roman wrote:
>
> Danny,
>
> I worked on this a while ago, but it's been a long standing todo item to get
> everything to work properly.
>
> Can you tell me what your scripts do?
>
> Can you restart the application manually from the context file you created?
> (With cr_restart?)
>
> Normally, torque tries to checkpoint the shell it spawned assoc'd with the job
> using cr_checkpoint (--tree) to capture all of the children, including the
> mpirun and the MPI ranks.  Last time I checked, mpirun wouldn't respond to a
> cr_checkpoint.  (I think it omitted itself from the checkpoint, but I don't
> remember).  openmpi required a user to invoke ompi-checkpoint to checkpoint an
> app, and ompi-restart to bring the app back, but torque wants to use
> cr_checkpoint and cr_restart on the context file.  So, I needed to wrap
> the original openmpi mpirun with another program that would intercept the
> checkpoint signals.
>
> Part of the problem is that openmpi puts some of the MPI ranks into the same
> process tree (or session) as the mpirun, and this messes everything up.  I
> left off at the point where I needed to write startup code to ensure that
> the ranks were in a separate process tree from the mpirun.  (The way things
> are implemented right now, the checkpoint deadlocks, so we need to break
> one of the dependencies to fix it.)
>
> The root issue is a little bit messy.  Those checkpoint/restart scripts
> need root privileges to open the context file.  Those scripts need to open
> the context file (as root), and then call setuid() to change into the user,
> making sure that they pass the context file as a file descriptor to
> cr_checkpoint and cr_restart.
>
> I do want to go in and fix all of this.  Right now I'm trying to get BLCR to
> work with compressed context files, and chasing a bug with using it on
> the 2.6.33 kernel.
>
> Eric
>
>
>
> On Mon, Jun 28, 2010 at 09:43:14AM +0200, Danny Sternkopf wrote:
>> Hi,
>>
>> maybe someone here can comment on this.
>>
>> Regards,
>>
>> Danny
>>
>> -------- Original Message --------
>> Subject: Re: [torqueusers] torque+blcr+openmpi
>> Date: Fri, 25 Jun 2010 16:58:59 +0200
>> From: Danny Sternkopf<dsternkopf at hpce.nec.com>
>> Reply-To: dsternkopf at hpce.nec.com
>> Organization: NEC Deutschland GmbH
>> To: torqueusers at supercluster.org
>>
>> Hi,
>>
>> any news about this? I have the following setup:
>> o torque 2.4.8
>> o openmpi 1.4.2
>> o blcr 0.8.2
>>
>> The checkpoint/restart scripts from Torque's contrib/blcr work for
>> single-node applications without MPI. I created new scripts for OpenMPI
>> applications. The checkpoint works, but the release does not. The issue
>> might be that ompi-checkpoint writes a directory including checkpoint
>> files for each process plus metadata and Torque expects one single
>> checkpoint file. Any experiences?
>>
>> Btw another issue is that the checkpoint/restart scripts run as root.
>> ompi-checkpoint doesn't allow root to checkpoint user jobs, so you
>> have to run ompi-checkpoint as the user. The restart script of course
>> needs this as well, to restart the processes under the corresponding user id.
>>
>> Furthermore, any comments on handling MPI and single-process applications
>> with the same checkpoint/restart scripts?
>>
>> Regards,
>>
>> Danny
>> ---
>> _______________________________________________
>> torquedev mailing list
>> torquedev at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torquedev
>

-- 
Danny Sternkopf http://www.nec.de/hpc        dsternkopf at hpce.nec.com
HPCE Division  Germany phone: +49-711-78055-33 fax: +49-711-78055-25
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NEC Deutschland GmbH, Hansaallee 101, 40549 Düsseldorf
Geschäftsführer Richard Hanscott
Handelsregister Düsseldorf HRB 57941; VAT ID DE129424743
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: blcr_restart_script.sh
Url: http://www.supercluster.org/pipermail/torquedev/attachments/20100629/38d655ff/attachment.pl 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: blcr_checkpoint_script.sh
Url: http://www.supercluster.org/pipermail/torquedev/attachments/20100629/38d655ff/attachment-0001.pl 
