[torqueusers] Problems with suspend/resume

Fedele Stabile fedele at fis.unical.it
Thu Apr 14 12:26:52 MDT 2005


With PBSPro, I have developed a procedure that suspends and resumes MPI
jobs that do not use mpiexec. It has been working correctly for a year and
simulates so-called gang scheduling: you can submit several jobs to a queue
and give each a running time slice (of 1 hour, for example), suspending
each job and resuming it when its turn to run comes.
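A minimal Python sketch of the round-robin time slicing described above, under stated assumptions. It is not the actual PBSPro procedure. It assumes every job has already been started and then suspended, that exactly one job runs per slice, and that Torque's `qsig -s suspend` / `qsig -s resume` are used to switch jobs. The function names and job IDs are hypothetical. The plan is built as a dry run; `run_commands` would execute it.

```python
import subprocess
from collections import deque


def gang_schedule(job_ids, slice_seconds, total_seconds):
    """Return (time, action, job_id) events for a hypothetical round-robin
    plan in which one job runs per time slice and all others stay suspended.
    Assumes all jobs start out suspended."""
    queue = deque(job_ids)
    events = []
    running = None
    t = 0
    while t < total_seconds and queue:
        nxt = queue.popleft()
        if running is not None and running != nxt:
            events.append((t, "suspend", running))  # stop the current job
        if running != nxt:
            events.append((t, "resume", nxt))  # give the slice to the next job
        queue.append(nxt)  # back of the line for the next round
        running = nxt
        t += slice_seconds
    return events


def to_commands(events):
    """Turn events into qsig argument lists (dry run, nothing is executed)."""
    return [["qsig", "-s", action, job_id] for _, action, job_id in events]


def run_commands(commands):
    """Execute the qsig commands; fails loudly if any of them fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    plan = gang_schedule(["101.headnode", "102.headnode"], 3600, 3 * 3600)
    for cmd in to_commands(plan):
        print(" ".join(cmd))
```

A real scheduler would sleep between slices rather than precompute a timeline. Keeping the plan separate from execution simply makes the switching order easy to inspect before any job is signalled.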

Now I would like to migrate this procedure to Torque/Maui.

Fedele
  
On Thu, 2005-04-14 at 14:37, Richard Walsh wrote:
> Gerson Galang wrote:
> 
> > Benward Platz has written a patch to req_signal.c which fixes this 
> > problem.
> >
> > http://www.supercluster.org/pipermail/mauiusers/2004-July/001284.html
> >
> All,
> 
> Freeing the nodes for other work makes sense, of course, but what impact
> does that have on a later qsig -s resume? The suspend/resume pair is
> mentioned as supported only in the Cray Unicos environment. Will resumed
> jobs that had been released be reattached and monitored by the pbs_mom,
> and what if another job is running? It seems the situation is a bit more
> complicated.
> 
> Regards,
> 
> Richard Walsh
> AHPCRC
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://supercluster.org/mailman/listinfo/torqueusers
