[torqueusers] Problems with suspend/resume
fedele at fis.unical.it
Thu Apr 14 12:26:52 MDT 2005
With PBSPro, I have developed a procedure that suspends and resumes MPI
jobs without using mpiexec. It has been working correctly for a year and
simulates so-called gang scheduling: you can submit different jobs to a
queue and give each of them a running time slice of 1 hour (for example),
suspending each job when its slice ends and resuming it when its turn to
run comes again.
Now I would like to migrate this procedure to torque/maui.
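
To make the rotation concrete, here is a minimal sketch of the time-slice
logic, not my actual PBSPro procedure. It assumes that all jobs have already
been started and suspended before the rotation begins, and that
"qsig -s suspend" / "qsig -s resume" behave as expected on the target system
(which is exactly what is in question for torque). The job ids and the
function names gang_schedule and run_plan are hypothetical.

```python
import subprocess
import time
from itertools import cycle, islice


def gang_schedule(job_ids, slices):
    """Build a round-robin plan: one (running_job, commands) pair per slice.

    Assumption: every job starts out suspended, so the first slice only
    needs a resume; later slices suspend the previous job first.
    """
    plan = []
    previous = None
    for job in islice(cycle(job_ids), slices):
        cmds = []
        if previous != job:
            if previous is not None:
                cmds.append(["qsig", "-s", "suspend", previous])
            cmds.append(["qsig", "-s", "resume", job])
        plan.append((job, cmds))
        previous = job
    return plan


def run_plan(plan, slice_seconds=3600):
    """Execute the plan, sleeping one time slice after each switch."""
    for _job, cmds in plan:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
        time.sleep(slice_seconds)
```

With two jobs, gang_schedule(["101.host", "102.host"], 3) resumes 101.host,
then suspends it and resumes 102.host, then switches back. Only run_plan
actually calls qsig; the plan itself can be inspected without a batch system.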
On Thu, 2005-04-14 at 14:37, Richard Walsh wrote:
> Gerson Galang wrote:
> > Benward Platz has written a patch to req_signal.c which fixes this
> > problem.
> > http://www.supercluster.org/pipermail/mauiusers/2004-July/001284.html
> Freeing the nodes for other work makes sense of course, but what impact
> does that have on a later qsig -s resume ... the suspend/resume pair is
> mentioned as only supported in the Cray Unicos environment. Will resumed
> jobs that had been released be reattached and monitored by the pbs_mom ...
> and what if another job is running? It seems like the situation is a bit
> more complicated.
> Richard Walsh