[torqueusers] question on scheduling
csamuel at vpac.org
Sun Oct 30 15:20:06 MST 2005
On Fri, 28 Oct 2005 05:04 pm, lorenzo118 at interfree.it wrote:
> or if it would be possible to assign a higher priority to the little job
> and temporarily "freeze" the big one
This sounds like the Torque/Maui concept of suspend and resume. Unfortunately,
whilst it works for single-CPU jobs, there can be problems for MPI jobs if the
program used to launch the MPI job doesn't forward suspend signals properly.
In particular, neither MPICH's mpirun over SSH nor Pete's mpiexec program
from OSC currently forwards them. There have been patches in the past to
enable that, and work is under way on the best way to implement support in
the mainline sources.
I have no idea about LAM; that may well work.
So at the moment, if you try to suspend an MPI job, you will most likely only
suspend the processes on the lead node of the job (the "mother superior" or
"MS" in PBS speak), and the processes on the other nodes will probably keep
running.
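To make "forwarding suspend signals" concrete, here is a minimal Python sketch
of what a launcher would have to do. It is an illustration only, not code from
mpirun or mpiexec: the function names are made up, the children are local
processes rather than ranks on remote nodes, and it assumes the launcher is
asked to suspend with a catchable signal such as SIGTSTP. A real launcher
would have to relay the request to every node of the job.

```python
import os
import signal
import subprocess
import sys


def forward(children, signum):
    """Relay signum to every child process the launcher started."""
    for child in children:
        if child.poll() is None:  # skip children that have already exited
            os.kill(child.pid, signum)


def install_forwarding(children):
    """Make this process pass suspend/resume requests on to its children.

    Assumption: the suspend request arrives as SIGTSTP, which can be caught.
    """

    def on_suspend(signum, frame):
        # SIGSTOP cannot be caught or ignored, so the children always stop.
        forward(children, signal.SIGSTOP)
        # Only then stop the launcher itself.
        os.kill(os.getpid(), signal.SIGSTOP)

    def on_resume(signum, frame):
        # Runs once the launcher has been continued; wake the children too.
        forward(children, signal.SIGCONT)

    signal.signal(signal.SIGTSTP, on_suspend)
    signal.signal(signal.SIGCONT, on_resume)
```

A launcher that skips the install_forwarding() step is in the situation
described above: it stops itself, but the processes it started never hear
about the suspend request.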
Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia