[torqueusers] question on scheduling

Chris Samuel csamuel at vpac.org
Sun Oct 30 15:20:06 MST 2005


On Fri, 28 Oct 2005 05:04 pm, lorenzo118 at interfree.it wrote:

> or if it would be possible to assign a higher priority to the little job
> and temporarily "freeze" the big one

This sounds like the Torque/Maui concept of suspend and resume. Unfortunately,
whilst it works for single-CPU jobs, there can be problems for MPI jobs if the
program used to launch the MPI job doesn't forward suspend signals properly.

So, if you're using MPICH's mpirun with SSH, or Pete's mpiexec program from
OSC, neither currently forwards suspend signals, although there have been
patches in the past to enable that and the developers are currently working
out the best way to implement support in the mainline sources.

No idea about LAM; that may well work.

So, at the moment, if you try to suspend an MPI job you will most likely only
succeed in suspending the processes on the lead node of the job (the
"mother superior" or "MS" in PBS speak), while the rest keep on running.
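
To make "forwarding suspend signals" a bit more concrete, here is a minimal
Python sketch of what a launcher would need to do. It is not how mpirun or
mpiexec are implemented: the names relay_signal and launch are made up for
illustration, local child processes stand in for the remote MPI ranks, and it
assumes a Unix system where the suspend request reaches the launcher as a
catchable SIGTSTP (SIGSTOP itself cannot be caught).

```python
import os
import signal
import subprocess

# Incoming signal -> signal delivered to each worker. SIGTSTP can be caught,
# SIGSTOP cannot, so the launcher catches the former and delivers the latter.
RELAY = {signal.SIGTSTP: signal.SIGSTOP, signal.SIGCONT: signal.SIGCONT}


def relay_signal(signum, workers):
    """Send the mapped signal to every worker still alive; return their pids."""
    signalled = []
    for proc in workers:
        if proc.poll() is None:  # None means the worker has not exited
            os.kill(proc.pid, RELAY[signum])
            signalled.append(proc.pid)
    return signalled


def launch(commands):
    """Start one local worker per command and install the relay handlers."""
    workers = [subprocess.Popen(cmd) for cmd in commands]

    def handler(signum, frame):
        relay_signal(signum, workers)

    for signum in RELAY:
        signal.signal(signum, handler)
    return workers
```

A real launcher would have to deliver the signal to processes on other nodes
(over its SSH connections or through the batch system) rather than with
os.kill, and would normally stop itself as well; the sketch leaves both out.
A launcher that skips the relay step gives exactly the behaviour described
above: only the processes on the lead node stop.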

Good luck!
Chris
-- 
 Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
