[torqueusers] Problem with running jobs requesting multiple nodes
gus at ldeo.columbia.edu
Fri Oct 25 10:34:14 MDT 2013
On 10/24/2013 10:51 PM, Jack Hill wrote:
> On Thu, 24 Oct 2013, Gus Correa wrote:
>> Hello Jack
>> Have you tried Maui instead of pbs_sched?
>> See these threads:
>> Gus Correa
> Thanks for the pointer. We had seen Maui and were keeping it in the back of
> our minds in case we needed to implement a more complex policy. We had
> been thinking that it would be better to not add complexity when we didn't
> need it, but maybe we do :)
> To be clear about what is going on here: there is some bug in pbs_sched
> that would be worked around by using Maui (i.e. it's not a problem with
> pbs_server), but since Maui works and is more capable, it is not worth my
> time to figure out what the problem is.
I also don't advocate adding complexity where it is not needed.
I used pbs_sched for a long time in our older clusters.
It is also good for testing Torque (when it works).
I suggested Maui because it will probably work with Torque 4.X.Y.Z,
whereas pbs_sched seems to be on its way to being fixed in a
future 4.X.Y.Z release. (See the threads I sent before.)
[Is pbs_sched fixed in the upcoming 4.2.6 release?]
An alternative would be to move back to Torque 2.5.X,
or 2.4.X, and use pbs_sched.
However, in my humble opinion, re-installing Torque is more painful
in a production cluster than switching to another scheduler.
If your MPI uses the Torque libraries, you may have to reinstall it.
Maui installs easily.
You just need to configure it
(and maybe pass --prefix=/wherever/you/want).
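As a rough sketch of that configure step, assuming a Maui source tree whose configure script accepts --with-pbs to point at the Torque installation: both paths below are placeholders (the Torque path in particular is hypothetical and not taken from this thread), and the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Dry run: print the Maui build commands instead of executing them.
# Both paths are placeholders, not values from this thread.
MAUI_PREFIX="/wherever/you/want"   # install location for Maui
TORQUE_DIR="/path/to/torque"       # hypothetical Torque install root

configure_cmd="./configure --prefix=${MAUI_PREFIX} --with-pbs=${TORQUE_DIR}"

# These would be run from the top of the unpacked Maui source tree.
echo "${configure_cmd}"
echo "make"
echo "make install"
```

Once the printed commands look right for your site, run them by hand in the Maui source directory.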
Since you don't need a complex policy
(we still don't need one here either),
you could use the boilerplate maui.cfg file,
or modify only a few items (which is what I do here).
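For illustration only, the kind of items one might touch in a near-default maui.cfg could look like the sample below. The thread does not say which items were actually changed, so the selection, the host name, and the admin user are all assumptions; the script writes the sample to a scratch file rather than to a live Maui installation.

```shell
#!/bin/sh
# Write a small maui.cfg-style sample to a scratch file for review.
# Host name and admin user are placeholders, not values from this thread.
CFG_FILE="${TMPDIR:-/tmp}/maui.cfg.sample"

cat > "${CFG_FILE}" <<'EOF'
SERVERHOST            headnode.example.org
ADMIN1                root
RMCFG[base]           TYPE=PBS
RMPOLLINTERVAL        00:00:30
QUEUETIMEWEIGHT       1
BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST
NODEALLOCATIONPOLICY  MINRESOURCE
EOF

echo "wrote ${CFG_FILE}"
```

Compare the sample against the maui.cfg that your own install generates before copying anything across.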
I hope this helps,