[torqueusers] Submission script: Requesting cpus rather than nodes
rlinesseagate at gmail.com
Mon Jul 28 15:37:49 MDT 2008
We are running into a problem on our cluster because we have a wide mix of
users: some have single-CPU jobs and need to run thousands of them at a
time, while others have only one job that needs a larger number of CPUs.
The single-CPU jobs get scheduled, and the multi-node/multi-CPU jobs take
forever to get scheduled because the scheduler can't find the required
number of nodes with X CPUs available. In our case we have 48 nodes, 45 of
them with InfiniBand for MPI, and our MPI jobs use 40 to 64 cores. We would
like a way to just ask for 40 or 64 cores. The 64-core job dies when you
ask for 64 nodes, so the workaround has been to ask for 16 nodes with
ppn=4. But we hardly ever end up with 16 completely empty nodes, because
some single-CPU jobs have been running for a week or more. We also had our
NODEALLOCATIONPOLICY set to CPULOAD, but that results in single-CPU jobs
being spread out across lots of nodes, so it takes a while before whole
nodes become free.
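To make the gap between the two kinds of request concrete, here is a minimal Python sketch. It is not Torque or Maui code, only an illustration of the arithmetic. The 4 cores per node and the snapshot of free cores are assumptions made up for the example, and the function names are hypothetical.

```python
def fits_nodes_ppn(free_cores, nodes, ppn):
    """True if at least `nodes` hosts each have `ppn` or more free cores."""
    return sum(1 for free in free_cores if free >= ppn) >= nodes


def fits_total_cores(free_cores, cores):
    """True if the free cores summed over all hosts cover the request."""
    return sum(free_cores) >= cores


# Hypothetical snapshot of the 45 InfiniBand nodes, assuming 4 cores each:
# 10 nodes are empty, 20 have one single-CPU job, 15 have two.
free_cores = [4] * 10 + [3] * 20 + [2] * 15

# A nodes=16:ppn=4 style request needs 16 fully free nodes.
print(fits_nodes_ppn(free_cores, nodes=16, ppn=4))

# A plain "64 cores anywhere" request only needs 64 free cores in total.
print(fits_total_cores(free_cores, cores=64))
```

In this made-up snapshot the per-node request cannot start even though far more than 64 cores are idle, which is the situation described above.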
So we are looking for a way to request CPUs (cores) rather than CPUs per
machine, because the simulations could just as easily be spread out across
all the IB nodes. I could not find anything in the docs on how to do that.
We are using Torque with the Maui scheduler. If anyone has a suggestion for
a good configuration for a cluster that has a wide mix of job types, and
that also has some applications running outside of Torque, or can point me
at one that would work, I would appreciate it.