[torqueusers] PBS in Cluster
akohlmey at cmm.chem.upenn.edu
Mon Feb 1 15:33:25 MST 2010
On Mon, Feb 1, 2010 at 5:12 PM, Tim <timlee126 at yahoo.com> wrote:
> Thanks, Axel!
> Are you saying that if I submit several background jobs, then for better control of resource assignment via PBS commands it is better not to put them into a single PBS file and submit that file once, but to put each job in its own PBS file and submit the PBS files one by one?
yes, unless your local cluster has certain restrictions. on the
machine that i manage, users are encouraged to always use "full"
nodes, i.e. multiples of 8 cores (e.g. via -l nodes=1:ppn=8), and
then it makes sense to put up to 8 jobs into one submission. the
trick with using wait is important, or else your jobs will be killed
immediately, since the batch job terminates when the qsub script
terminates. nevertheless, i would rather recommend using an appfile
with OpenMPI instead of backgrounding jobs. this gives more control
(one can also "package" multiple parallel jobs, e.g. 4x 2-MPI tasks
in the above example) and also allows scattering jobs across
multiple nodes. if you want to get really fancy, you can write your
own wrapper that takes a long list of command lines and executes
each one on the next free node in your reservation. in general, this
is not a good idea; it only helps if you have to deal with jobs that
have relatively short execution times while running on a heavily
used machine where large parallel jobs are favored.
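For illustration, a minimal sketch of the "background plus wait" pattern on one full 8-core node might look like the script below. The function run_task and the task_N.log file names are hypothetical placeholders for real serial programs and their output; only the -l nodes=1:ppn=8 request and the use of wait come from the discussion above.

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=8

# PBS_O_WORKDIR is set by TORQUE to the directory qsub was called from.
# The fallback to "." is an assumption so the script can be tried without qsub.
cd "${PBS_O_WORKDIR:-.}"

# run_task is a hypothetical stand-in for a real serial program.
run_task() {
    sleep 1
    echo "task $1 finished" > "task_$1.log"
}

# Start 8 serial tasks in the background, one per reserved core.
for i in 1 2 3 4 5 6 7 8; do
    run_task "$i" &
done

# Block until every background task has exited. Without this line the
# script ends immediately and the batch system ends the job with it.
wait
```

Submitted with qsub, this occupies the whole node until the slowest of the 8 tasks has finished.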
> If yes, I think I do not need to put the jobs in the background, since the only reason I wanted to background them is that I have several jobs in a single file and want them to run in parallel instead of one starting after another finishes.
yes, with the one caveat from above: on our cluster the performance
of a single job can change by up to a factor of two depending on
whether a second job of a particular kind is present on the same
node, so to keep execution times predictable i ask users to always
reserve full nodes. otherwise it is better to let the batch
scheduler do the work. maui/moab in particular do a pretty good job
of packing smaller jobs into the "holes" that larger jobs leave,
through "backfilling". for that, however, it is important to submit
your job with a good guess (plus a safety margin) of how long it
will execute.
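As a rough sketch of "good guess plus safety", the snippet below turns a runtime estimate into a walltime request. The 90-minute estimate, the 25% margin, and the script name job.pbs are made-up values for illustration, not recommendations. The qsub line is only printed, so the snippet can be tried on a machine without TORQUE.

```shell
#!/bin/bash
# Hypothetical inputs: estimated runtime in minutes and a safety margin in percent.
estimate_min=90
margin_pct=25

# Add the margin using integer arithmetic (fractions of a minute are dropped).
total_min=$(( estimate_min + estimate_min * margin_pct / 100 ))

# Format as HH:MM:SS, the form accepted by the walltime resource.
walltime=$(printf '%02d:%02d:00' $(( total_min / 60 )) $(( total_min % 60 )))

# job.pbs is a placeholder for your own job script.
echo "qsub -l nodes=1:ppn=8,walltime=${walltime} job.pbs"
```

A tighter but still safe walltime gives the scheduler a better chance to fit the job into a gap left by larger jobs; a job that exceeds its requested walltime is normally terminated, hence the margin.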
because of these "rules" our local cluster tends to have an average
utilization of over 90% per month.
Dr. Axel Kohlmeyer akohlmey at gmail.com
Institute for Computational Molecular Science
College of Science and Technology
Temple University, Philadelphia PA, USA.