[torqueusers] Fwd: Cluster Questions

Ricardo Román Brenes roman.ricardo at gmail.com
Tue Nov 29 08:45:08 MST 2011


Hello everyone, thanks for taking the time to read this (sorry for the long post :P).


The question is about multiple queues with Torque:


We have cluster nodes here with different architectures:
4 PS-3
3 CPU+GPU
2 CPU

and I want to be able to send jobs to each kind of node independently (using
Torque). I'm guessing that having several queues, with each node belonging to
a particular queue, and then submitting jobs to the right queue will do the
trick:

Say I have 3 queues:
IBMCELL with the 4 PS-3s
TESLA with the 3 nodes that have GPUs
XEON with the 5 nodes that have Xeons (3 of which also have Teslas :P)

and when I submit a job with
qsub -q IBMCELL a.pbs
it should run on the PS-3s only, but I haven't been able to make it work like
that.
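In Torque, nodes are usually tagged for this kind of split with node properties: in server_priv/nodes, any words after the hostname (and np=) are treated as properties, and a node can carry more than one. A minimal sketch of the layout described above, using hypothetical hostnames and property names (cell, tesla, xeon), since the real ones aren't given:

```
# server_priv/nodes (hypothetical hostnames and property names)
# np= is omitted here, so Torque assumes 1 processor per node;
# set it to each node's real core count.
ps3-0 cell
ps3-1 cell
ps3-2 cell
ps3-3 cell
gpu-0 xeon tesla
gpu-1 xeon tesla
gpu-2 xeon tesla
cpu-0 xeon
cpu-1 xeon
```

Because the GPU nodes carry both xeon and tesla, they can serve both a TESLA and a XEON queue. A job can also request a property directly, e.g. qsub -l nodes=2:cell a.pbs.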

As a test, I made 2 queues ("uno" and "dos") on the PS-3 pbs_server:

#
# Create queues and set their attributes.
#
#
# Create and define queue uno
#
create queue uno
set queue uno queue_type = Execution
set queue uno acl_host_enable = False
set queue uno acl_hosts = zarate-0+zarate-1
set queue uno enabled = True
set queue uno started = True
#
# Create and define queue dos
#
create queue dos
set queue dos queue_type = Execution
set queue dos acl_host_enable = False
set queue dos acl_hosts = zarate-2+zarate-3
set queue dos enabled = True
set queue dos started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = zarate-0
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server next_job_number = 22


and I changed the nodes file in the server_priv directory so it looks like
this (zarate-N are just the hostnames :P):

zarate-0 np=2 uno
zarate-1 np=2 uno
zarate-2 np=2 dos
zarate-3 np=2 dos
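Note that a queue's acl_hosts lists the hosts allowed to submit jobs to that queue, not the nodes its jobs run on (and with acl_host_enable = False the list isn't enforced at all), so the queue settings above don't restrict where jobs execute. One common approach, sketched here as an assumption rather than a tested fix, is to give each queue a default node property matching the uno/dos tags in the nodes file. Torque itself doesn't map queues to nodes; this relies on a scheduler such as Maui or Moab honoring neednodes:

```
# Hypothetical additions (not part of the setup above): default each queue's
# jobs to the node property of the same name from server_priv/nodes.
# Only takes effect with a scheduler that honors neednodes (e.g. Maui, Moab).
set queue uno resources_default.neednodes = uno
set queue dos resources_default.neednodes = dos
```

Each line can be applied on the server with qmgr, e.g. qmgr -c "set queue uno resources_default.neednodes = uno".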



But it's not working. When I launch this job:

#PBS -N mpi_hello
/usr/local/bin/mpiexec -n 8 /home/rroman/a.out



the output file is:

zarate-1: hello world from process 2 of 8
zarate-2: hello world from process 5 of 8
zarate-2: hello world from process 6 of 8
zarate-3: hello world from process 0 of 8
zarate-3: hello world from process 7 of 8
zarate-1: hello world from process 3 of 8
zarate-0: hello world from process 4 of 8
zarate-3: hello world from process 1 of 8



This shows that the job is running on ALL the nodes instead of only on
zarate-0 and zarate-1, as the queue said (according to me :P).
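One likely factor is the mpiexec line itself: unless mpiexec is Torque-aware (OSC mpiexec, for example, uses Torque's TM interface), it starts ranks on whatever hosts its own configuration lists, regardless of what Torque allocated. Also, queue uno only has 4 slots (zarate-0 and zarate-1 with np=2 each), so -n 8 cannot fit inside it. A sketch of a job script that requests the uno nodes explicitly and hands Torque's allocation ($PBS_NODEFILE) to mpiexec, assuming an MPICH2- or Open MPI-style mpiexec that accepts -machinefile:

```shell
#!/bin/bash
#PBS -N mpi_hello
#PBS -q uno
# Ask Torque for 2 nodes carrying the "uno" property, 2 processors each.
#PBS -l nodes=2:uno:ppn=2

# $PBS_NODEFILE has one line per allocated processor (4 here),
# so the rank count comes from the allocation instead of a hard-coded 8.
NP=$(wc -l < "$PBS_NODEFILE")

# Restrict mpiexec to the hosts Torque allocated
# (assumes this mpiexec supports -machinefile).
/usr/local/bin/mpiexec -machinefile "$PBS_NODEFILE" -n "$NP" /home/rroman/a.out
```

With the queue set in a #PBS -q directive, plain qsub a.pbs is enough; the nodes=2:uno request keeps the job on zarate-0 and zarate-1 even without a queue-level default.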




SO! The question is: is it possible to do what I want this way? And if so,
what am I doing wrong? :P

Thank you Kay!

-ricardo