[torqueusers] specific nodes
gus at ldeo.columbia.edu
Wed Nov 30 15:22:58 MST 2011
You don't have 8 CPUs with the 'uno' property: only zarate-0 and zarate-1 carry it, with np=2 each, i.e. 4 slots.
This conflicts with your mpiexec command, which launches 8 processes.
The number of processors you request from Torque must match the
number of processes you launch with mpirun/mpiexec.
Also, you wrote:
#PPS -q uno
Is this a typo in your email or in your Torque submission script?
It should be:
#PBS -q uno
In addition, your PBS script doesn't request nodes, something like
#PBS -l nodes=1:ppn=2
I suppose it will use the default for the queue uno.
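Putting those pieces together, a corrected submission script might look like the sketch below. The job name, queue, and binary path come from your message; nodes=2:ppn=2 is an assumption that you want all four 'uno' slots:

```shell
#!/bin/bash
#PBS -N hello_w
#PBS -q uno
#PBS -l nodes=2:ppn=2
# 2 nodes x 2 processors per node = 4 slots in queue uno,
# so launch 4 processes, not 8.
/usr/local/bin/mpiexec -n 4 /home/rroman/a.out
```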
However, your qmgr configuration doesn't set a default number of nodes to use,
either for the queues or for the server itself.
You could do:
qmgr -c 'set queue uno resources_default.nodes = 1'
and likewise for queue dos.
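For completeness, a sketch of both commands (run as root on the pbs_server host; queue names taken from your qmgr dump):

```shell
# Set a default node count for each queue so jobs that omit
# "-l nodes=..." still get a sensible allocation.
qmgr -c 'set queue uno resources_default.nodes = 1'
qmgr -c 'set queue dos resources_default.nodes = 1'

# Verify the settings took effect:
qmgr -c 'print queue uno'
qmgr -c 'print queue dos'
```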
More importantly: is your MPI (and mpiexec) built with Torque support?
For instance, Open MPI can be built with Torque support, so that it
automatically uses the nodes allocated by Torque to run the job.
However, stock MPI packages from yum or apt-get are probably not
integrated with Torque; you would need to build one from source,
which is not really hard.
If you use an MPI that is not integrated with Torque, you need to pass
mpirun/mpiexec the node-list file that Torque creates for each job.
The file name is held in the environment variable $PBS_NODEFILE.
The exact syntax varies with the MPI you are using (check your mpirun man page),
but it should be something like:
mpirun -hostfile $PBS_NODEFILE -np 2 ./a.out
[ The flag may be -machinefile instead of -hostfile, or something else, depending on your MPI.]
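To keep -np in agreement with whatever Torque actually allocated, you can count the lines of $PBS_NODEFILE inside the job script. A sketch (the hostfile flag may differ for your MPI; here the nodefile is faked so the snippet runs outside a Torque job):

```shell
# Inside a real job, Torque sets PBS_NODEFILE for you; we simulate it
# here with a temporary file listing one line per allocated CPU slot.
PBS_NODEFILE=$(mktemp)
printf 'zarate-0\nzarate-0\nzarate-1\nzarate-1\n' > "$PBS_NODEFILE"

# Count the slots Torque handed us and launch exactly that many ranks.
NP=$(wc -l < "$PBS_NODEFILE")
echo "would run: mpirun -hostfile $PBS_NODEFILE -np $NP ./a.out"
echo "NP=$NP"

rm -f "$PBS_NODEFILE"
```

With this pattern, changing the #PBS -l nodes request automatically changes how many processes are launched, so the two can never disagree.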
On Nov 30, 2011, at 4:11 PM, Ricardo Román Brenes wrote:
> Ill post some more info since im pretty desperate right now :P
You should always do this if you want help from the list.
Do you see how much more help you get when you give all the information? :)
I hope this helps,
> this is my nodes file:
> zarate-0 np=2 uno
> zarate-1 np=2 uno
> zarate-2 np=2 dos
> zarate-3 np=2 dos
> these are my queues:
> # Create queues and set their attributes.
> # Create and define queue uno
> create queue uno
> set queue uno queue_type = Execution
> set queue uno resources_default.neednodes = uno
> set queue uno enabled = True
> set queue uno started = True
> # Create and define queue dos
> create queue dos
> set queue dos queue_type = Execution
> set queue dos resources_default.neednodes = dos
> set queue dos enabled = True
> set queue dos started = True
> # Set server attributes.
> set server scheduling = True
> set server acl_hosts = zarate-0
> set server log_events = 511
> set server mail_from = adm
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server next_job_number = 44
> and my maui.cfg:
> # maui.cfg 3.3
> SERVERHOST zarate-0
> ADMIN1 root
> RMCFG[zarate-0] TYPE=PBS
> AMCFG[bank] TYPE=NONE
> RMPOLLINTERVAL 00:00:30
> SERVERPORT 42559
> SERVERMODE NORMAL
> LOGFILE maui.log
> LOGFILEMAXSIZE 10000000
> LOGLEVEL 3
> QUEUETIMEWEIGHT 1
> BACKFILLPOLICY FIRSTFIT
> RESERVATIONPOLICY CURRENTHIGHEST
> NODEALLOCATIONPOLICY MINRESOURCE
> ENABLEMULTIREQJOBS TRUE
> *note: running qmgr -c "p s" as a regular user non-root i get a different config display...
> so Im running this hellow.c mpi example, it just says hi from different nodes:
> #PBS -N hello_w
> #PPS -q uno
> /usr/local/bin/mpiexec -n 8 /home/rroman/a.out
> the output I'm expecting is that only the nodes with property "uno" should say hi, but this is the actual output:
> zarate-0: hello world from process 2 of 8
> zarate-2: hello world from process 3 of 8
> zarate-3: hello world from process 5 of 8
> zarate-1: hello world from process 0 of 8
> zarate-3: hello world from process 6 of 8
> zarate-1: hello world from process 7 of 8
> zarate-2: hello world from process 4 of 8
> zarate-1: hello world from process 1 of 8
> They all greet me... =(
> torqueusers mailing list
> torqueusers at supercluster.org