[torqueusers] can't get more than one processor per node

Donald E Tripp dtripp at hawaii.edu
Thu Jul 26 18:35:40 MDT 2007


I think I see your problem. You are using mpirun, which needs the host file specified explicitly. Without it, mpirun still starts the total number of processes requested, but all of them run on the first node.

Same submit file, but for mpirun

---------------------------

#!/bin/bash
#PBS -u user_name
#PBS -l nodes=1:ppn=8
#PBS -o $PBS_JOBNAME.out
#PBS -e $PBS_JOBNAME.err

#How many procs do I have? (the node file has one line per allocated slot)
NP=$(wc -l "$PBS_NODEFILE" | awk '{print $1}')

#cd into the directory where I typed qsub
cd "$PBS_O_WORKDIR"

#run executable
mpirun -np "$NP" -machinefile "$PBS_NODEFILE" executable

---------------------------
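
If the job still seems to land on a single processor, it may help to look at what Torque actually put in the node file before calling mpirun. The following is only a sketch: `summarize_nodefile` is a hypothetical helper name of my own, not something Torque or MPI provides, and it assumes the usual node file layout of one hostname per allocated processor slot.

```shell
#!/bin/bash
# Hypothetical helper (not part of Torque or MPI): summarize a PBS node file.
# Assumption: the file holds one hostname per allocated processor slot,
# so a host requested with ppn=8 appears eight times.
summarize_nodefile() {
    local nodefile="$1"
    if [ ! -r "$nodefile" ]; then
        echo "cannot read node file: $nodefile" >&2
        return 1
    fi
    # Total slots = number of non-empty lines in the file
    local total
    total=$(grep -c . "$nodefile")
    echo "total slots: $total"
    # Slots per host, printed as "hostname: count"
    grep . "$nodefile" | sort | uniq -c | awk '{print $2 ": " $1}'
}

# Inside a job Torque sets PBS_NODEFILE; outside a job this block is skipped.
if [ -n "${PBS_NODEFILE:-}" ]; then
    summarize_nodefile "$PBS_NODEFILE"
fi
```

With `nodes=1:ppn=8` the summary should show eight slots on one host. If it shows a single slot, the resource request is the thing to fix rather than the mpirun line.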

Don

----- Original Message -----
From: "Adams, Samuel D Contr AFRL/HEDR" <Samuel.Adams at BROOKS.AF.MIL>
Date: Thursday, July 26, 2007 11:26 am
Subject: [torqueusers] can't get more than one processor per node
To: torqueusers at supercluster.org

> I have 8 cores per node.  I think I have torque configured to know 
> that each node has 8 cores.
> ## from pbsnodes ##
> prodnode3.brooks.af.mil
>     state = job-exclusive
>     np = 8
>     ntype = cluster
>     jobs = 0/52.prodnode1.brooks.af.mil, 1/52.prodnode1.brooks.af.mil, 2/52.prodnode1.brooks.af.mil, 3/52.prodnode1.brooks.af.mil, 4/52.prodnode1.brooks.af.mil, 5/52.prodnode1.brooks.af.mil, 6/52.prodnode1.brooks.af.mil, 7/52.prodnode1.brooks.af.mil
>     status = opsys=linux,uname=Linux prodnode3.brooks.af.mil 2.6.18-8.1.4.el5 #1 SMP Thu May 17 03:16:52 EDT 2007 x86_64,sessions=3059 3205 3237,nsessions=3,nusers=1,idletime=162,totmem=18472040kb,availmem=18211688kb,physmem=16440432kb,ncpus=8,loadave=0.67,netload=34049366,state=free,jobs=52.prodnode1.brooks.af.mil,rectime=1185484866
> 
> But whenever I try to run a job on some nodes, it only runs on one
> processor for some reason.  I have this basic test script that 
> contains:
> [sam at prodnode3 all]$ cat script.sh
> #PBS -l nodes=1
> `mpirun /home/sam/code/fdtd/fdtd_0.3/fdtd -t /home/sam/code/fdtd/fdtd_0.3/test_files/tissue.txt -r /home/sam/code/fdtd/fdtd_0.3/test_files/sphere_brain_10_pad_x0110y0110z0110.raw -v -f 2000 --pw 90,0,1,0`
> exit 0
> 
> I have tried running it with
> $ qsub script.sh
> $ qsub -l nodes=1:ppn=8 script
> 
> and I have tried changing the script to say mpirun -np 8 or mpiexec to
> no avail.
> 
> Does anyone know what I am doing wrong here?
> 
> Sam Adams
> General Dynamics Information Technology
> Phone: 210.536.5945
> 

