[torqueusers] An issue when using pbs script to invoke different cpus from different nodes.

Hongsheng Zhao zhaohscas at yahoo.com.cn
Tue Oct 18 23:42:38 MDT 2011


On 10/18/2011 10:19 PM, Ken Nielson wrote:
>
>
> ----- Original Message -----
>> From: "Hongsheng Zhao"<zhaohscas at yahoo.com.cn>
>> To: torqueusers at supercluster.org
>> Sent: Monday, October 17, 2011 10:38:17 PM
>> Subject: [torqueusers] An issue when using pbs script to invoke different cpus from different nodes.
>>
>> Hi all,
>>
>> I use qsub to submit the job to my queue.  Currently I have the
>> following lines in the pbs script invoked by qsub:
>>
>> --------
>> #PBS -l nodes=2:ppn=8
>> #PBS -l walltime=99:00:00
>> #PBS -j oe
>> #PBS -o out
>> #PBS -e err
>> #PBS -V
>> #PBS -q default
>> ----------
>>
>> As you can see, in the above example the job will use 16 cpus supplied
>> equally by two nodes.  But now I want to let pbs assign the cpus and
>> nodes to this job according to the following requirements:
>>
>> 1- There are 8 cpus used for this job.
>> 2- All of these cpus may belong to one node, or they may come from
>> different nodes, say, two/three/four nodes and so on.
>>
>> Could you please give me some hints on this issue?  Thanks in
>> advance.
>>
>
> If you do not care what nodes the processors come from, you could use -l procs=8.
>
> The procs option tells the scheduler to assign 8 processors from wherever it can find them.

According to your notes above, I've changed the pbs script snippet into
the following form:

------------
##PBS -l nodes=1:ppn=8
#PBS -l procs=35
#PBS -l walltime=99:00:00
#PBS -j oe
#PBS -o out
#PBS -e err
#PBS -V
#PBS -q default
-------------
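
If it is simpler, I guess the same request can also be given directly on
the qsub command line; as far as I know, command-line options override the
corresponding #PBS lines in the script:

------------
# submit the same script, requesting 35 processors on the command line
qsub -q default -l procs=35 vasp.job
------------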

In the default queue there are 11 nodes in total; only one node has 16
cores, and all the other nodes have 8 cores.
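
For completeness, the per-node core counts as the server sees them can be
listed with pbsnodes (assuming the configured np values match the real
hardware):

------------
# show each node name together with its configured processor count (np)
pbsnodes -a | grep -E '^[^[:space:]]|np ='
------------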

Then I ran the vasp job via the above pbs script, and found the following
output at the very beginning of the OUTCAR file:

-----------
  vasp.4.6.35 3Apr08 complex
  executed on             LinuxIFC date 2011.11.13  01:27:16
  running on    1 nodes
  distr:  one band on    1 nodes,    1 groups
-----------

It looks like this job only uses one node.  Could you please give me
some more hints?  Thanks in advance.
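
In case it helps with the diagnosis, I could also add a quick check to the
script (right after cd $PBS_O_WORKDIR) to print which nodes Torque actually
hands out; this only uses the standard $PBS_NODEFILE:

------------
# show how many slots each allocated node contributed to this job
echo "Allocated slots per node:"
sort $PBS_NODEFILE | uniq -c
------------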

For your information, I also give the complete content of my pbs script
below:

**************** pbs script beginning from here *********************
zhaohongsheng@node32:~/work/work3/afm> cat vasp.job
#!/bin/bash
#
##PBS -l nodes=1:ppn=8
#PBS -l procs=35
#PBS -l walltime=99:00:00
#PBS -j oe
#PBS -o out
#PBS -e err
#PBS -V
#PBS -q default

source /public/software/intel/Compiler/11.1/059/bin/intel64/ifortvars_intel64.sh
source /public/software/intel/mkl/bin/intel64/mklvars_intel64.sh
source /public/software/intel/mpi/intel64/bin/mpivars.sh


# go to work dir
cd $PBS_O_WORKDIR

# The program we want to execute (modify to suit your setup)
EXEC=/public/software/vasp4.6
#EXEC=/share/apps/vasp/bin/vasp52_mkl1023029_impi322006

# set up the mpd environment (of course, use some other secret word than "dfadfs")
if [ ! -f ~/.mpd.conf ]; then
    /bin/echo "secretword=dfadfs" >> ~/.mpd.conf
    /bin/chmod 600 ~/.mpd.conf
fi


##########################################################
# There should be no need to change any of
# the following settings for normal use.
##########################################################


# Intel MPI Home
MPI_HOME=/public/software/intel/mpi/intel64/bin


# set up the number of processors (total slots listed in $PBS_NODEFILE)
NP=`cat $PBS_NODEFILE|wc -l`
echo "Number of processors:  $NP"
echo "---------------------------"

# number of MPDs to start (one per unique node in $PBS_NODEFILE)
N_MPD=`cat $PBS_NODEFILE|uniq|wc -l`
echo "Number of MPDs to start: $N_MPD"
echo "---------------------------"

# set up the mpi env (em64t): boot one MPD on each allocated node
$MPI_HOME/mpdboot -r ssh -n $N_MPD -f $PBS_NODEFILE


# run the program
$MPI_HOME/mpiexec -genv I_MPI_DEBUG 3 -genv I_MPI_DEVICE ssm -n $NP $EXEC

# clean up: shut down the MPD ring
$MPI_HOME/mpdallexit
**************** pbs script ended here *********************
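
If it is useful, I could also print the MPD ring right after the mpdboot
step; as far as I know, mpdtrace ships with the same Intel MPI MPD tools
as mpdboot and mpdallexit:

------------
# list the hosts that actually joined the MPD ring (run after mpdboot)
$MPI_HOME/mpdtrace
------------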


Regards.


>
> Ken
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>


-- 
Hongsheng Zhao <zhaohscas at yahoo.com.cn>
School of Physics and Electrical Information Science,
Ningxia University, Yinchuan 750021, China

