[torqueusers] Need help on torque submit job to multiple node

Ng, Theo [ITS] theo.ng at polyu.edu.hk
Tue Apr 2 04:15:54 MDT 2013


Dear All,

I am a beginner with HPC and am facing a problem: my jobs can only use one computing node, even though I have two.

I have set up one head node (hpchead) and two computing nodes (hpchpd04 & hpchpd05).  Each node has a 2-core CPU.  When I submit a job that uses 2 cores on 1 node, the job runs well.  But when I submit for 4 cores across 2 nodes, it fails.
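
For reference, both scripts (shown in full below) are submitted with qsub, e.g.:

[root@hpchead]# qsub pbs_jobs_1n2c     # nodes=1:ppn=2 - runs well
[root@hpchead]# qsub pbs_jobs_2n2c     # nodes=2:ppn=2 - fails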

My configuration and job scripts are below:

[root@hpchead bin]# ./qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = hpchead
set server managers += root@hpchead
set server operators += root@hpchead
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 300
set server job_stat_rate = 45
set server poll_jobs = True
set server mom_job_sync = True
set server next_job_number = 91
set server moab_array_compatible = True
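
The server_priv/nodes entries corresponding to this (reconstructed here from the pbsnodes output below; the usual location is $TORQUE_HOME/server_priv/nodes, e.g. /var/spool/torque/server_priv/nodes) would be:

hpchpd04 np=2
hpchpd05 np=2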

[root@hpchead bin]# ./pbsnodes -a
hpchpd04
     state = free
     np = 2
     ntype = cluster
     status = rectime=1364371489,varattr=,jobs=,state=free,netload=137663789,gres=,loadave=0.01,ncpus=2,physmem=6121356kb,availmem=5930100kb,totmem=6121356kb,idletime=242,nusers=0,nsessions=0,uname=Linux CA 2.6.32-220.el6.x86_64 #1 SMP Sat Dec 10 17:04:11 CST 2011 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     gpus = 0

hpchpd05
     state = free
     np = 2
     ntype = cluster
     status = rectime=1364371508,varattr=,jobs=,state=free,netload=217029487,gres=,loadave=0.00,ncpus=2,physmem=6121356kb,availmem=5926144kb,totmem=6121356kb,idletime=171230,nusers=0,nsessions=0,uname=Linux CB 2.6.32-220.el6.x86_64 #1 SMP Sat Dec 10 17:04:11 CST 2011 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     gpus = 0

[root@hpchead]# cat pbs_jobs_1n2c
#!/bin/bash
###
### Sample script for running MPI example for computing PI (Fortran 90 code)
###

### Set the job name
#PBS -N 1node-2core

### Run in the queue named "batch"
#PBS -q batch

### Specify the number of CPUs for your job.  This example will allocate 2 cores
### on 1 node (2 processors per node).
#PBS -l nodes=1:ppn=2

### Tell PBS the anticipated run-time for your job, where walltime=HH:MM:SS
#PBS -l walltime=0:10:00

### Specify stdout and stderr file
#PBS -o 1node-2core.stdout
#PBS -e 1node-2core.stderr

### Load needed modules here
. /etc/profile.d/modules.sh
module load compilers/open64/5.0
module load mpi/mpich2/3.0.2-open64-5.0

### Switch to the working directory; by default TORQUE launches processes
### from your home directory.
cd $PBS_O_WORKDIR
echo Working directory is $PBS_O_WORKDIR

# Calculate the number of processors allocated to this run.
NPROCS=`wc -l < $PBS_NODEFILE`

# Calculate the number of nodes allocated.
NNODES=`uniq $PBS_NODEFILE | wc -l`

### Display the job context
echo "Running on host `hostname` "
echo "Start Time is `date` "
echo "Directory is `pwd` "
echo "Using ${NPROCS} processors across ${NNODES} nodes "

mpirun -np 2 ./sample.out < file1 > output.1n2c.result
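
(A side note on the script above: the -np count is hard-coded even though NPROCS is already computed from $PBS_NODEFILE.  A sketch of the same launch using the computed values instead (assuming the mpirun here is MPICH2's Hydra mpiexec) would be:

mpirun -np ${NPROCS} -machinefile ${PBS_NODEFILE} ./sample.out < file1 > output.1n2c.result

For the single-node case this changes nothing, but it keeps the process count in sync with the #PBS -l request.)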

[root@hpchead]# cat pbs_jobs_2n2c
#!/bin/bash
###
### Sample script for running MPI example for computing PI (Fortran 90 code)
###

### Set the job name
#PBS -N 2node-2core

### Run in the queue named "batch"
#PBS -q batch

### Specify the number of CPUs for your job.  This example will allocate 4 cores
### using 2 processors on each of 2 nodes.
#PBS -l nodes=2:ppn=2

### Tell PBS the anticipated run-time for your job, where walltime=HH:MM:SS
#PBS -l walltime=0:10:00

### Load needed modules here
. /etc/profile.d/modules.sh
module load compilers/open64/5.0
module load mpi/mpich2/3.0.2-open64-5.0

### Switch to the working directory; by default TORQUE launches processes
### from your home directory.
cd $PBS_O_WORKDIR
echo Working directory is $PBS_O_WORKDIR

# Calculate the number of processors allocated to this run.
NPROCS=`wc -l < $PBS_NODEFILE`

# Calculate the number of nodes allocated.
NNODES=`uniq $PBS_NODEFILE | wc -l`

### Display the job context
echo "Running on host `hostname` "
echo "Start Time is `date` "
echo "Directory is `pwd` "
echo "Using ${NPROCS} processors across ${NNODES} nodes "

mpirun -np 4 ./sample.out < file1 > output.2n2c.result

echo "End time is `date` "
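
For the 2-node case I am not sure whether mpirun has to be told explicitly which hosts were allocated.  A sketch of the launch line with the TORQUE node file passed in (assuming this MPICH2 build does not have TM integration, which I have not verified) would be:

mpirun -np ${NPROCS} -machinefile ${PBS_NODEFILE} ./sample.out < file1 > output.2n2c.result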

Would you have any suggestions?

Thank you very much.

Regards,
Theo

