[torqueusers] PBS_NODEFILE issue

Abraham Zamudio abraham.zamudio at gmail.com
Tue Apr 20 19:05:36 MDT 2010


http://debianclusters.cs.uni.edu/index.php/MPICH_with_Torque_Functionality


On Tue, Apr 20, 2010 at 1:04 PM, Si Hammond <simon.hammond at gmail.com> wrote:

> Hi,
>
> We're running 2.4.7, and I can cat the $PBS_NODEFILE in both -l
> nodes=2:ppn=2 and -l nodes=1:ppn=2 configurations (i.e. it works fine for me).
>
> If you have built OpenMPI with --with-tm, then you shouldn't need to specify
> the node file, right? The runtime picks this up from the PBS engine during
> execution.
>
> Have you tried just a basic mpirun ./pingpong or something like that?
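[For reference, a minimal job script for such a test might look like the
sketch below. Assumptions not from the thread: an OpenMPI built with
--with-tm, and ./pingpong standing in for any small MPI binary; the
script only prints a message when run outside TORQUE.]

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=2
#PBS -l walltime=00:10:00

# Sanity check: the nodefile should exist before handing anything to MPI.
if [ -z "$PBS_NODEFILE" ] || [ ! -r "$PBS_NODEFILE" ]; then
    MSG="no readable PBS_NODEFILE"
    echo "$MSG"
else
    MSG="nodefile found"
    echo "PBS_NODEFILE = $PBS_NODEFILE"
    cat "$PBS_NODEFILE"
    # With a TM-enabled OpenMPI, mpirun gets the host list from pbs_mom,
    # so no --hostfile argument should be needed here.
    mpirun ./pingpong
fi
```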
>
>
>
>
> S.
>
>
> On 20 Apr 2010, at 13:57, alap pandya wrote:
>
> Hi,
>
> I am facing an issue while running a job across multiple nodes on Torque.
> Please give me your suggestions.
>
>
> Issue:
> When I change #PBS -l nodes=1:ppn=2 to #PBS -l nodes=2:ppn=2 in the
> script, PBS_NODEFILE is not created and the job ultimately cannot run.
>
> Note: similar issues are mentioned at
> http://www.clusterresources.com/pipermail/torqueusers/2006-October/004434.html
> http://www.clusterresources.com/pipermail/torqueusers/2010-January/009890.html
>
> Torque: 2.4.6
>
> 1> Running fine on a single node.
>
> #!/bin/sh
> #PBS -l nodes=1:ppn=2
> echo "HOSTNAME : $HOSTNAME"
> echo "PBS_NODEFILE = $PBS_NODEFILE"
> cd /disk
> #echo $PBS_NODEFILE > shreenivas
> cat $PBS_NODEFILE > pbsnodes
> mpirun --hostfile $PBS_NODEFILE ./job1_100
>
>
> [root@cluster disk]# cat pbsnodes
> cluster.hpc.org
> cluster.hpc.org
>
> The job runs fine with 2 processes on a single node.
>
> 2> Changed #PBS -l nodes=1:ppn=2 to #PBS -l nodes=2:ppn=2:
>
> #!/bin/sh
> #PBS -l nodes=2:ppn=2
> echo "HOSTNAME : $HOSTNAME"
> echo "PBS_NODEFILE = $PBS_NODEFILE"
> cd /disk
> cat $PBS_NODEFILE > pbsnodes
> mpirun --hostfile $PBS_NODEFILE ./job1_100
>
> [root@cluster disk]# cat pbsnodes
>
> No pbsnodes file is created this time, which is strange, and no MPI job
> is running on any node (compute-0-5, cluster), as the tracejob output
> below shows.
>
> tracejob output:
>
> 04/20/2010 18:04:14  S    enqueuing into test, state 1 hop 1
> 04/20/2010 18:04:14  S    Job Queued at request of root at cluster, owner =
> root at cluster, job name
>                           = a.sh, queue = test
> 04/20/2010 18:04:14  S    Job Run at request of root at cluster
> 04/20/2010 18:04:14  A    queue=test
> 04/20/2010 18:04:14  A    user=root group=root jobname=a.sh queue=test
> ctime=1271766854
>                           qtime=1271766854 etime=1271766854
> start=1271766854 owner=root at cluster
>                           exec_host=compute-0-5/2+compute-0-5/1+
> cluster.hpc.org/2+cluster.hpc.org/1
>                           Resource_List.neednodes=2:ppn=2
> Resource_List.nodect=2
>                           Resource_List.nodes=2:ppn=2
> Resource_List.walltime=01:00:00
>
> This sequence repeats many times, as no PBS_NODEFILE is created and
> MPI cannot get the node list.
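[One way to narrow this down is to check the mother-superior pbs_mom
directly, since it is the mom on the first allocated node that writes
the nodefile under its spool's aux/ directory. The sketch below assumes
a default TORQUE spool at /var/spool/torque and reuses the node names
from the tracejob output above; adjust both for your installation.]

```shell
#!/bin/sh
# Diagnostic sketch for a missing $PBS_NODEFILE. The spool path and the
# node names (compute-0-5, cluster) are assumptions; change as needed.

SPOOL=/var/spool/torque   # default spool; adjust if configured elsewhere
CHECKS=0

# 1. The mother-superior pbs_mom writes job nodefiles under <spool>/aux/.
echo "== nodefiles on this mom =="
ls -l "$SPOOL/aux" 2>/dev/null || echo "no aux directory at $SPOOL/aux"
CHECKS=$((CHECKS + 1))

# 2. Ask each allocated mom for a diagnostic report.
echo "== mom status =="
for node in compute-0-5 cluster; do
    momctl -d 3 -h "$node" 2>/dev/null || echo "momctl failed for $node"
done
CHECKS=$((CHECKS + 1))

# 3. Look for errors in today's mom log (log files are named YYYYMMDD).
echo "== recent mom log errors =="
grep -i error "$SPOOL/mom_logs/$(date +%Y%m%d)" 2>/dev/null | tail -20
CHECKS=$((CHECKS + 1))
```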
>
>
> With regards,
> Alap
>
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
>
>
>
> ---------------------------------------------------------------------------------------
> Si Hammond
>
> Research & Knowledge Transfer Associate
> Performance Modelling, Analysis and Optimisation Team
> High Performance Systems Group
> Department of Computer Science
> University of Warwick, CV4 7AL, UK
> http://go.warwick.ac.uk/hpsg
>
> ----------------------------------------------------------------------------------------
>
>
>
>
>
>


-- 
Abraham Zamudio Ch.

