[torqueusers] torque is working with openmpi?

Sergio Belkin sebelk at gmail.com
Tue Apr 17 15:12:15 MDT 2012


2012/4/17 Gus Correa <gus at ldeo.columbia.edu>:
> Hi Sergio
>
> A) Your OpenMPI seems to have been built with Infiniband support.
> However, as the error message says, you don't seem to have
> Infiniband interfaces [or the openib kernel modules are not
> loaded].
>
> To prevent OpenMPI from using Infiniband,
> add '-mca btl ^openib'
> to your mpirun command line.
>
> A cleaner solution is to build OpenMPI with support only
> for the hardware that you actually have in your machines.

Thanks for the hint!
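
For the record, with that flag the direct run would look something like
this (same 'hello' binary and 'myhostfile' as in my test below):

  mpirun -mca btl ^openib --hostfile myhostfile hello

That excludes the openib transport, so Open MPI falls back to the
tcp/sm/self BTLs and the OpenFabrics warning goes away.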

>
> **
>
> B) Also, to use the OpenMPI-Torque integration you must
> submit the job with *qsub*, not run mpirun directly!
> Torque will assign a list of nodes, which is then
> used by the mpirun *inside* the script that
> you submitted via qsub.
> This way you don't need to add a nodefile
> to the mpirun command line.
>
> For instance:
>
> Write a script like this [say my_script]:
> #PBS -l nodes=1:ppn=2
> #PBS -q batch
> #PBS -N hello
> cd $PBS_O_WORKDIR
> mpirun -np 2 ./hello
>
> Then do:
> qsub my_script

Thanks for your help, I've got the idea; that worked!
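
In case it helps anyone reading the archives, the combination of both
hints looks roughly like this on my setup (the script name and the
resource request are only an example):

  $ cat my_script
  #PBS -l nodes=2:ppn=1
  #PBS -q batch
  #PBS -N hello
  cd $PBS_O_WORKDIR
  mpirun -mca btl ^openib ./hello

  $ qsub my_script
  $ qstat -n

Without -np, mpirun picks up the node list that Torque hands it through
the tm components shown by ompi_info above, so it starts one rank per
allocated slot and no hostfile is needed.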

>
> **
>
> I hope this helps,
> Gus Correa
>
> On 04/17/2012 12:23 PM, Sergio Belkin wrote:
>> Hi,
>>
>> I'm testing Torque on Fedora 16. The problem is that jobs are not sent to the nodes.
>> Data:
>>
>> torque server: mpimaster.mycluster
>> torque client: mpinode02.mycluster
>>
>> [sergio at mpimaster cluster]$ ompi_info | grep tm
>>                   MCA ras: tm (MCA v2.0, API v2.0, Component v1.5.4)
>>                   MCA plm: tm (MCA v2.0, API v2.0, Component v1.5.4)
>>                   MCA ess: tm (MCA v2.0, API v2.0, Component v1.5.4)
>>
>>
>> torque configuration:
>>
>> [root at mpimaster sergio]# cat /etc/torque/pbs_environment
>> PATH=/bin:/usr/bin
>> LANG=C
>>
>> cat /etc/torque/server_name
>> mpimaster.mycluster
>>
>> [root at mpimaster sergio]# cat /etc/hosts
>> 127.0.0.1               localhost.localdomain localhost
>> ::1             localhost6.localdomain6 localhost6
>> 192.168.122.1   mpinode02.mycluster mpinode02
>> 192.168.122.2   mpimaster.mycluster mpimaster mpinode0
>>
>> cat /var/lib/torque/server_priv/nodes
>> mpimaster np=1
>> mpinode02 np=2
>>
>> [sergio at mpimaster ~]$ qmgr -c 'p s'
>> #
>> # Create queues and set their attributes.
>> #
>> #
>> # Create and define queue batch
>> #
>> create queue batch
>> set queue batch queue_type = Execution
>> set queue batch acl_user_enable = True
>> set queue batch acl_users = sergio
>> set queue batch resources_default.nodes = 2
>> set queue batch resources_default.walltime = 01:00:00
>> set queue batch enabled = True
>> set queue batch started = True
>> #
>> # Set server attributes.
>> #
>> set server scheduling = True
>> set server acl_hosts = mpimaster.mycluster
>> set server acl_hosts += mpimaster
>> set server acl_hosts += localhost
>> set server default_queue = batch
>> set server log_events = 511
>> set server mail_from = adm
>> set server scheduler_iteration = 600
>> set server node_check_rate = 150
>> set server tcp_timeout = 6
>> set server next_job_number = 402
>> set server authorized_users = sergio at mpimaster
>> set server authorized_users += sergio at mpinode02
>>
>>
>> Client configuration:
>>
>> [sergio at mpimaster ~]$ cat /etc/hosts
>> 127.0.0.1               localhost.localdomain localhost
>> ::1             localhost6.localdomain6 localhost6
>> 192.168.122.1   mpinode02.mycluster mpinode02
>> 192.168.122.2   mpimaster.mycluster mpimaster mpinode01
>> [sergio at mpimaster ~]$ cat /etc/torque/
>> mom/             pbs_environment  sched/           server_name
>> [sergio at mpimaster ~]$ cat /etc/torque/server_name
>> mpimaster.mycluster
>> [sergio at mpimaster ~]$ cat /etc/torque/pbs_environment
>> PATH=/bin:/usr/bin
>> LANG=C
>> [sergio at mpimaster ~]$ cat /etc/torque/mom/config
>> # Configuration for pbs_mom.
>> $pbsserver mpimaster.mycluster
>>
>>
>> Then I submit the job via mpirun:
>>
>> [sergio at mpimaster cluster]$ mpirun  hello
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> --------------------------------------------------------------------------
>> [[54064,1],0]: A high-performance Open MPI point-to-point messaging module
>> was unable to find any relevant network interfaces:
>>
>> Module: OpenFabrics (openib)
>>    Host: mpimaster.mycluster
>>
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --------------------------------------------------------------------------
>>
>> If I use a hostfile, it works:
>>
>> [sergio at mpimaster cluster]$ mpirun --hostfile myhostfile hello
>>
>> KeyChain 2.6.8; http://www.gentoo.org/proj/en/keychain/
>> Copyright 2002-2004 Gentoo Foundation; Distributed under the GPL
>>
>>   * Found existing ssh-agent (1607)
>>   * Found existing gpg-agent (1690)
>>   * Known ssh key: /home/sergio/.ssh/id_rsa
>>
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> --------------------------------------------------------------------------
>> [[54073,1],0]: A high-performance Open MPI point-to-point messaging module
>> was unable to find any relevant network interfaces:
>>
>> Module: OpenFabrics (openib)
>>    Host: mpimaster.mycluster
>>
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --------------------------------------------------------------------------
>> Hello World! from process 2 out of 3 on mpinode02.mycluster
>> Hello World! from process 1 out of 3 on mpinode02.mycluster
>> Hello World! from process 0 out of 3 on mpimaster.mycluster
>>
>> Am I doing something wrong?
>>
>> Thanks in advance!
>>
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers



-- 
Sergio Belkin  http://www.sergiobelkin.com
Watch More TV http://sebelk.blogspot.com
LPIC-2 Certified - http://www.lpi.org

