[torqueusers] torque is working with openmpi?

Sergio Belkin sebelk at gmail.com
Tue Apr 17 10:23:24 MDT 2012


Hi,

I'm testing Torque on Fedora 16. The problem is that jobs are not sent to
the compute nodes. Here is my setup:

torque server: mpimaster.mycluster
torque client: mpinode02.mycluster

[sergio@mpimaster cluster]$ ompi_info | grep tm
                 MCA ras: tm (MCA v2.0, API v2.0, Component v1.5.4)
                 MCA plm: tm (MCA v2.0, API v2.0, Component v1.5.4)
                 MCA ess: tm (MCA v2.0, API v2.0, Component v1.5.4)
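
So the tm components are built in. (For reference, ompi_info can also dump the run-time parameters of a single component; plm/tm is the one that matters for launching under Torque. This is just a diagnostic worth running, not output I have included:)

[sergio@mpimaster cluster]$ ompi_info --param plm tm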


torque configuration:

[root@mpimaster sergio]# cat /etc/torque/pbs_environment
PATH=/bin:/usr/bin
LANG=C

cat /etc/torque/server_name
mpimaster.mycluster

[root@mpimaster sergio]# cat /etc/hosts
127.0.0.1               localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.122.1   mpinode02.mycluster mpinode02
192.168.122.2   mpimaster.mycluster mpimaster mpinode01

cat /var/lib/torque/server_priv/nodes
mpimaster np=1
mpinode02 np=2
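
(A sanity check not shown above: pbsnodes reports whether each MOM has registered with the server; a node listed as "down" will never receive work.)

[sergio@mpimaster ~]$ pbsnodes -a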

[sergio@mpimaster ~]$ qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch acl_user_enable = True
set queue batch acl_users = sergio
set queue batch resources_default.nodes = 2
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = mpimaster.mycluster
set server acl_hosts += mpimaster
set server acl_hosts += localhost
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server next_job_number = 402
set server authorized_users = sergio@mpimaster
set server authorized_users += sergio@mpinode02
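
(Similarly, qstat -q would confirm from the client tools' side that the batch queue is enabled and started; another generic check, not output I have included:)

[sergio@mpimaster ~]$ qstat -q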


Client configuration:

[sergio@mpimaster ~]$ cat /etc/hosts
127.0.0.1               localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.122.1   mpinode02.mycluster mpinode02
192.168.122.2   mpimaster.mycluster mpimaster mpinode01
[sergio@mpimaster ~]$ cat /etc/torque/
mom/             pbs_environment  sched/           server_name
[sergio@mpimaster ~]$ cat /etc/torque/server_name
mpimaster.mycluster
[sergio@mpimaster ~]$ cat /etc/torque/pbs_environment
PATH=/bin:/usr/bin
LANG=C
[sergio@mpimaster ~]$ cat /etc/torque/mom/config
# Configuration for pbs_mom.
$pbsserver mpimaster.mycluster
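
(If the MOM on the compute node were suspect, momctl can query it remotely; this is a generic diagnostic rather than output from this setup, and it may need to be run as root:)

[root@mpimaster ~]# momctl -d 0 -h mpinode02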


Then I submit a job via mpirun:

[sergio@mpimaster cluster]$ mpirun hello
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
--------------------------------------------------------------------------
[[54064,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: mpimaster.mycluster

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
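
(If I understand the tm support correctly, mpirun only sees a Torque allocation from inside a job started by qsub, so run from an interactive shell like this it falls back to the local host. A minimal submission script, assuming the hello binary sits in the submission directory, would be something like:)

#!/bin/sh
#PBS -N hello
#PBS -l nodes=2
# Torque starts this script on the first allocated node;
# mpirun then uses TM to spawn the remaining ranks.
cd $PBS_O_WORKDIR
mpirun hello

submitted with qsub hello.sh.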



If I use a hostfile, it works:

[sergio@mpimaster cluster]$ mpirun --hostfile myhostfile hello

KeyChain 2.6.8; http://www.gentoo.org/proj/en/keychain/
Copyright 2002-2004 Gentoo Foundation; Distributed under the GPL

 * Found existing ssh-agent (1607)
 * Found existing gpg-agent (1690)
 * Known ssh key: /home/sergio/.ssh/id_rsa

librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
--------------------------------------------------------------------------
[[54073,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: mpimaster.mycluster

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
Hello World! from process 2 out of 3 on mpinode02.mycluster
Hello World! from process 1 out of 3 on mpinode02.mycluster
Hello World! from process 0 out of 3 on mpimaster.mycluster
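
(For reference, myhostfile is roughly the nodes file restated in Open MPI syntax:)

# hypothetical contents, matching the np counts above
mpimaster slots=1
mpinode02 slots=2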

Am I doing something wrong?

Thanks in advance!

--
Sergio Belkin  http://www.sergiobelkin.com
Watch More TV http://sebelk.blogspot.com
LPIC-2 Certified - http://www.lpi.org

