[torqueusers] osc mpiexec and torque4

Brock Palen brockp at umich.edu
Wed Jul 25 09:29:34 MDT 2012

The OSC mpiexec appears to have issues with torque 4.1.0 but works fine with 2.x.

Has anyone gotten mpiexec (the popular TM-aware launcher for MPICH2 and MVAPICH) to work with torque 4?

I have some debugging information below:

[brockp at nyx7000 ~]$ /home/software/rhel6/mpiexec/bin/mpiexec -v -v -v ~/a.out
mpiexec: stat_exe: testing "/home/brockp/a.out".
mpiexec: resolve_exe: using absolute path "/home/brockp/a.out".
mpiexec: stdio_notice_streams: aggregate = 0 1 2.
mpiexec: concurrent_init: unix socket exists, trying to connect.
mpiexec: concurrent_init: old master died, reusing his fifo as master.
mpiexec: concurrent_init: i am concurrent master.
Segmentation fault

(gdb) where
#0  0x00000036afd31aff in __strlen_sse42 () from /lib64/libc.so.6
#1  0x00002aaaaaac53af in pbs_connect (server_name_ptr=0x0) at ../Libifl/pbsD_connect.c:1256
#2  0x0000000000405170 in get_hosts () at get_hosts.c:98
#3  0x0000000000403601 in main (argc=1, argv=0x7fffffffd890) at mpiexec.c:700

Line 1256 of pbsD_connect.c is:
 strncat(server_name_list, pbs_get_server_list(),
     sizeof(server_name_list) -1 - strlen(server_name_ptr) - 1);

Examining server_name_list and server_name_ptr, I get interesting results:

(gdb) x server_name_list
0x7fffffffc5f0:	0x00000000
(gdb) printf "%s", server_name_list
(nothing returned by gdb)

(gdb) x server_name_ptr
0x0:	Cannot access memory at address 0x0

The empty string in server_name_list and the "Cannot access memory" error for server_name_ptr look strange to me, but I am not sure what to make of them.
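
For what it is worth, frame #1 of the backtrace shows server_name_ptr=0x0, and line 1256 calls strlen(server_name_ptr), which would be consistent with a crash inside __strlen_sse42. Below is a minimal sketch of what a guarded version of that append might look like. The names safe_strlen and append_server_list are hypothetical, made up for illustration only; they are not TORQUE functions, and I have not checked how the real library is meant to behave when no server name is passed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of TORQUE): strlen that treats NULL as "". */
static size_t safe_strlen(const char *s)
{
    return s != NULL ? strlen(s) : 0;
}

/*
 * Hypothetical stand-in for the append on line 1256.
 *   list        - NUL-terminated buffer (plays the role of server_name_list)
 *   list_size   - total size of that buffer in bytes
 *   extra       - text to append (plays the role of pbs_get_server_list())
 *   server_name - may be NULL (plays the role of server_name_ptr)
 * Returns 0 on success, -1 if an argument is unusable or no room is left.
 */
static int append_server_list(char *list, size_t list_size,
                              const char *extra, const char *server_name)
{
    size_t used, reserve, room;

    if (list == NULL || list_size == 0 || extra == NULL)
        return -1;

    used = strlen(list);
    /* Mirrors the "- 1 - strlen(server_name_ptr) - 1" terms, NULL-safe. */
    reserve = safe_strlen(server_name) + 2;

    if (used + reserve >= list_size)
        return -1;

    room = list_size - used - reserve;
    strncat(list, extra, room); /* appends at most room chars plus a NUL */
    return 0;
}
```

With a NULL server_name this reserves only the two fixed bytes rather than dereferencing the pointer. Whether a NULL argument should instead be rejected by the caller (get_hosts.c:98 in the backtrace) is a separate question that this sketch does not answer.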

Brock Palen
CAEN Advanced Computing
brockp at umich.edu
