[torqueusers] can't run jobs over 62 nodes 2ppn on OS X

Glen Beane beaneg at umcs.maine.edu
Fri Sep 24 12:59:06 MDT 2004


I get "Invalid argument" errors in open_demux every time I try to start
a job using more than 62 nodes.

I noticed that sysconf(_SC_OPEN_MAX) always returns 255 on my Apples.
I hacked mpiexec and pbs_demux to use 1024 instead of the value returned
by sysconf(_SC_OPEN_MAX), but that didn't help.

09/24/2004 14:21:17;0008;   pbs_mom;Job;1251.bender.bender.clusters.umaine.edu;JOIN JOB as node 63
09/24/2004 14:21:19;0008;   pbs_mom;Job;1251.bender.bender.clusters.umaine.edu;start_process: task started, tid 128, sid 831, cmd /bin/sh
09/24/2004 14:21:21;0001;   pbs_mom;Svr;pbs_mom;Invalid argument (22) in open_demux, open_demux: connect 10.0.1.73:49512
09/24/2004 14:21:21;0001;   pbs_mom;Svr;pbs_mom;Invalid argument (22) in search_env_and_open, failed connect to mpiexec process on MS
09/24/2004 14:21:21;0001;   pbs_mom;Svr;pbs_mom;Invalid argument (22) in search_env_and_open, MPIEXEC_STDOUT_PORT=49512



Does anyone have any ideas?


