[torqueusers] mixed 32/64 bit cluster

Torsten Rohlfing torsten at synapse.sri.com
Sat Oct 25 12:51:53 MDT 2008


> I have compiled the 32-bit mom on the 64-bit arch. Everything seemed to
> complete properly. However, pbs_mom now segfaults with an error 7 when
> I try to start it up. Also, I am running a 32-bit server on the head.
> Will this cause communication problems with the 64-bit moms on the other
> nodes? I read that they could not communicate with each other properly.
>
>   
Out of curiosity: where did you read about communication problems 
between a 32-bit server and a 64-bit mom? And what version of torque are 
you using? Are you sure that your setup would work on a pure 32-bit or 
pure 64-bit cluster? Maybe your mom's crash is unrelated to 32/64-bit issues.

Our setup is the opposite of yours, a 64-bit server with some 32-bit 
clients, and that at least works fine (our installation uses 
torque-2.1.10-5.fc9).

> The 64-bit moms are up and running on the other nodes, but no
> jobs are pushed out to them. I am assuming from your response that this is
> because of some library dependencies? Does torque check job dependencies, or
> is the job pushed out and then rejected when the job itself determines that
> it does not have the required libs?

When your job script invokes a binary with missing libraries, the same 
thing happens as when you run such a binary from the command line: you 
get an error message about the missing libraries and the binary 
terminates. You should see that message in the job's stderr log file. 
Alternatively, you might want to run the job directly from the command 
line on a compute node. If it doesn't run directly from the shell, 
chances are it won't run through torque either.
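One way to check a binary's shared-library dependencies on a compute node is to run `ldd` on it and look for entries marked "not found". A minimal Python sketch of that check, assuming the GNU ldd output format where an unresolved dependency is printed as `libname => not found`; `missing_libs` and `check_binary` are hypothetical helper names, not part of torque:

```python
import subprocess
from typing import List


def missing_libs(ldd_output: str) -> List[str]:
    """Return the names of shared libraries that ldd could not resolve."""
    missing = []
    for line in ldd_output.splitlines():
        # Assumes GNU ldd's format for unresolved entries: "libname => not found".
        if "=> not found" in line:
            missing.append(line.split("=>", 1)[0].strip())
    return missing


def check_binary(path: str) -> List[str]:
    """Run ldd on a binary (on the compute node itself) and report missing libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libs(result.stdout)


if __name__ == "__main__":
    sample = (
        "\tlibfoo.so.1 => not found\n"
        "\tlibc.so.6 => /lib64/libc.so.6 (0x00002b0000000000)\n"
    )
    print(missing_libs(sample))  # ['libfoo.so.1']
```

Running `check_binary` as part of a small test job, rather than on the head node, shows what the compute node's own loader configuration can resolve.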

If it runs from the shell but not through torque, check whether your 
interactive shell sets LD_LIBRARY_PATH to search for shared libraries in 
non-standard places. If so, make sure the same setting is in effect in 
your job when you submit it to torque (e.g., put an 
"export LD_LIBRARY_PATH=path:path:path...." at the beginning of your job 
script).

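Copying the interactive shell's LD_LIBRARY_PATH into a job script can be automated. A minimal Python sketch, assuming (as the qsub documentation describes) that #PBS directives are only read up to the first executable line, so the export is placed after the leading block of the #! line, comments, directives and blank lines rather than at the very top; `add_ld_library_path` is a hypothetical helper:

```python
import os
import shlex
from typing import Mapping, Optional


def add_ld_library_path(script: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return a copy of a job script that exports the caller's LD_LIBRARY_PATH.

    The export line goes after the leading #!, #PBS, comment and blank lines,
    assuming qsub stops reading #PBS directives at the first executable line.
    """
    if env is None:
        env = os.environ
    value = env.get("LD_LIBRARY_PATH")
    if not value:
        return script  # nothing to propagate
    # shlex.quote protects values containing spaces or shell metacharacters.
    export = "export LD_LIBRARY_PATH=" + shlex.quote(value)
    lines = script.splitlines()
    i = 0
    # Skip the header block: blank lines and lines starting with '#'.
    while i < len(lines) and (not lines[i].strip() or lines[i].lstrip().startswith("#")):
        i += 1
    return "\n".join(lines[:i] + [export] + lines[i:]) + "\n"


if __name__ == "__main__":
    job = "#!/bin/bash\n#PBS -l nodes=1\n\n./a.out\n"
    print(add_ld_library_path(job, {"LD_LIBRARY_PATH": "/opt/lib:/usr/local/lib"}))
```

Passing an explicit mapping instead of relying on `os.environ` makes it easy to check which paths end up in the submitted script.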
> If that is indeed the case, how does
> one find the library requirements for jobs that I do not compile or submit?
>   

-- 
Torsten Rohlfing, PhD          SRI International, Neuroscience Program
 Research Scientist             333 Ravenswood Ave, Menlo Park, CA 94025
  Phone: ++1 (650) 859-3379      Fax: ++1 (650) 859-2743
   torsten at synapse.sri.com        http://www.stanford.edu/~rohlfing/

     "Though this be madness, yet there is a method in't"
