[torqueusers] mixed 32/64 bit cluster
Torsten Rohlfing
torsten at synapse.sri.com
Sat Oct 25 12:51:53 MDT 2008
> I have compiled the 32 bit mom on the 64 bit arch. Everything seemed to
> complete properly. However, now the pbs_mom segfaults with an error 7 when
> I try to start it up. Also, I am running a 32 bit server on the head.
> Will this cause communication problems with the 64 bit moms on the other
> nodes? I read that they could not communicate with each other properly.
>
>
Out of curiosity, where did you read about communication problems
between a 32-bit server and 64-bit moms? And what version of torque are
you using? Are you sure that your setup would work on a pure 32-bit or
pure 64-bit cluster? Maybe your mom's crash is unrelated to 32/64-bit
issues. Our setup is the opposite of yours, a 64-bit server with some
32-bit clients, and at least that works fine (our installation uses
torque-2.1.10-5.fc9).
> The 64 bit compilation of moms are up and running on the other nodes but no
> jobs are pushed out to them. I am assuming from your response that this is
> because of some library dependencies? Does torque check job dependencies or
> is the job pushed out and then rejected when the job itself determines that
> it does not have the required libs?
When your job script invokes a binary with missing libraries, the same
thing happens as when you run such a binary from the command line: you
get an error message about the missing libraries and the binary
terminates. You should see that message in the job's stderr log file.
Alternatively, you might want to run the job's commands directly from the
command line on a compute node. If they don't run directly from the
shell, chances are they won't run through torque either.
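To see which shared libraries a binary needs, including for jobs you did not compile or submit yourself, the usual tool on Linux is `ldd`: it lists each library the program links against and prints "not found" for any the dynamic loader cannot resolve. Below is a minimal Python sketch, assuming a Linux compute node with `ldd` on the PATH, that wraps it and reports only the unresolved libraries. The helper names `missing_libraries` and `check_binary` and the script name `check_libs.py` are hypothetical. Run it on the node where the job fails, in the same environment the job gets, since LD_LIBRARY_PATH changes the result. Note that the ldd man page warns against running it on untrusted executables.

```python
import subprocess
import sys
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libfoo.so.1 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path: str) -> List[str]:
    """Run ldd on one binary and return its unresolved libraries.

    Raises RuntimeError if ldd fails without listing any missing
    libraries, e.g. for a nonexistent file or a static executable.
    """
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    missing = missing_libraries(result.stdout)
    if not missing and result.returncode != 0:
        message = (result.stderr or result.stdout).strip()
        raise RuntimeError(f"ldd failed for {path}: {message}")
    return missing


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 check_libs.py /path/to/program ...
    for binary in sys.argv[1:]:
        try:
            libs = check_binary(binary)
        except RuntimeError as err:
            print(err)
            continue
        print(f"{binary}: " + ("OK" if not libs else "missing " + ", ".join(libs)))
```

Parsing the text output keeps the sketch independent of whether a given ldd version exits non-zero when a library is missing.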
If it runs from the shell but not through torque, check whether your
interactive shell sets LD_LIBRARY_PATH to search for shared libraries in
non-standard places. If it does, make sure the same setting is present in
your job when you submit it to torque (e.g., put an
"export LD_LIBRARY_PATH=path:path:path...." at the beginning of your job
script).
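Setting the variable in the job script works because the dynamic loader reads LD_LIBRARY_PATH when each program starts, so it has to be in the environment the job's programs inherit. If a job is driven from Python rather than a shell script, the same effect can be had by passing a modified environment to the child process. A minimal sketch; the helper names and directories are hypothetical, and unlike a plain `export LD_LIBRARY_PATH=...` it prepends the new directories to any existing value instead of replacing it:

```python
import os
import subprocess
from typing import Dict, List, Mapping


def with_library_path(env: Mapping[str, str], extra_dirs: List[str]) -> Dict[str, str]:
    """Return a copy of env with extra_dirs prepended to LD_LIBRARY_PATH.

    The input mapping is not modified.
    """
    new_env = dict(env)
    existing = new_env.get("LD_LIBRARY_PATH", "")
    # New directories are searched first; any previous value is kept after them.
    parts = list(extra_dirs) + ([existing] if existing else [])
    new_env["LD_LIBRARY_PATH"] = ":".join(parts)
    return new_env


def run_with_libs(cmd: List[str], extra_dirs: List[str]) -> int:
    """Run cmd with the extended library path and return its exit status."""
    # The child's loader reads LD_LIBRARY_PATH at startup, so it must be in
    # the environment passed to the new process.
    env = with_library_path(os.environ, extra_dirs)
    return subprocess.run(cmd, env=env).returncode
```

For example, `run_with_libs(["./my_program"], ["/opt/mylib/lib"])` would start the (hypothetical) `./my_program` with `/opt/mylib/lib` searched first.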
> If that is indeed the case, how does
> one find the library requirements for jobs that I do not compile or submit?
>
--
Torsten Rohlfing, PhD
Research Scientist
SRI International, Neuroscience Program
333 Ravenswood Ave, Menlo Park, CA 94025
Phone: ++1 (650) 859-3379, Fax: ++1 (650) 859-2743
torsten at synapse.sri.com, http://www.stanford.edu/~rohlfing/
"Though this be madness, yet there is a method in't"