[torqueusers] Errors setting up torque
Carolyn Sawyer
csawyer at berkeley.edu
Fri Nov 30 11:55:02 MST 2012
Hi,
I am trying to set up Torque on a Fedora 15 machine (one node, 48 cores)
for my research group. I installed it using "yum torque". The version
installed appears to be 3.0.3. I've tried two methods: following the
instructions in README.Fedora, and following instructions I found online.
With either method, any qmgr command fails with "qmgr: cannot connect to
server (errno=110) Connection timed out".
README.Fedora method: Following the instructions, I created a munge key
with "/usr/sbin/create-munge-key". My hostname, using "/bin/hostname
--long", is slacker.berkeley.edu. I edited /etc/torque/server_name to
have "slacker.berkeley.edu" as its full contents. I edited
/etc/torque/mom/config to have "$pbsserver slacker.berkeley.edu" as its
only contents. I ran "/usr/sbin/pbs_server -D -t create", hit "y" to
continue, received the "pbs_server is up" message, and hit Ctrl-C (per
instructions). I ran "service pbs_server start" and got the message
"Starting pbs_server (via systemctl): [OK]". I then tried
'qmgr -c "s s scheduling=true"'; it sat for a while and then printed
"Connection timed out/qmgr: cannot connect to server (errno=110) Connection
timed out". Any qmgr command does the same thing, and if I skip ahead to
"service pbs_sched start" it fails.
I also tried following online instructions and running torque.setup,
which is in /usr/share/doc/torque-3.0.3. This demands a username, so I
tried "./torque.setup root" since I am running as root. It responds with
"PBS_Server slacker.berkeley.edu: Create mode and server database
exists, do you wish to continue y/(n)?" so I hit y. It sits for a while
and then spits out "Connection timed out/qmgr: cannot connect to server
(errno=110) Connection timed out/ERROR: cannot set TORQUE admins", then
sits a while longer, gives another timeout error, and drops me back to
the terminal. At this point ps shows that "pbs_server -t create" is running,
but it never finishes.
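For completeness, the torque.setup attempt amounts to something like the
sketch below. The helper names run_torque_setup and show_pbs_server are
hypothetical, and the exact ps invocation is an assumption; the script
location and the root argument are the ones given above.

```shell
# Run torque.setup from the directory it ships in.
# $1: directory holding torque.setup (/usr/share/doc/torque-3.0.3 in my case)
# $2: user to register as the Torque admin (I used root)
run_torque_setup() {
    doc_dir=$1
    admin_user=$2
    # Subshell so the caller's working directory is left unchanged.
    ( cd "$doc_dir" && ./torque.setup "$admin_user" )
}

# List any pbs_server processes; the [p] keeps grep from matching itself.
show_pbs_server() {
    ps -ef | grep '[p]bs_server'
}
```

I would call it as "run_torque_setup /usr/share/doc/torque-3.0.3 root",
answer "y" at the prompt, and then use show_pbs_server to see the lingering
"pbs_server -t create" process.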
Any thoughts? The fixes I saw mentioned in the list archive for this
error message all seemed to require qmgr commands, which don't work for
me...
Thanks,
Carolyn Sawyer