[torqueusers] trouble getting started -- jobs stuck in queue
Jeff Anderson-Lee
jonah at eecs.berkeley.edu
Tue Feb 9 11:24:12 MST 2010
I'm trying to set up torque on a new cluster of Nehalem/Ubuntu 9.10
nodes. I had no success with the pre-packaged torque packages, so I
installed from source (torque-2.4.4.tar.gz).
On the head node I ran ./configure; make; make install; make
packages, followed by ldconfig; ./torque.setup root; qterm -t quick;
pbs_server -t create. I then created a new queue using a sequence of
qmgr commands.
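The exact qmgr commands are not listed in this message. As a rough sketch only, a minimal execution queue is often set up with directives like the ones below. The helper name queue_setup_commands and the queue name "batch" are assumptions for illustration, not what was actually run here.

```shell
# Hypothetical helper: print qmgr directives that create a minimal
# execution queue and make it the server's default queue.
# $1 is the queue name (assumed here; the original post does not name it).
queue_setup_commands() {
    q=$1
    cat <<EOF
create queue $q
set queue $q queue_type = Execution
set queue $q enabled = True
set queue $q started = True
set server default_queue = $q
set server scheduling = True
EOF
}

# Usage on the head node, as root (qmgr reads directives from stdin):
#   queue_setup_commands batch | qmgr
```

Running "qmgr -c 'print server'" afterwards shows what the server actually has configured, which can be compared against these directives.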
On the compute nodes I ran the following:

    TORQUE_CFG=/var/spool/torque/
    TORQUE_HEAD=s151
    export TORQUE_CFG TORQUE_HEAD
    ./torque-package-mom-linux-x86_64.sh --install
    ./torque-package-clients-linux-x86_64.sh --install
    echo '$pbsserver '$TORQUE_HEAD >$TORQUE_CFG/mom_priv/config
    echo '$logevent 255' >>$TORQUE_CFG/mom_priv/config
    ldconfig
    pbs_mom

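The two echo lines above write a two-line pbs_mom configuration. The same step could be wrapped in a small function so that it is repeatable across nodes. This is only a sketch; the function name write_mom_config is hypothetical, and the directory and head-node name are passed in rather than assumed.

```shell
# Hypothetical helper: write a minimal pbs_mom config file.
# $1 is the Torque spool directory, $2 is the head node's host name.
write_mom_config() {
    cfg_dir=$1
    head=$2
    mkdir -p "$cfg_dir/mom_priv"
    # Single quotes keep the leading '$' of each directive literal.
    printf '$pbsserver %s\n$logevent 255\n' "$head" > "$cfg_dir/mom_priv/config"
}

# Usage on a compute node, as root, before starting pbs_mom:
#   write_mom_config /var/spool/torque s151
```

Unlike the echo version, this overwrites the file in one step, so running it twice does not append duplicate lines.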
I created two simple test jobs as follows:

    echo sleep 30 | qsub
    echo echo foo | qsub

When I run qstat I see the two jobs waiting in the queue. When I run
"pbsnodes -l free" I see four free compute nodes. But nothing seems to run.
I'm sure it's something simple. For instance, on which nodes am I
supposed to run pbs_mom and pbs_sched? The head node? The compute
nodes? Both??
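For context, in the usual Torque layout pbs_server and pbs_sched run on the head node and pbs_mom runs on each compute node. Jobs that stay queued while nodes show as free are commonly a sign that no scheduler is running, though that is only one possible cause. A quick check of which daemons are up on a given host might look like the sketch below; the helper name daemon_status is hypothetical, and it assumes pgrep is available.

```shell
# Hypothetical helper: report whether a process with exactly this name
# is running on the local host (uses pgrep; treats errors as "not running").
daemon_status() {
    if pgrep -x "$1" >/dev/null 2>&1; then
        echo "$1: running"
    else
        echo "$1: not running"
    fi
}

# Usage on the head node:
#   daemon_status pbs_server
#   daemon_status pbs_sched
# Usage on each compute node:
#   daemon_status pbs_mom
```

If pbs_sched turns out not to be running on the head node, starting it there as root (by running pbs_sched) is the usual next step to try.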
Thanks.