[torqueusers] trouble getting started -- jobs stuck in queue

Jeff Anderson-Lee jonah at eecs.berkeley.edu
Tue Feb 9 11:24:12 MST 2010


I'm trying to set up TORQUE on a new cluster of Nehalem nodes running
Ubuntu 9.10. I had no success with the prepackaged TORQUE packages, so I
installed from source (torque-2.4.4.tar.gz).

On the head node I ran ./configure; make; make install; make
packages, followed by ldconfig; ./torque.setup root; qterm -t quick;
pbs_server -t create. I then created a new queue using a sequence of
qmgr commands.
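For reference, a typical qmgr sequence for creating and enabling an execution queue might look like the following. This is a minimal sketch and not necessarily the sequence used above. The queue name `batch` and the function name `create_batch_queue` are hypothetical. Setting `QMGR=echo` prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: apply a typical TORQUE queue setup through qmgr.
# The queue name defaults to "batch" (an assumption, not taken from the post).
# Set QMGR=echo for a dry run that only prints the qmgr arguments.
create_batch_queue() {
    queue=${1:-batch}
    qmgr_cmd=${QMGR:-qmgr}
    for c in \
        "create queue $queue queue_type=execution" \
        "set queue $queue enabled=true" \
        "set queue $queue started=true" \
        "set server default_queue=$queue" \
        "set server scheduling=true"
    do
        # Stop at the first qmgr command that fails.
        $qmgr_cmd -c "$c" || return 1
    done
}
```

On the head node, `create_batch_queue batch` would apply the sequence, and `QMGR=echo create_batch_queue batch` shows what would be sent to qmgr without changing the server.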

On the compute nodes I ran the following:
  TORQUE_CFG=/var/spool/torque/
  TORQUE_HEAD=s151
  export TORQUE_CFG TORQUE_HEAD
  ./torque-package-mom-linux-x86_64.sh --install
  ./torque-package-clients-linux-x86_64.sh --install
  echo '$pbsserver      '$TORQUE_HEAD >$TORQUE_CFG/mom_priv/config
  echo '$logevent       255' >>$TORQUE_CFG/mom_priv/config
  ldconfig
  pbs_mom
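The `$pbsserver` line written to `mom_priv/config` tells pbs_mom which host runs pbs_server, so it may be worth confirming on each compute node that the file names the intended head node. A minimal sketch follows. The function name `mom_server_from_config` is hypothetical, and the default path assumes the `/var/spool/torque` spool directory used above.

```shell
#!/bin/sh
# Hypothetical helper: print the server name(s) listed in a pbs_mom config file.
# Default path assumes TORQUE_CFG=/var/spool/torque as in the commands above.
mom_server_from_config() {
    cfg=${1:-/var/spool/torque/mom_priv/config}
    # Config lines look like "$pbsserver <host>"; print the host field.
    # Inside awk, "$pbsserver" is a literal string, not a variable.
    awk '$1 == "$pbsserver" { print $2 }' "$cfg"
}
```

For example, `[ "$(mom_server_from_config)" = "$TORQUE_HEAD" ] || echo "mom config does not name $TORQUE_HEAD"` flags a node whose config points elsewhere.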

I created two simple test jobs as follows:
   echo sleep 30 | qsub
   echo echo foo | qsub

When I run qstat, I see the two jobs waiting in the queue. When I run
"pbsnodes -l free", I see four free compute nodes. But neither job ever starts.
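The two checks described above (jobs waiting in qstat, free nodes from pbsnodes) can be scripted so the state is easy to re-check after each configuration change. This is a minimal sketch. It assumes qstat's default layout, with a two-line header and the job state in the fifth column, and one line per node from `pbsnodes -l free`. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count jobs in a given state (e.g. Q or R).
# Reads default-format qstat output on stdin; skips the two header lines.
count_jobs_in_state() {
    awk -v s="$1" 'NR > 2 && $5 == s { n++ } END { print n + 0 }'
}

# Hypothetical helper: count non-empty lines of "pbsnodes -l free" output,
# assuming one "<node> <state>" line per free node.
count_free_nodes() {
    awk 'NF > 0 { n++ } END { print n + 0 }'
}
```

Typical use would be `qstat | count_jobs_in_state Q` alongside `pbsnodes -l free | count_free_nodes`. Free nodes combined with queued jobs that never start suggest that jobs are not being dispatched.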

I'm sure it's something simple. For example, on which nodes am I
supposed to run pbs_mom and pbs_sched? The head node? The compute
nodes? Both?

Thanks.



