[torqueusers] PBS not starting

Vadivelan Ranjith achillesvelan at yahoo.co.in
Thu Jan 18 10:48:26 MST 2007


Hi Friends
We were using PBS and upgraded to torque-2.1.0p0. Jobs and queues were all working fine. Two days ago I stopped the queue and shut down the machine for renovation. Today I booted the frontend and all the compute nodes, and I was shocked to see error messages. When I started PBS on the frontend, it gave the following output:
----------------------------------------------------------------------------------------------------------------------
root@galaxy:/usr/local# /sbin/service pbs restart
Restarting PBS
Stopping PBS
Starting PBS
PBS_Server: Resource temporarily unavailable (11) in PBS_Server, pbs_server: another server running

pbs_server: another server running
PBS server
cannot change directory to home '/usr/spool/PBS/mom_priv': No such file or directory
PBS mom
pbs_sched: Address already in use (98) in main, bind
PBS sched
root@galaxy:/usr/local#
----------------------------------------------------------------------------------------------------------------------
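
The "another server running" and "Address already in use (98)" messages would be consistent with pbs_server and pbs_sched processes still running when the restart was attempted, although the output above does not confirm this. A minimal sketch for checking, assuming a Linux ps that accepts "-e -o comm="; the helper name count_named is made up for illustration and is not part of TORQUE:

```shell
#!/bin/bash
# count_named NAME: read process names on stdin (one per line) and print
# how many of them are exactly NAME. Hypothetical helper, not part of TORQUE.
count_named() {
    grep -c -x -F -- "$1"
}

# Report how many copies of each PBS daemon are currently running.
for daemon in pbs_server pbs_mom pbs_sched; do
    echo "$daemon: $(ps -e -o comm= | count_named "$daemon") running"
done
```

A non-zero count for pbs_server or pbs_sched before the service is started would point to leftover daemons from an earlier start.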


I am also not able to start maui. It gives the following error:
----------------------------------------------------------------------------------------------------------------------
root@galaxy:/usr/local# /usr/local/maui-3.2.6p16/sbin/maui start
ERROR:    cannot open user interface socket on port 42559
----------------------------------------------------------------------------------------------------------------------
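
One possible reason for this error is that port 42559 is already held by another process, for example an earlier maui instance. That is an assumption, not something the message states. A minimal sketch to test whether anything accepts TCP connections on that port, relying on bash's /dev/tcp redirection (the function name port_in_use is hypothetical):

```shell
#!/bin/bash
# port_in_use HOST PORT: succeed if a TCP connection to HOST:PORT can be
# opened. Uses bash's built-in /dev/tcp redirection, so it needs bash.
port_in_use() {
    # The subshell keeps file descriptor 3 from leaking into the caller.
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

if port_in_use 127.0.0.1 42559; then
    echo "port 42559: something accepted a connection"
else
    echo "port 42559: nothing accepted a connection on 127.0.0.1"
fi
```

This only shows whether a listener answers on the loopback address. It does not identify which process owns the port.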



I submitted some jobs (I forced them to run). Jobs with one processor run fine. If I request two processors, I get the following error:

----------------------------------------------------------------------------------------------------------------------

mpdboot_node02.cluster2.iitb.ac.in (handle_mpd_output 359): failed to ping mpd on node01; recvd output={}

mpiexec_node02.cluster2.iitb.ac.in: cannot connect to local mpd (/tmp/mpd2.console_velan); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
mpdallexit: cannot connect to local mpd (/tmp/mpd2.console_velan); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
----------------------------------------------------------------------------------------------------------------------

My job file contains the following:
----------------------------------------------------------------------------------------------------------------------
#!/bin/bash

#PBS -l nodes=2:ppn=1

cd $HOME/2DSIM/1proc

n=`/usr/local/bin/pbs2mpich2hosts.py $PBS_NODEFILE hosts`

/usr/local/bin/mpdboot -n $n -f hosts -r rsh --mpd=/usr/local/bin/mpd
/usr/local/bin/mpiexec -n 1 /home/aero/velan/2DSIM/1proc/pg170x91.exe
/usr/local/bin/mpdallexit
rm -f hosts
----------------------------------------------------------------------------------------------------------------------
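
pbs2mpich2hosts.py is a local script whose contents are not shown here. Assuming it writes the distinct node names from $PBS_NODEFILE into the hosts file and prints how many there are, a rough shell equivalent might look like this (the function name make_mpd_hosts is made up, and the real script may behave differently):

```shell
#!/bin/bash
# make_mpd_hosts NODEFILE HOSTSFILE: write each distinct host name from
# NODEFILE (one name per line, repeated once per processor) to HOSTSFILE
# and print the number of distinct hosts.
make_mpd_hosts() {
    sort -u -- "$1" > "$2" || return 1
    wc -l < "$2" | tr -d ' '
}

# Inside a job, PBS_NODEFILE is set by the batch system; skip the demo otherwise.
if [ -n "${PBS_NODEFILE:-}" ]; then
    n=$(make_mpd_hosts "$PBS_NODEFILE" hosts)
    echo "mpdboot would be started on $n host(s)"
fi
```

With nodes=2:ppn=1 this would be expected to give n=2, so mpdboot has to reach the second node over rsh, which is the step that fails in the output above.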

Can anybody please help me? I did not configure this machine myself, and I am new to this. Thank you very much for your kind help.

Regards
Velan


 				