[torqueusers] Re: Basic Torque configuration reports cluster is down

Adil Mughal adil.m.mughal at gmail.com
Fri Feb 8 07:28:10 MST 2008


Here is some further information.

I looked at the log in mom_logs on one of the computers (dphpc1001), and
here is what it says:

02/08/2008 11:29:55;0002;   pbs_mom;Svr;Log;Log opened
02/08/2008 11:29:55;0001;   pbs_mom;Svr;pbs_mom;pbs_mom, Unable to get my full hostname for dphpc1001 error -1
02/08/2008 12:28:26;0002;   pbs_mom;Svr;Log;Log opened
02/08/2008 12:28:26;0001;   pbs_mom;Svr;pbs_mom;pbs_mom, Unable to get my full hostname for dphpc1001 error -1
02/08/2008 12:29:48;0002;   pbs_mom;Svr;Log;Log opened
02/08/2008 12:29:48;0001;   pbs_mom;Svr;pbs_mom;pbs_mom, Unable to get my full hostname for dphpc1001 error -1
02/08/2008 12:42:44;0002;   pbs_mom;Svr;Log;Log opened
02/08/2008 12:42:44;0001;   pbs_mom;Svr;pbs_mom;pbs_mom, Unable to get my full hostname for dphpc1001 error -1
02/08/2008 12:44:14;0002;   pbs_mom;Svr;Log;Log opened
02/08/2008 12:44:14;0001;   pbs_mom;Svr;pbs_mom;pbs_mom, Unable to get my full hostname for dphpc1001 error -1
02/08/2008 12:52:12;0002;   pbs_mom;Svr;Log;Log opened
02/08/2008 12:52:12;0001;   pbs_mom;Svr;pbs_mom;pbs_mom, Unable to get my full hostname for dphpc1001 error -1
02/08/2008 13:39:36;0002;   pbs_mom;Svr;Log;Log opened
02/08/2008 13:39:36;0001;   pbs_mom;Svr;pbs_mom;pbs_mom, Unable to get my full hostname for dphpc1001 error -1
02/08/2008 13:50:10;0002;   pbs_mom;Svr;Log;Log opened
02/08/2008 13:50:10;0001;   pbs_mom;Svr;pbs_mom;pbs_mom, Unable to get my full hostname for dphpc1001 error -1

It looks like there is an error in how the name of the server is specified.

This is how my /var/spool/torque/mom_priv/config file looks on dphpc1001:

$pbsserver dphpc1011.dph.aber.ac.uk
$usecp *:/home /home
$logevent       255

And here is my /var/spool/torque/server_name file:

dphpc1011.dph.aber.ac.uk
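The two files above should name the same server, and a small script can rule out a hidden typo or stray character. This is a minimal Python sketch, assuming the config file uses whitespace-separated "$pbsserver <host>" lines and that server_name holds a single host name; pbsserver_entries and names_agree are hypothetical helpers, not part of Torque.

```python
def pbsserver_entries(config_text):
    """Return the host names given on $pbsserver lines of a mom_priv/config."""
    hosts = []
    for line in config_text.splitlines():
        fields = line.split()
        # Only lines whose first token is exactly "$pbsserver" count;
        # commented-out lines such as "#$pbsserver ..." are skipped.
        if len(fields) >= 2 and fields[0] == "$pbsserver":
            hosts.append(fields[1])
    return hosts


def names_agree(config_text, server_name_text):
    """True if the (stripped) name in server_name is on a $pbsserver line."""
    return server_name_text.strip() in pbsserver_entries(config_text)


if __name__ == "__main__":
    # Contents copied from the two files quoted in this message.
    config = "$pbsserver dphpc1011.dph.aber.ac.uk\n$usecp *:/home /home\n$logevent       255\n"
    print(names_agree(config, "dphpc1011.dph.aber.ac.uk\n"))  # True
```

On a real node you would pass in the text read from /var/spool/torque/mom_priv/config and /var/spool/torque/server_name instead of the inline strings.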

Both of the following succeed:

ping dphpc1011.dph.aber.ac.uk   (run on dphpc1001)
ping dphpc1001.dph.aber.ac.uk   (run on dphpc1011)
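As a rough cross-check: the pings above look up each machine's full name from the other machine, whereas the log message is about pbs_mom on dphpc1001 working out its own full name from its short host name. The Python sketch below approximates that lookup with the standard socket module; it is an assumption that this matches what pbs_mom does internally, and check_full_hostname is a hypothetical helper, not part of Torque. If it fails when run on dphpc1001, that machine's own name is probably not resolvable through /etc/hosts or DNS.

```python
import socket


def check_full_hostname(short_name=None):
    """Approximate the lookup pbs_mom needs: short host name -> full name.

    Returns (short_name, full_name) on success; raises socket.gaierror
    or socket.herror if the name cannot be resolved.
    """
    if short_name is None:
        short_name = socket.gethostname()  # e.g. "dphpc1001"
    # Forward lookup: fails if the name is in neither /etc/hosts nor DNS.
    addr = socket.gethostbyname(short_name)
    # Reverse lookup of that address gives the name the resolver
    # considers canonical (ideally the fully qualified one).
    full_name, _aliases, _addrs = socket.gethostbyaddr(addr)
    return short_name, full_name


if __name__ == "__main__":
    try:
        short, full = check_full_hostname()
        print(f"{short} resolves to {full}")
    except (socket.gaierror, socket.herror) as exc:
        print(f"cannot resolve full hostname: {exc}")
```

On a correctly configured node this should print something like "dphpc1001 resolves to dphpc1001.dph.aber.ac.uk".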


adil

On Feb 8, 2008 12:13 PM, Adil Mughal <adil.m.mughal at gmail.com> wrote:
> Dear Experts
>
> I have another question about setting up Torque, which follows below.
> As always, my deep thanks to anyone who can take the time to help me
> with this.
>
> I have managed to install Torque on the server computer and on two other
> computers by following the quick start instructions given at:
>
> http://www.clusterresources.com/wiki/doku.php?id=torque:appendix:l_torque_quickstart_guide
>
> I then performed the steps to verify the installation. Upon typing
>
> >pbsnodes -a
>
> I get:
>
> dphpc1001.dph.aber.ac.uk
>      state = down
>      np = 2
>      ntype = cluster
>
> dphpc1002.dph.aber.ac.uk
>      state = down
>      np = 2
>      ntype = cluster
>
> dphpc1003.dph.aber.ac.uk
>      state = down
>      np = 2
>      ntype = cluster
>
> dphpc1004.dph.aber.ac.uk
>      state = down
>      np = 2
>      ntype = cluster
>
> and so on.
>
>
> NOTE: I have only set up Torque on dphpc1001 and dphpc1002 at the moment.
>
> For these two (i.e. 1001 and 1002) I would expect to see state = free,
> but I don't. Can anyone tell me what I might have done wrong? Also,
> submitting jobs to the queue with
>
> >echo "sleep 30" | qsub
>
> simply results in the jobs piling up in the queue.
>
>
> adil
>

