[torqueusers] Re: Basic Torque configuration reports cluster is down

Yang Wang yang.wang at agencourt.com
Fri Feb 8 08:10:07 MST 2008


I had a similar problem before, and it turned out to be a DNS reverse lookup issue. Try nslookup on the hostname and then on the IP address to check that the IP maps back to the same hostname. Alternatively, edit /etc/hosts on every cluster node so that it contains a line of the form "IPAddress hostNameWithDomain hostname" for each node.
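
If it helps, here is a rough Python sketch of the same check, in case you want to run it across several nodes instead of typing nslookup by hand. The function names check_round_trip and hosts_line are made up for this example, and the 192.0.2.x address mentioned below is a placeholder, not a real address from your cluster. It only approximates the forward/reverse test described above and says nothing about what Torque does internally.

```python
import socket


def check_round_trip(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve name to an IP, then resolve the IP back and compare names.

    forward and reverse default to the system resolver; they are
    parameters only so the check can be exercised without a network.
    """
    result = {"name": name, "ip": None, "reverse": None, "ok": False, "error": None}
    try:
        result["ip"] = forward(name)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        result["error"] = f"forward lookup failed: {exc}"
        return result
    try:
        primary, aliases, _addresses = reverse(result["ip"])
    except OSError as exc:  # socket.herror is a subclass of OSError
        result["error"] = f"reverse lookup failed: {exc}"
        return result
    result["reverse"] = primary
    # Accept a match on either the primary name or any alias.
    known = {primary.lower()} | {alias.lower() for alias in aliases}
    result["ok"] = name.lower() in known
    if not result["ok"]:
        result["error"] = "reverse lookup returned a different name"
    return result


def hosts_line(ip, fqdn):
    """Format one /etc/hosts entry as 'IPAddress hostNameWithDomain hostname'."""
    short = fqdn.split(".", 1)[0]
    return f"{ip} {fqdn} {short}"
```

Calling check_round_trip("dphpc1001.dph.aber.ac.uk") on each node should return a result with "ok" set to True if forward and reverse lookups agree. hosts_line("192.0.2.11", "dphpc1001.dph.aber.ac.uk") produces "192.0.2.11 dphpc1001.dph.aber.ac.uk dphpc1001"; substitute the node's real address.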

Good luck.

Yang

-----Original Message-----
From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Adil Mughal
Sent: Friday, February 08, 2008 9:28 AM
To: torqueusers at supercluster.org
Subject: [torqueusers] Re: Basic Torque configuration reports cluster is down

Here is some further information

I looked at the mom_logs file on one of the computers (dphpc1001), and
here is what it says:

02/08/2008 11:29:55;0002;   pbs_mom;Svr;Log;Log opened
02/08/2008 11:29:55;0001;   pbs_mom;Svr;pbs_mom;pbs_mom, Unable to get my full hostname for dphpc1001 error -1
02/08/2008 12:28:26;0002;   pbs_mom;Svr;Log;Log opened
02/08/2008 12:28:26;0001;   pbs_mom;Svr;pbs_mom;pbs_mom, Unable to get my full hostname for dphpc1001 error -1
02/08/2008 12:29:48;0002;   pbs_mom;Svr;Log;Log opened
02/08/2008 12:29:48;0001;   pbs_mom;Svr;pbs_mom;pbs_mom, Unable to get my full hostname for dphpc1001 error -1
02/08/2008 12:42:44;0002;   pbs_mom;Svr;Log;Log opened
02/08/2008 12:42:44;0001;   pbs_mom;Svr;pbs_mom;pbs_mom, Unable to get my full hostname for dphpc1001 error -1
02/08/2008 12:44:14;0002;   pbs_mom;Svr;Log;Log opened
02/08/2008 12:44:14;0001;   pbs_mom;Svr;pbs_mom;pbs_mom, Unable to get my full hostname for dphpc1001 error -1
02/08/2008 12:52:12;0002;   pbs_mom;Svr;Log;Log opened
02/08/2008 12:52:12;0001;   pbs_mom;Svr;pbs_mom;pbs_mom, Unable to get my full hostname for dphpc1001 error -1
02/08/2008 13:39:36;0002;   pbs_mom;Svr;Log;Log opened
02/08/2008 13:39:36;0001;   pbs_mom;Svr;pbs_mom;pbs_mom, Unable to get my full hostname for dphpc1001 error -1
02/08/2008 13:50:10;0002;   pbs_mom;Svr;Log;Log opened
02/08/2008 13:50:10;0001;   pbs_mom;Svr;pbs_mom;pbs_mom, Unable to get my full hostname for dphpc1001 error -1

It looks like there is an error in specifying the name of the server.
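
Note that the log message names dphpc1001, which is the node itself, so it may be worth checking whether the node can expand its own short name to a fully qualified one. Below is a minimal Python sketch of that check. It assumes that asking the resolver for a canonical name (getaddrinfo with AI_CANONNAME) is a reasonable stand-in for what pbs_mom needs; that is an assumption, not something taken from the Torque source. The function name full_hostname is made up for this example.

```python
import socket


def full_hostname(short_name=None, resolver=socket.getaddrinfo):
    """Return a fully qualified name for short_name, or None if none is found.

    With no argument, the machine's own hostname is used. resolver is a
    parameter only so the function can be exercised without a network.
    """
    name = short_name or socket.gethostname()
    try:
        infos = resolver(name, None, flags=socket.AI_CANONNAME)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None
    for _family, _type, _proto, canonname, _sockaddr in infos:
        # Treat a canonical name containing a dot as "full".
        if canonname and "." in canonname:
            return canonname
    return None
```

Running full_hostname() on dphpc1001 and getting None back would be consistent with the "Unable to get my full hostname" message, and would point at /etc/hosts or DNS on that node.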

This is how my /var/spool/torque/mom_priv/config file looks on dphpc1001

$pbsserver dphpc1011.dph.aber.ac.uk
$usecp *:/home /home
$logevent       255

and here is my /var/spool/torque/server_name

dphpc1011.dph.aber.ac.uk

Both

ping dphpc1011.dph.aber.ac.uk from dphpc1001

and

ping dphpc1001.dph.aber.ac.uk from dphpc1011

are successful.


adil







On Feb 8, 2008 12:13 PM, Adil Mughal <adil.m.mughal at gmail.com> wrote:
> Dear Experts
>
> I have another question about setting up Torque - which follows below
> - as always my deep thanks to anyone who can take the time to help me
> with this.
>
> I have managed to install Torque on the server computer and on two other
> computers by following the quick start instructions given at:
>
> http://www.clusterresources.com/wiki/doku.php?id=torque:appendix:l_torque_quickstart_guide
>
> I then performed the steps to verify correct installation. Upon typing
>
> >pbsnodes -a
>
> I get:
>
> dphpc1001.dph.aber.ac.uk
>      state = down
>      np = 2
>      ntype = cluster
>
> dphpc1002.dph.aber.ac.uk
>      state = down
>      np = 2
>      ntype = cluster
>
> dphpc1003.dph.aber.ac.uk
>      state = down
>      np = 2
>      ntype = cluster
>
> dphpc1004.dph.aber.ac.uk
>      state = down
>      np = 2
>      ntype = cluster
>
> and so on
>
>
> NOTE: I have only set up Torque on dphpc1001 and dphpc1002 at the moment.
>
> For these two (i.e. 1001 and 1002) I would expect to see state = free,
> but I don't. Can anyone tell me what I might have done wrong? Also,
> submitting jobs to the queue
>
> >echo "sleep 30" | qsub
>
> simply results in the jobs piling up in the queue.
>
>
> adil
>
_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers


