[torqueusers] simple (I hope) /etc/hosts question

Nathan Moore ntmoore at gmail.com
Wed Jan 3 10:17:25 MST 2007


Ok, I think the issue is resolved.  I needed to fill out the
./mom_priv/config file with the following contents:

[root at runner torque]# cat ./mom_priv/config
$pbsserver runner
$clienthost muscovey
$clienthost pekin

To be safe (is this necessary?), I made copies of the file on every
node (server and moms).
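
A minimal sketch of how the same file contents could be generated for
every node, assuming the three hostnames from this thread.  The name
make_mom_config is a made-up helper, not part of Torque; it only prints
the lines and does not touch any file:

```shell
#!/bin/sh
# Hypothetical helper (not a Torque command): print mom_priv/config
# contents for one server and any number of client hosts.
# Usage: make_mom_config SERVER CLIENT...
make_mom_config() {
    server=$1
    shift
    # Single quotes keep the leading $ literal, as the config file expects.
    printf '$pbsserver %s\n' "$server"
    for host in "$@"; do
        printf '$clienthost %s\n' "$host"
    done
}

# Hostnames taken from this thread.
make_mom_config runner muscovey pekin
```

The output could be redirected into the config file on each node
(probably /var/spool/torque/mom_priv/config, as mentioned below).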

Also, if I want to be able to run jobs on runner (the server), should
the mom_priv/config file also have a line
	$clienthost runner
?

Thanks for such a fast response!

Nathan


- - - - - - - - - - - - - - - - - - - - - - -

Nathan Moore
Physics
Winona State University
nmoore at winona.edu
AIM:nmoorewsu

- - - - - - - - - - - - - - - - - - - - - - -


On Jan 3, 2007, at 10:34 AM, Glen Beane wrote:

What does your pbs_mom config file look like (probably
/var/spool/torque/mom_priv/config)?

If you don't have one, create one with the following (I think, this is
off the top of my head)

$pbsserver runner  (or whatever server is running pbs_server)
$clienthost muscovey
$clienthost pekin


and see if that helps
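
To check whether it helped, something like the following could pick out
the nodes still reported as down.  This is only a sketch: it assumes the
plain-text layout of the pbsnodes -a output quoted below (node name at
the start of a line, indented "state = ..." lines), and list_down_nodes
is a made-up name, not a Torque tool:

```shell
#!/bin/sh
# Hypothetical helper: read `pbsnodes -a` style text on stdin and print
# the name of every node whose state contains "down".
list_down_nodes() {
    awk '
        /^[^ \t]/ { node = $1 }                       # unindented line: node name
        $1 == "state" && $3 ~ /down/ { print node }   # indented "state = ..." line
    '
}

# Sample input in the layout quoted in this thread; on a live system
# this would be:  pbsnodes -a | list_down_nodes
list_down_nodes <<'EOF'
pekin
     state = down
     np = 1
     ntype = cluster

muscovey
     state = down
     np = 1
     ntype = cluster
EOF
```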

On 1/2/07, Nathan Moore <ntmoore at gmail.com> wrote:
> Torque was really easy to install, but it seems like my /etc/hosts file must
> be screwed up, as I can't get the cluster nodes to respond.  Specifically,
> within a cluster of 3 machines, each having an /etc/hosts file of:
>
>     127.0.0.1       localhost.localdomain   localhost
>     199.17.152.17   runner
>     199.17.152.135  muscovey
>     199.17.152.13   pekin
>     (( other workstations follow ))
>
> Now, when I have the pbs_server running on runner, and the pbs_mom daemons
> running on muscovey, pekin, and runner, I get the following status message,
>
>     [root at runner torque-2.1.6]# pbsnodes -a
>     pekin
>          state = down
>          np = 1
>          ntype = cluster
>
>     muscovey
>          state = down
>          np = 1
>          ntype = cluster
>
>     runner
>          state = down
>          np = 1
>          ntype = cluster
>
> I realize this is a pretty low-level question, but what the heck is wrong
> with my /etc/hosts file?
>
> regards,
>
>  NT
>
>
> ps, the troubleshooting message given by torque is,
>
>     [root at runner torque-2.1.6]# momctl -d 3
>
>     Host: runner/runner   Version: 2.1.6
>     WARNING:  server not specified (set $pbsserver)
>     PID:                    30531
>     HomeDirectory:          /var/spool/torque/mom_priv
>     MOM active:             2518 seconds
>     Server Update Interval: 45 seconds
>     LOGLEVEL:               0 (use SIGUSR1/SIGUSR2 to adjust)
>     Communication Model:    RPP
>     TCP Timeout:            20 seconds
>     NOTE:  no prolog configured
>     Alarm Time:             0 of 10 seconds
>     Trusted Client List:    199.17.152.17,127.0.0.1
>     Configured to use /usr/bin/scp -rpB
>     NOTE:  no local jobs detected
>
>     diagnostics complete
>
>
> - - - - - - -   - - - - - - -   - - - - - - -
> Nathan Moore
> Assistant Professor, Physics
> Winona State University
> AIM: nmoorewsu
> - - - - - - -   - - - - - - -   - - - - - - -


