[torqueusers] Scheduler bound to ETHO IP port

Christina Salls christina.salls at noaa.gov
Thu Mar 8 12:31:49 MST 2012


Thanks James!

On Mon, Feb 20, 2012 at 11:43 AM, Coyle, James J [ITACD] <jjc at iastate.edu> wrote:

>  Christina,
>
>   I think that it is common to use two interfaces on the login node, one
> inward facing on a private subnet and one outward facing, and place the
> internal interface name in /var/spool/torque/server_name.
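For illustration, the server_name file holds only the server's host name on a single line. A minimal sketch, assuming the default /var/spool/torque spool directory used in this thread and that the internal name is admin.default.domain (the name pbs_server is started with via -H in the ps output later in the thread):

```text
admin.default.domain
```

The compute nodes then need to resolve this name to the internal address, which is what the /etc/hosts entry is for.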
>
>    What I always do is to use /etc/hosts and insert a line like:
>
> 172.16.10.1    loginnode   admin   admin.default.domain
>
> and copy /etc/hosts to all of the compute nodes.
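To make the effect of that /etc/hosts line concrete, here is a minimal Python sketch of how a `files` lookup resolves a name: it returns the address on the first line that lists the name as a canonical name or alias. The function name `lookup_hosts` is hypothetical, and this is a simplified model (no IPv6 handling, no `multi on` behaviour), not glibc's actual implementation.

```python
from typing import Optional


def lookup_hosts(hosts_text: str, name: str) -> Optional[str]:
    """Return the address of the first /etc/hosts-style line listing `name`.

    Simplified model of the "files" source: each line is an address followed
    by a canonical name and optional aliases; '#' starts a comment.
    Host names are compared case-insensitively.
    """
    wanted = name.lower()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # strip comment, then tokenize
        if len(fields) >= 2 and wanted in (f.lower() for f in fields[1:]):
            return fields[0]
    return None


if __name__ == "__main__":
    hosts = (
        "127.0.0.1      localhost\n"
        "# internal (cluster-facing) interface\n"
        "172.16.10.1    loginnode   admin   admin.default.domain\n"
    )
    print(lookup_hosts(hosts, "admin.default.domain"))  # 172.16.10.1
```

Once every compute node has the same copy of the file and `files` is consulted before DNS, the internal name resolves to the private address on each node without a DNS query.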
>
>   You will also want to make sure that files precedes dns on the hosts
> line in /etc/nsswitch.conf.
>
>    Then I can use the internal name.
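For reference, the ordering Jim describes lives on the hosts line of /etc/nsswitch.conf. A minimal excerpt, assuming a glibc-based system; the other databases in that file are left unchanged:

```text
# /etc/nsswitch.conf (excerpt): consult /etc/hosts before DNS
hosts:      files dns
```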
>
> - Jim C.
>
> From: torqueusers-bounces at supercluster.org
> [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Christina Salls
> Sent: Friday, February 17, 2012 3:08 PM
> To: Torque Users Mailing List; Michael Saxon; Frank Indiviglio; Craig
> Tierney; help >> GLERL IT Help; Jeff Hanson; Brian Beagan; John Cardenas
> Subject: [torqueusers] Scheduler bound to ETHO IP port
>
> Hi all,
>
>        I have been experiencing a problem with jobs staying in my default
> queue until I force execution with qrun.  It turns out that my TORQUE
> server is configured on my second Ethernet interface, which is connected
> to my compute nodes, but the scheduler is bound to a port on the first
> interface.
>
> [root@wings server_logs]# ps -ef | grep pbs
> root      1268     1  0 13:56 ?        00:00:00 /usr/local/sbin/pbs_server -d /var/spool/torque -H admin.default.domain
> root     14768     1  0 14:25 ?        00:00:00 /usr/local/sbin/pbs_sched -d /var/spool/torque
> root     21956 16623  0 14:41 pts/25   00:00:00 grep pbs
> [root@wings server_logs]# lsof -p 14768
> COMMAND     PID USER   FD   TYPE    DEVICE SIZE/OFF    NODE NAME
> pbs_sched 14768 root  cwd    DIR      8,98     4096 6032970 /var/spool/torque/sched_priv
> pbs_sched 14768 root  rtd    DIR      8,98     4096       2 /
> pbs_sched 14768 root  txt    REG      8,98   268782 3421344 /usr/local/sbin/pbs_sched
> pbs_sched 14768 root  mem    REG      8,98   156872 3276802 /lib64/ld-2.12.so
> pbs_sched 14768 root  mem    REG      8,98  1979000 3276803 /lib64/libc-2.12.so
> pbs_sched 14768 root  mem    REG      8,98    65928 3277205 /lib64/libnss_files-2.12.so
> pbs_sched 14768 root  mem    REG      8,98   791107 3418524 /usr/local/lib/libtorque.so.2.0.0
> pbs_sched 14768 root    0r   CHR       1,3      0t0    3772 /dev/null
> pbs_sched 14768 root    1w   REG      8,98        0 6033331 /var/spool/torque/sched_priv/sched_out
> pbs_sched 14768 root    2w   REG      8,98        0 6033331 /var/spool/torque/sched_priv/sched_out
> pbs_sched 14768 root    3w   REG      8,98     2699 6033359 /var/spool/torque/sched_logs/20120217
> pbs_sched 14768 root    4u  IPv4 801882953      0t0     TCP wings.glerl.noaa.gov:15004 (LISTEN)
> pbs_sched 14768 root    5wW  REG      8,98        7 6033329 /var/spool/torque/sched_priv/sched.lock
> pbs_sched 14768 root    6r   REG      8,98     4374 6032952 /var/spool/torque/sched_priv/resource_group
> pbs_sched 14768 root    7w   REG      8,98        0 6033360 /var/spool/torque/sched_priv/accounting/20120217
> [root@wings server_logs]# cd ..
> [root@wings torque]# ls
> aux  checkpoint  job_logs  mom_logs  mom_priv  pbs_environment  sched_logs  sched_priv  server_logs  server_name  server_priv  spool  undelivered
> [root@wings torque]# lsof -n -p 14768
> COMMAND     PID USER   FD   TYPE    DEVICE SIZE/OFF    NODE NAME
> pbs_sched 14768 root  cwd    DIR      8,98     4096 6032970 /var/spool/torque/sched_priv
> pbs_sched 14768 root  rtd    DIR      8,98     4096       2 /
> pbs_sched 14768 root  txt    REG      8,98   268782 3421344 /usr/local/sbin/pbs_sched
> pbs_sched 14768 root  mem    REG      8,98   156872 3276802 /lib64/ld-2.12.so
> pbs_sched 14768 root  mem    REG      8,98  1979000 3276803 /lib64/libc-2.12.so
> pbs_sched 14768 root  mem    REG      8,98    65928 3277205 /lib64/libnss_files-2.12.so
> pbs_sched 14768 root  mem    REG      8,98   791107 3418524 /usr/local/lib/libtorque.so.2.0.0
> pbs_sched 14768 root    0r   CHR       1,3      0t0    3772 /dev/null
> pbs_sched 14768 root    1w   REG      8,98        0 6033331 /var/spool/torque/sched_priv/sched_out
> pbs_sched 14768 root    2w   REG      8,98        0 6033331 /var/spool/torque/sched_priv/sched_out
> pbs_sched 14768 root    3w   REG      8,98     2699 6033359 /var/spool/torque/sched_logs/20120217
> pbs_sched 14768 root    4u  IPv4 801882953      0t0     TCP 192.94.173.9:15004 (LISTEN)
> pbs_sched 14768 root    5wW  REG      8,98        7 6033329 /var/spool/torque/sched_priv/sched.lock
> pbs_sched 14768 root    6r   REG      8,98     4374 6032952 /var/spool/torque/sched_priv/resource_group
> pbs_sched 14768 root    7w   REG      8,98        0 6033360 /var/spool/torque/sched_priv/accounting/20120217
> [root@wings torque]# ls
> aux  checkpoint  job_logs  mom_logs  mom_priv  pbs_environment  sched_logs  sched_priv  server_logs  server_name  server_priv  spool  undelivered
> [root@wings torque]# cd sched_priv
> [root@wings sched_priv]# ls
> accounting  dedicated_time  holidays  resource_group  sched_config  sched.lock  sched_out
> [root@wings sched_priv]# more sched_config
>
> When I used hostname to change the host name to admin.default.domain and
> restarted the pbs_sched daemon, everything started working.
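One way to spot this mismatch without reading the whole lsof listing is to pull out the address pbs_sched is listening on and compare it with the address the server name resolves to. A minimal Python sketch, assuming numeric IPv4 output from `lsof -n` like the capture in this message; `listen_address` and `sched_on_interface` are hypothetical helper names, not part of TORQUE or lsof.

```python
import re
from typing import Optional, Tuple

# Matches the NAME column of a listening TCP socket in lsof output,
# e.g. "TCP 192.94.173.9:15004 (LISTEN)". Assumes IPv4 and numeric
# addresses (lsof -n); the greedy \S+ backtracks to the last colon.
_LISTEN_RE = re.compile(r"TCP (\S+):(\d+) \(LISTEN\)")


def listen_address(lsof_text: str) -> Optional[Tuple[str, int]]:
    """Return (address, port) of the first LISTEN socket, or None."""
    # Collapse runs of whitespace (including wrapped lines) to single spaces.
    flat = " ".join(lsof_text.split())
    match = _LISTEN_RE.search(flat)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def sched_on_interface(lsof_text: str, server_ip: str) -> bool:
    """True if the scheduler listens on the address the server name resolves to."""
    found = listen_address(lsof_text)
    return found is not None and found[0] == server_ip


if __name__ == "__main__":
    line = ("pbs_sched 14768 root    4u  IPv4 801882953      0t0     TCP "
            "192.94.173.9:15004 (LISTEN)")
    print(listen_address(line))  # ('192.94.173.9', 15004)
    # 172.16.10.1 is the internal address from Jim's /etc/hosts example,
    # used here only as a stand-in for the cluster-facing interface.
    print(sched_on_interface(line, "172.16.10.1"))  # False
```

In the capture above, the listening address is 192.94.173.9, the address behind wings.glerl.noaa.gov, rather than the interface that admin.default.domain names.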
>
> Any idea how to change the hostname/IP/interface that the scheduler uses?
>
> Thanks,
>
>      Christina
>
> --
> Christina A. Salls
> GLERL Computer Group
> help.glerl at noaa.gov
> Help Desk x2127
> Christina.Salls at noaa.gov
> Voice Mail 734-741-2446

