[torqueusers] Scheduler bound to ETH0 IP port

Christina Salls christina.salls at noaa.gov
Thu Mar 8 12:31:00 MST 2012


Thanks Rick!  Sorry for the delayed response.  I am just returning from
vacation and catching up with email!  This looks like what I need.  I do
not have a torque.cfg file in my /var/spool/torque directory, but I assume
I can just create it.
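
For reference, a minimal sketch of what such a file might contain. The
interface name eth1 is an assumption (the second interface, facing the
compute nodes); substitute the real NIC name. The "NAME value" layout is
assumed to match other torque.cfg parameters such as QSUBHOST, so check it
against Appendix K before relying on it.

    # /var/spool/torque/torque.cfg
    # TRQ_IFNAME: preferred outbound interface for TORQUE requests (2.5.9+).
    # eth1 is a placeholder, not taken from this thread.
    TRQ_IFNAME eth1

The daemons would presumably need a restart to pick up the new file.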

On Fri, Feb 17, 2012 at 5:54 PM, Rick McKay <rmckay at adaptivecomputing.com> wrote:

> Christina,
>
> I think you're looking for this:
>
> From 2.5.9 CHANGELOG file:
>   e - Added new option to torque.cfg name TRQ_IFNAME. This allows the user
> to designate a preferred outbound interface for TORQUE requests. The
> interface is the name of the NIC interface, for example eth0.
>
> Reference that parameter and QSUBHOST in Appendix K.
>
> --Rick
>
> Rick McKay | Technical Support Engineer
> rmckay at adaptivecomputing.com
> Direct: (801) 717-3395 | Toll free: 1-888-221-2008 x3395
> Adaptive Computing | www.adaptivecomputing.com
>
> ------------------------------
> *From: *"Christina Salls" <christina.salls at noaa.gov>
> *To: *"Torque Users Mailing List" <torqueusers at supercluster.org>,
> "Michael Saxon" <saxonm at sgi.com>, "Frank Indiviglio" <
> frank.indiviglio at noaa.gov>, "Craig Tierney" <craig.tierney at noaa.gov>,
> "help >> GLERL IT Help" <oar.glerl.it-help at noaa.gov>, "Jeff Hanson" <
> jhanson at sgi.com>, "Brian Beagan" <beagan at sgi.com>, "John Cardenas" <
> cardenas at sgi.com>
> *Sent: *Friday, February 17, 2012 2:07:47 PM
>
> *Subject: *[torqueusers] Scheduler bound to ETH0 IP port
>
> Hi all,
>
>        I have been experiencing a problem with jobs staying in my default
> queue until I force execution with a qrun.  It turns out that the reason is
> that my torque server is configured on my second ethernet interface which
> is connected to my compute nodes.  The problem is that the scheduler is
> bound to the 1st interface port.
>
> [root@wings server_logs]# ps -ef | grep pbs
> root      1268     1  0 13:56 ?        00:00:00 /usr/local/sbin/pbs_server -d /var/spool/torque -H admin.default.domain
> root     14768     1  0 14:25 ?        00:00:00 /usr/local/sbin/pbs_sched -d /var/spool/torque
> root     21956 16623  0 14:41 pts/25   00:00:00 grep pbs
> [root@wings server_logs]# lsof -p 14768
> COMMAND     PID USER   FD   TYPE    DEVICE SIZE/OFF    NODE NAME
> pbs_sched 14768 root  cwd    DIR      8,98     4096 6032970 /var/spool/torque/sched_priv
> pbs_sched 14768 root  rtd    DIR      8,98     4096       2 /
> pbs_sched 14768 root  txt    REG      8,98   268782 3421344 /usr/local/sbin/pbs_sched
> pbs_sched 14768 root  mem    REG      8,98   156872 3276802 /lib64/ld-2.12.so
> pbs_sched 14768 root  mem    REG      8,98  1979000 3276803 /lib64/libc-2.12.so
> pbs_sched 14768 root  mem    REG      8,98    65928 3277205 /lib64/libnss_files-2.12.so
> pbs_sched 14768 root  mem    REG      8,98   791107 3418524 /usr/local/lib/libtorque.so.2.0.0
> pbs_sched 14768 root    0r   CHR       1,3      0t0    3772 /dev/null
> pbs_sched 14768 root    1w   REG      8,98        0 6033331 /var/spool/torque/sched_priv/sched_out
> pbs_sched 14768 root    2w   REG      8,98        0 6033331 /var/spool/torque/sched_priv/sched_out
> pbs_sched 14768 root    3w   REG      8,98     2699 6033359 /var/spool/torque/sched_logs/20120217
> pbs_sched 14768 root    4u  IPv4 801882953      0t0     TCP wings.glerl.noaa.gov:15004 (LISTEN)
> pbs_sched 14768 root    5wW  REG      8,98        7 6033329 /var/spool/torque/sched_priv/sched.lock
> pbs_sched 14768 root    6r   REG      8,98     4374 6032952 /var/spool/torque/sched_priv/resource_group
> pbs_sched 14768 root    7w   REG      8,98        0 6033360 /var/spool/torque/sched_priv/accounting/20120217
> [root@wings server_logs]# cd ..
> [root@wings torque]# ls
> aux  checkpoint  job_logs  mom_logs  mom_priv  pbs_environment  sched_logs  sched_priv  server_logs  server_name  server_priv  spool  undelivered
> [root@wings torque]# lsof -n -p 14768
> COMMAND     PID USER   FD   TYPE    DEVICE SIZE/OFF    NODE NAME
> pbs_sched 14768 root  cwd    DIR      8,98     4096 6032970 /var/spool/torque/sched_priv
> pbs_sched 14768 root  rtd    DIR      8,98     4096       2 /
> pbs_sched 14768 root  txt    REG      8,98   268782 3421344 /usr/local/sbin/pbs_sched
> pbs_sched 14768 root  mem    REG      8,98   156872 3276802 /lib64/ld-2.12.so
> pbs_sched 14768 root  mem    REG      8,98  1979000 3276803 /lib64/libc-2.12.so
> pbs_sched 14768 root  mem    REG      8,98    65928 3277205 /lib64/libnss_files-2.12.so
> pbs_sched 14768 root  mem    REG      8,98   791107 3418524 /usr/local/lib/libtorque.so.2.0.0
> pbs_sched 14768 root    0r   CHR       1,3      0t0    3772 /dev/null
> pbs_sched 14768 root    1w   REG      8,98        0 6033331 /var/spool/torque/sched_priv/sched_out
> pbs_sched 14768 root    2w   REG      8,98        0 6033331 /var/spool/torque/sched_priv/sched_out
> pbs_sched 14768 root    3w   REG      8,98     2699 6033359 /var/spool/torque/sched_logs/20120217
> pbs_sched 14768 root    4u  IPv4 801882953      0t0     TCP 192.94.173.9:15004 (LISTEN)
> pbs_sched 14768 root    5wW  REG      8,98        7 6033329 /var/spool/torque/sched_priv/sched.lock
> pbs_sched 14768 root    6r   REG      8,98     4374 6032952 /var/spool/torque/sched_priv/resource_group
> pbs_sched 14768 root    7w   REG      8,98        0 6033360 /var/spool/torque/sched_priv/accounting/20120217
> [root@wings torque]# ls
> aux  checkpoint  job_logs  mom_logs  mom_priv  pbs_environment  sched_logs  sched_priv  server_logs  server_name  server_priv  spool  undelivered
> [root@wings torque]# cd sched_priv
> [root@wings sched_priv]# ls
> accounting  dedicated_time  holidays  resource_group  sched_config  sched.lock  sched_out
> [root@wings sched_priv]# more sched_config
>
> When I used the hostname command to change the machine's name to
> admin.default.domain and restarted the pbs_sched daemon, everything
> started working.
>
> Any idea how to change the hostname/IP/interface that the scheduler uses?
>
> Thanks,
>
>      Christina
>
> --
> Christina A. Salls
> GLERL Computer Group
> help.glerl at noaa.gov
> Help Desk x2127
> Christina.Salls at noaa.gov
> Voice Mail 734-741-2446
>
>
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
>
>


-- 
Christina A. Salls
GLERL Computer Group
help.glerl at noaa.gov
Help Desk x2127
Christina.Salls at noaa.gov
Voice Mail 734-741-2446