[torqueusers] Scheduler bound to ETH0 IP port

Christina Salls christina.salls at noaa.gov
Fri Feb 17 14:07:47 MST 2012


Hi all,

       I have been experiencing a problem with jobs staying in my default
queue until I force execution with qrun.  It turns out that the reason is
that my torque server is configured on my second ethernet interface, which
is connected to my compute nodes, while the scheduler is bound to the
address of the first interface.

[root@wings server_logs]# ps -ef | grep pbs
root      1268     1  0 13:56 ?        00:00:00 /usr/local/sbin/pbs_server -d /var/spool/torque -H admin.default.domain
root     14768     1  0 14:25 ?        00:00:00 /usr/local/sbin/pbs_sched -d /var/spool/torque
root     21956 16623  0 14:41 pts/25   00:00:00 grep pbs
[root@wings server_logs]# lsof -p 14768
COMMAND     PID USER   FD   TYPE    DEVICE SIZE/OFF    NODE NAME
pbs_sched 14768 root  cwd    DIR      8,98     4096 6032970 /var/spool/torque/sched_priv
pbs_sched 14768 root  rtd    DIR      8,98     4096       2 /
pbs_sched 14768 root  txt    REG      8,98   268782 3421344 /usr/local/sbin/pbs_sched
pbs_sched 14768 root  mem    REG      8,98   156872 3276802 /lib64/ld-2.12.so
pbs_sched 14768 root  mem    REG      8,98  1979000 3276803 /lib64/libc-2.12.so
pbs_sched 14768 root  mem    REG      8,98    65928 3277205 /lib64/libnss_files-2.12.so
pbs_sched 14768 root  mem    REG      8,98   791107 3418524 /usr/local/lib/libtorque.so.2.0.0
pbs_sched 14768 root    0r   CHR       1,3      0t0    3772 /dev/null
pbs_sched 14768 root    1w   REG      8,98        0 6033331 /var/spool/torque/sched_priv/sched_out
pbs_sched 14768 root    2w   REG      8,98        0 6033331 /var/spool/torque/sched_priv/sched_out
pbs_sched 14768 root    3w   REG      8,98     2699 6033359 /var/spool/torque/sched_logs/20120217
pbs_sched 14768 root    4u  IPv4 801882953      0t0     TCP wings.glerl.noaa.gov:15004 (LISTEN)
pbs_sched 14768 root    5wW  REG      8,98        7 6033329 /var/spool/torque/sched_priv/sched.lock
pbs_sched 14768 root    6r   REG      8,98     4374 6032952 /var/spool/torque/sched_priv/resource_group
pbs_sched 14768 root    7w   REG      8,98        0 6033360 /var/spool/torque/sched_priv/accounting/20120217
[root@wings server_logs]# cd ..
[root@wings torque]# ls
aux  checkpoint  job_logs  mom_logs  mom_priv  pbs_environment  sched_logs  sched_priv  server_logs  server_name  server_priv  spool  undelivered
[root@wings torque]# lsof -n -p 14768
COMMAND     PID USER   FD   TYPE    DEVICE SIZE/OFF    NODE NAME
pbs_sched 14768 root  cwd    DIR      8,98     4096 6032970 /var/spool/torque/sched_priv
pbs_sched 14768 root  rtd    DIR      8,98     4096       2 /
pbs_sched 14768 root  txt    REG      8,98   268782 3421344 /usr/local/sbin/pbs_sched
pbs_sched 14768 root  mem    REG      8,98   156872 3276802 /lib64/ld-2.12.so
pbs_sched 14768 root  mem    REG      8,98  1979000 3276803 /lib64/libc-2.12.so
pbs_sched 14768 root  mem    REG      8,98    65928 3277205 /lib64/libnss_files-2.12.so
pbs_sched 14768 root  mem    REG      8,98   791107 3418524 /usr/local/lib/libtorque.so.2.0.0
pbs_sched 14768 root    0r   CHR       1,3      0t0    3772 /dev/null
pbs_sched 14768 root    1w   REG      8,98        0 6033331 /var/spool/torque/sched_priv/sched_out
pbs_sched 14768 root    2w   REG      8,98        0 6033331 /var/spool/torque/sched_priv/sched_out
pbs_sched 14768 root    3w   REG      8,98     2699 6033359 /var/spool/torque/sched_logs/20120217
pbs_sched 14768 root    4u  IPv4 801882953      0t0     TCP 192.94.173.9:15004 (LISTEN)
pbs_sched 14768 root    5wW  REG      8,98        7 6033329 /var/spool/torque/sched_priv/sched.lock
pbs_sched 14768 root    6r   REG      8,98     4374 6032952 /var/spool/torque/sched_priv/resource_group
pbs_sched 14768 root    7w   REG      8,98        0 6033360 /var/spool/torque/sched_priv/accounting/20120217
[root@wings torque]# ls
aux  checkpoint  job_logs  mom_logs  mom_priv  pbs_environment  sched_logs  sched_priv  server_logs  server_name  server_priv  spool  undelivered
[root@wings torque]# cd sched_priv
[root@wings sched_priv]# ls
accounting  dedicated_time  holidays  resource_group  sched_config  sched.lock  sched_out
[root@wings sched_priv]# more sched_config
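
For anyone wanting to repeat this check without reading the whole lsof listing, here is a minimal sketch of a filter that prints only the address a process is listening on. The function name listen_addr is made up for illustration, and it assumes lsof output in the column layout shown in the listings here, with "(LISTEN)" as the last field.

```shell
# listen_addr: read lsof output on stdin and print the address part of
# every listening TCP socket, with the trailing :port removed.
# (Hypothetical helper name; not part of torque or lsof.)
listen_addr() {
    awk '$NF == "(LISTEN)" {
        addr = $(NF - 1)            # e.g. 192.94.173.9:15004
        sub(/:[0-9]+$/, "", addr)   # drop the port
        print addr
    }'
}
```

On the head node it could be fed with something like "lsof -n -P -p 14768 | listen_addr", where -n keeps addresses numeric and -P keeps port numbers numeric.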

When I used the hostname command to change the host's name to
admin.default.domain and restarted the pbs_sched daemon, everything
started working.
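
A rough sketch of the comparison behind that workaround, in case it helps others: check whether the address the server name resolves to is the same one the scheduler is listening on. The function name check_bind is invented for this example, and the commented usage assumes getent, pgrep and lsof are available on the head node.

```shell
# check_bind: compare the address the server name resolves to with the
# address the scheduler is listening on. Both are passed as arguments,
# so the function itself touches nothing on the system.
# (Hypothetical helper name; not part of torque.)
check_bind() {
    expected=$1   # address the pbs_server name resolves to
    actual=$2     # address pbs_sched is listening on
    if [ "$expected" = "$actual" ]; then
        echo "match"
    else
        echo "mismatch: scheduler on $actual, server name resolves to $expected"
        return 1
    fi
}

# Possible use on the head node (not run here):
#   expected=$(getent hosts admin.default.domain | awk '{ print $1; exit }')
#   actual=$(lsof -n -P -p "$(pgrep -x pbs_sched)" |
#            awk '$NF == "(LISTEN)" { a = $(NF - 1); sub(/:[0-9]+$/, "", a); print a }')
#   check_bind "$expected" "$actual"
```

A mismatch would correspond to the situation I described at the top, where jobs sit in the queue until forced with qrun.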

Any idea how to change the hostname/IP/interface that the scheduler uses?

Thanks,

     Christina

-- 
Christina A. Salls
GLERL Computer Group
help.glerl at noaa.gov
Help Desk x2127
Christina.Salls at noaa.gov
Voice Mail 734-741-2446