[torqueusers] Scheduler bound to ETH0 IP port

Rick McKay rmckay at adaptivecomputing.com
Fri Feb 17 15:54:01 MST 2012


Christina, 


I think you're looking for this: 


From the 2.5.9 CHANGELOG file: 
e - Added new option to torque.cfg name TRQ_IFNAME. This allows the user to designate a preferred outbound interface for TORQUE requests. The interface is the name of the NIC interface, for example eth0. 


See the entries for that parameter and QSUBHOST in Appendix K. 
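
For illustration, a minimal sketch of what such an entry might look like. Several things here are assumptions rather than facts from this thread: eth1 is a placeholder for whichever NIC faces the compute nodes, the file is assumed to be torque.cfg in the TORQUE home directory (/var/spool/torque on this system), and entries are assumed to take the form "NAME value". The sketch writes to a scratch directory so it is safe to run as-is. Whether the setting changes the address pbs_sched listens on is not confirmed here; the CHANGELOG only describes outbound requests.

```shell
# Sketch only: eth1 is an assumed interface name, not taken from this thread.
# Writes to a scratch directory; the real file is assumed to be
# /var/spool/torque/torque.cfg on a TORQUE 2.5.9 or later install.
cfg_dir=$(mktemp -d)
cfg="$cfg_dir/torque.cfg"

# Assumed torque.cfg syntax: parameter name, whitespace, value.
printf 'TRQ_IFNAME eth1\n' >> "$cfg"

# Show the resulting file for review before copying it into place.
cat "$cfg"
```

After placing the real file, the daemons would presumably need a restart to pick up the change.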


--Rick 


Rick McKay | Technical Support Engineer 
rmckay at adaptivecomputing.com 
Direct: (801) 717-3395 | Toll free: 1-888-221-2008 x3395 
Adaptive Computing | www.adaptivecomputing.com 

----- Original Message -----

From: "Christina Salls" <christina.salls at noaa.gov> 
To: "Torque Users Mailing List" <torqueusers at supercluster.org>, "Michael Saxon" <saxonm at sgi.com>, "Frank Indiviglio" <frank.indiviglio at noaa.gov>, "Craig Tierney" <craig.tierney at noaa.gov>, "help >> GLERL IT Help" <oar.glerl.it-help at noaa.gov>, "Jeff Hanson" <jhanson at sgi.com>, "Brian Beagan" <beagan at sgi.com>, "John Cardenas" <cardenas at sgi.com> 
Sent: Friday, February 17, 2012 2:07:47 PM 
Subject: [torqueusers] Scheduler bound to ETH0 IP port 

Hi all, 


I have been experiencing a problem with jobs staying in my default queue until I force execution with qrun. It turns out that my TORQUE server is configured on my second Ethernet interface, which is connected to my compute nodes, while the scheduler is bound to the first interface. 



[root@wings server_logs]# ps -ef | grep pbs 
root 1268 1 0 13:56 ? 00:00:00 /usr/local/sbin/pbs_server -d /var/spool/torque -H admin.default.domain 
root 14768 1 0 14:25 ? 00:00:00 /usr/local/sbin/pbs_sched -d /var/spool/torque 
root 21956 16623 0 14:41 pts/25 00:00:00 grep pbs 
[root@wings server_logs]# lsof -p 14768 
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME 
pbs_sched 14768 root cwd DIR 8,98 4096 6032970 /var/spool/torque/sched_priv 
pbs_sched 14768 root rtd DIR 8,98 4096 2 / 
pbs_sched 14768 root txt REG 8,98 268782 3421344 /usr/local/sbin/pbs_sched 
pbs_sched 14768 root mem REG 8,98 156872 3276802 /lib64/ld-2.12.so 
pbs_sched 14768 root mem REG 8,98 1979000 3276803 /lib64/libc-2.12.so 
pbs_sched 14768 root mem REG 8,98 65928 3277205 /lib64/libnss_files-2.12.so 
pbs_sched 14768 root mem REG 8,98 791107 3418524 /usr/local/lib/libtorque.so.2.0.0 
pbs_sched 14768 root 0r CHR 1,3 0t0 3772 /dev/null 
pbs_sched 14768 root 1w REG 8,98 0 6033331 /var/spool/torque/sched_priv/sched_out 
pbs_sched 14768 root 2w REG 8,98 0 6033331 /var/spool/torque/sched_priv/sched_out 
pbs_sched 14768 root 3w REG 8,98 2699 6033359 /var/spool/torque/sched_logs/20120217 
pbs_sched 14768 root 4u IPv4 801882953 0t0 TCP wings.glerl.noaa.gov:15004 (LISTEN) 
pbs_sched 14768 root 5wW REG 8,98 7 6033329 /var/spool/torque/sched_priv/sched.lock 
pbs_sched 14768 root 6r REG 8,98 4374 6032952 /var/spool/torque/sched_priv/resource_group 
pbs_sched 14768 root 7w REG 8,98 0 6033360 /var/spool/torque/sched_priv/accounting/20120217 
[root@wings server_logs]# cd .. 
[root@wings torque]# ls 
aux checkpoint job_logs mom_logs mom_priv pbs_environment sched_logs sched_priv server_logs server_name server_priv spool undelivered 
[root@wings torque]# lsof -n -p 14768 
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME 
pbs_sched 14768 root cwd DIR 8,98 4096 6032970 /var/spool/torque/sched_priv 
pbs_sched 14768 root rtd DIR 8,98 4096 2 / 
pbs_sched 14768 root txt REG 8,98 268782 3421344 /usr/local/sbin/pbs_sched 
pbs_sched 14768 root mem REG 8,98 156872 3276802 /lib64/ld-2.12.so 
pbs_sched 14768 root mem REG 8,98 1979000 3276803 /lib64/libc-2.12.so 
pbs_sched 14768 root mem REG 8,98 65928 3277205 /lib64/libnss_files-2.12.so 
pbs_sched 14768 root mem REG 8,98 791107 3418524 /usr/local/lib/libtorque.so.2.0.0 
pbs_sched 14768 root 0r CHR 1,3 0t0 3772 /dev/null 
pbs_sched 14768 root 1w REG 8,98 0 6033331 /var/spool/torque/sched_priv/sched_out 
pbs_sched 14768 root 2w REG 8,98 0 6033331 /var/spool/torque/sched_priv/sched_out 
pbs_sched 14768 root 3w REG 8,98 2699 6033359 /var/spool/torque/sched_logs/20120217 
pbs_sched 14768 root 4u IPv4 801882953 0t0 TCP 192.94.173.9:15004 (LISTEN) 
pbs_sched 14768 root 5wW REG 8,98 7 6033329 /var/spool/torque/sched_priv/sched.lock 
pbs_sched 14768 root 6r REG 8,98 4374 6032952 /var/spool/torque/sched_priv/resource_group 
pbs_sched 14768 root 7w REG 8,98 0 6033360 /var/spool/torque/sched_priv/accounting/20120217 
[root@wings torque]# ls 
aux checkpoint job_logs mom_logs mom_priv pbs_environment sched_logs sched_priv server_logs server_name server_priv spool undelivered 
[root@wings torque]# cd sched_priv 
[root@wings sched_priv]# ls 
accounting dedicated_time holidays resource_group sched_config sched.lock sched_out 
[root@wings sched_priv]# more sched_config 


When I used hostname to change the machine's name to admin.default.domain and restarted the pbs_sched daemon, everything started working. 
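
To check which address the scheduler is listening on before and after a change like this, the LISTEN row can be pulled out of the lsof output. A minimal sketch follows; the helper name listen_addr is made up for illustration, and the sample row is copied from the lsof -n output above.

```shell
# Print the local address:port of every lsof row whose last field is (LISTEN).
# listen_addr is a hypothetical helper name, not part of TORQUE or lsof.
listen_addr() {
  awk '$NF == "(LISTEN)" { print $(NF-1) }'
}

# Row copied from the lsof -n output earlier in this message.
sample='pbs_sched 14768 root 4u IPv4 801882953 0t0 TCP 192.94.173.9:15004 (LISTEN)'
printf '%s\n' "$sample" | listen_addr

# On a live host (assumes a single pbs_sched process and that pgrep is installed):
#   lsof -n -P -p "$(pgrep -x pbs_sched)" | listen_addr
```

For the sample row this prints 192.94.173.9:15004, which can then be compared against the address assigned to each interface.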


Any idea how to change the hostname/IP/interface that the scheduler uses? 


Thanks, 


Christina 

-- 
Christina A. Salls 
GLERL Computer Group 
help.glerl at noaa.gov 
Help Desk x2127 
Christina.Salls at noaa.gov 
Voice Mail 734-741-2446 


