[torqueusers] PBS not seeing nodes after 3.0.5 to 4.2.6 upgrade

Daniel Davidson danield at igb.uiuc.edu
Tue Nov 26 10:54:15 MST 2013


I just upgraded our Torque installation from 3.0.5 to 4.2.6 (numalink and
PAM enabled), and now I cannot get our nodes to show as online.  Only 4 of
our nodes are SGIs that need numalink.

Any ideas?  I do not understand why the nodes stay down.

Dan

Note: compute-5-0, used in the example below, is not one of the SGI nodes.

Example:

# pbsnodes compute-5-0
compute-5-0
      state = down
      np = 24
      properties = eval
      ntype = cluster
      mom_service_port = 15002
      mom_manager_port = 15003

[root@compute-5-0 mom_logs]# momctl -d 3

Host: compute-5-0/compute-5-0.local   Version: 4.2.6   PID: 5708
Server[0]: biocluster.local (10.1.1.1:15001)
   Last Msg From Server:   2232 seconds (CLUSTER_ADDRS)
   WARNING:  no messages sent to server
HomeDirectory:          /var/spool/torque/mom_priv
stdout/stderr spool directory: '/var/spool/torque/spool/' (887622blocks available)
NOTE:  syslog enabled
MOM active:             2255 seconds
Check Poll Time:        45 seconds
Server Update Interval: 45 seconds
LogLevel:               0 (use SIGUSR1/SIGUSR2 to adjust)
Communication Model:    TCP
MemLocked:              TRUE  (mlock)
TCP Timeout:            60 seconds
Prolog:                 /var/spool/torque/mom_priv/prologue (disabled)
Alarm Time:             0 of 10 seconds
Trusted Client List: 10.1.1.1:0,10.1.255.211:0,10.1.255.211:15003,10.1.255.212:15003,10.1.255.213:15003,10.1.255.214:15003,10.1.255.215:15003,10.1.255.216:15003,10.1.255.217:15003,10.1.255.218:15003,10.1.255.219:15003,10.1.255.220:15003,10.1.255.221:15003,10.1.255.222:15003,10.1.255.223:15003,10.1.255.224:15003,10.1.255.225:15003,10.1.255.226:15003,10.1.255.227:15003,10.1.255.228:15003,10.1.255.229:15003,10.1.255.230:15003,10.1.255.231:15003,10.1.255.232:15003,10.1.255.233:15003,10.1.255.234:15003,10.1.255.235:15003,10.1.255.236:15003,10.1.255.237:15003,10.1.255.238:15003,10.1.255.239:15003,10.1.255.240:15003,10.1.255.241:15003,10.1.255.242:15003,10.1.255.243:15003,10.1.255.244:15003,10.1.255.245:15003,10.1.255.246:15003,10.1.255.247:15003,10.1.255.248:15003,10.1.255.249:15003,10.1.255.250:15003,10.1.255.251:15003,10.1.255.252:15003,10.1.255.253:15003,10.1.255.254:15003,127.0.0.1:0: 0
Copy Command:           /usr/bin/scp -rpB
NOTE:  no local jobs detected
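
For checking the same thing across many nodes, here is a minimal sketch, assuming a POSIX shell with awk. It only picks out the two lines of the `momctl -d 3` output above that bear on whether the MOM is reporting to the server: the `Server[0]` line and the "no messages sent to server" warning. The function name check_mom_report is hypothetical and not part of Torque, and the sketch does not diagnose why the MOM is silent.

```shell
# Summarize the parts of `momctl -d 3` output that relate to MOM -> server reporting.
# Reads the diagnostic text on stdin. check_mom_report is a hypothetical helper name.
check_mom_report() {
    awk '
        # Remember the first "Server[N]: host (ip:port)" line.
        /^Server\[[0-9]+\]:/ {
            if (server == "") { server = $2 " " $3 }
        }
        # Flag the warning that the MOM has not sent anything to pbs_server.
        /WARNING:.*no messages sent to server/ { silent = 1 }
        END {
            if (server == "") { server = "(none listed)" }
            print "server: " server
            if (silent) {
                print "status: MOM reports no messages sent to server"
            } else {
                print "status: no send warning found"
            }
        }
    '
}
```

Usage on a compute node would be `momctl -d 3 | check_mom_report`.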


