[torqueusers] PBS not seeing nodes after 3.0.5 to 4.2.6 upgrade

Ken Nielson knielson at adaptivecomputing.com
Tue Dec 3 10:06:03 MST 2013


Daniel,

Have you been able to resolve this yet?

Regards


On Tue, Nov 26, 2013 at 10:54 AM, Daniel Davidson <danield at igb.uiuc.edu> wrote:

> I just upgraded our Torque from 3.0.5 to 4.2.6 (NUMAlink and PAM
> enabled), and now I cannot get our nodes to show as online. Only 4 of
> our nodes are SGIs that need NUMAlink.
>
> Any ideas? I do not understand why this is happening.
>
> Dan
>
> compute-5-0 is not an SGI.
>
> Example:
>
> # pbsnodes compute-5-0
> compute-5-0
>       state = down
>       np = 24
>       properties = eval
>       ntype = cluster
>       mom_service_port = 15002
>       mom_manager_port = 15003
>
> [root@compute-5-0 mom_logs]# momctl -d 3
>
> Host: compute-5-0/compute-5-0.local   Version: 4.2.6   PID: 5708
> Server[0]: biocluster.local (10.1.1.1:15001)
>    Last Msg From Server:   2232 seconds (CLUSTER_ADDRS)
>    WARNING:  no messages sent to server
> HomeDirectory:          /var/spool/torque/mom_priv
> stdout/stderr spool directory: '/var/spool/torque/spool/' (887622 blocks available)
> NOTE:  syslog enabled
> MOM active:             2255 seconds
> Check Poll Time:        45 seconds
> Server Update Interval: 45 seconds
> LogLevel:               0 (use SIGUSR1/SIGUSR2 to adjust)
> Communication Model:    TCP
> MemLocked:              TRUE  (mlock)
> TCP Timeout:            60 seconds
> Prolog:                 /var/spool/torque/mom_priv/prologue (disabled)
> Alarm Time:             0 of 10 seconds
> Trusted Client List:
> 10.1.1.1:0,10.1.255.211:0,10.1.255.211:15003,10.1.255.212:15003,
> 10.1.255.213:15003,10.1.255.214:15003,10.1.255.215:15003,
> 10.1.255.216:15003,10.1.255.217:15003,10.1.255.218:15003,
> 10.1.255.219:15003,10.1.255.220:15003,10.1.255.221:15003,
> 10.1.255.222:15003,10.1.255.223:15003,10.1.255.224:15003,
> 10.1.255.225:15003,10.1.255.226:15003,10.1.255.227:15003,
> 10.1.255.228:15003,10.1.255.229:15003,10.1.255.230:15003,
> 10.1.255.231:15003,10.1.255.232:15003,10.1.255.233:15003,
> 10.1.255.234:15003,10.1.255.235:15003,10.1.255.236:15003,
> 10.1.255.237:15003,10.1.255.238:15003,10.1.255.239:15003,
> 10.1.255.240:15003,10.1.255.241:15003,10.1.255.242:15003,
> 10.1.255.243:15003,10.1.255.244:15003,10.1.255.245:15003,
> 10.1.255.246:15003,10.1.255.247:15003,10.1.255.248:15003,
> 10.1.255.249:15003,10.1.255.250:15003,10.1.255.251:15003,
> 10.1.255.252:15003,10.1.255.253:15003,10.1.255.254:15003,127.0.0.1:0:
> 0
> Copy Command:           /usr/bin/scp -rpB
> NOTE:  no local jobs detected
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
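The two symptoms in the quoted output, `pbsnodes` reporting `state = down` and `momctl -d 3` printing `WARNING:  no messages sent to server`, can be checked mechanically across many nodes. Below is a minimal sketch, assuming the plain-text layouts shown above (node name at column 0 followed by indented `key = value` lines for `pbsnodes`; a `Server[N]: name (ip:port)` line for `momctl`). The helper names (`parse_pbsnodes`, `down_nodes`, `momctl_summary`, `can_connect`) are hypothetical and not part of Torque; `can_connect` is only a plain TCP connect test, not a check of Torque's own protocol or authorization.

```python
import re
import socket


def parse_pbsnodes(text):
    """Parse `pbsnodes` text output into {node_name: {attribute: value}}.

    Assumes the layout quoted above: a node name at column 0, then
    indented `key = value` lines for that node.
    """
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            continue  # skip blank separator lines
        if not line[0].isspace():
            current = line.strip()  # a new node block starts
            nodes[current] = {}
        elif current is not None and "=" in line:
            # Split on the first '=' only; values may contain '=' themselves.
            key, _, value = line.partition("=")
            nodes[current][key.strip()] = value.strip()
    return nodes


def down_nodes(nodes):
    """Return sorted names of nodes whose comma-separated state includes 'down'."""
    return sorted(
        name for name, attrs in nodes.items()
        if "down" in attrs.get("state", "").split(",")
    )


# Matches lines such as "Server[0]: biocluster.local (10.1.1.1:15001)".
SERVER_RE = re.compile(r"Server\[\d+\]:\s+(\S+)\s+\(([\d.]+):(\d+)\)")


def momctl_summary(text):
    """Extract configured servers and the 'no messages sent' warning from `momctl -d 3` text."""
    servers = [(name, ip, int(port)) for name, ip, port in SERVER_RE.findall(text)]
    return {
        "servers": servers,
        "no_messages_sent": "no messages sent to server" in text,
    }


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Sample text taken from the output quoted in this thread.
    sample_pbsnodes = "compute-5-0\n      state = down\n      np = 24\n"
    sample_momctl = (
        "Server[0]: biocluster.local (10.1.1.1:15001)\n"
        "   WARNING:  no messages sent to server\n"
    )
    print(down_nodes(parse_pbsnodes(sample_pbsnodes)))
    print(momctl_summary(sample_momctl))
```

Run on compute-5-0 itself, `can_connect("10.1.1.1", 15001)` returning `False` would suggest a network or firewall problem between the MOM and `pbs_server`; `True` only shows that the port accepts connections, not that the server is accepting the MOM's status updates.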



-- 
Ken Nielson
+1 801.717.3700 office +1 801.717.3738 fax
1712 S. East Bay Blvd, Suite 300  Provo, UT  84606
www.adaptivecomputing.com