[Mauiusers] Problem integrating Maui and Torque

Michael Homa mhoma at uic.edu
Mon Mar 31 14:21:51 MDT 2008


Hi:

I am upgrading from PBSsched to Maui (version maui-3.2.6p19) and, at the
same time, upgrading Torque from torque-1.2.0p3 to torque-2.2.1.

Torque (server and client) appears to be working. I can submit a job to a
single test queue and see the single test node with pbsnodes -a.

The problem is with integrating Maui. I've compiled it and permit it to
talk to Torque via port 42559. When I start Maui, I get the log message:
  WARNING:  no resources detected

I'm new to Maui and, according to the manual (RTFM), "information about
nodes is provided to the scheduler chiefly by the resource manager." So
Maui is supposed to get the node information from Torque.

When I submit echo "test" | qsub, activity occurs in the maui log:

03/31 14:56:23 INFO:     scheduling complete.  sleeping 30 seconds
03/31 14:56:29 INFO:     connect request from 128.248.121.64
03/31 14:56:29 INFO:     received service request from host 'argo.cc.uic.edu'
03/31 14:56:29 INFO:     client socket from 'argo.cc.uic.edu' accepted
03/31 14:56:29 UIProcessCommand(S)
03/31 14:56:29 MSURecvData(S,5000000,TRUE,SC,EMsg)
03/31 14:56:29 MSURecvPacket(8,BufP,9,NULL,5000000,SC)
03/31 14:56:34 MSUSelectRead-select failed
03/31 14:56:34 WARNING:  cannot receive message within 5.000000 second timeout (aborting)
03/31 14:56:34 ALERT:    cannot determine packet size
03/31 14:56:34 ALERT:    cannot read client packet
03/31 14:56:34 MSUDisconnect(S)
03/31 14:56:54 ServerProcessRequests()
03/31 14:56:54 INFO:     not rolling logs (16548380 < 500000000)
03/31 14:56:54 MResAdjust(NULL,0,0)
03/31 14:56:54 MStatInitializeActiveSysUsage()
03/31 14:56:54 MStatClearUsage([NONE],Active)
03/31 14:56:54 ServerUpdate()
03/31 14:56:54 MSysUpdateTime()
03/31 14:56:54 INFO:     starting iteration 245
03/31 14:56:54 MRMGetInfo()
03/31 14:56:54 MClusterClearUsage()
03/31 14:56:54 MRMClusterQuery()
03/31 14:56:54 WARNING:  no resources detected
03/31 14:56:54 MRMWorkloadQuery()
03/31 14:56:54 WARNING:  no workload detected
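
To pull only the problem entries out of a log like the one above, a minimal shell sketch follows. It assumes the layout shown here (date, time, then a severity tag such as WARNING: or ALERT: as the third field). The function name summarize_maui_log is hypothetical, and the log path must be passed in because its location is not given in this message.

```shell
# Hypothetical helper: print WARNING and ALERT entries from a Maui log,
# followed by a one-line count of each.
summarize_maui_log() {
    log_file=$1   # path to the Maui log (an assumption; not stated above)
    # Field 3 is the severity tag on entries that carry one.
    awk '$3 == "WARNING:" { w++; print }
         $3 == "ALERT:"   { a++; print }
         END { printf "%d WARNING, %d ALERT\n", w, a }' "$log_file"
}
```

Function-trace entries such as MRMClusterQuery() have no severity tag and are skipped, so the output shows only the timeout and "no resources detected" style messages.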

Therefore, there is some communication between Maui and Torque, but showq
indicates no jobs running, idle, or queued, and checknode finds nothing:

ERROR:    'checknode' failed
ERROR:  cannot locate node 'argo17-1'

A search of the archives indicated that this error may occur because
PBS_DEFAULT is not set and there is a mismatch between the entry in
maui.cfg and the Torque server_name file. However, I've set PBS_DEFAULT
to argo.cc.uic.edu and confirmed that the entries match:

   cat  /var/spool/torque/server_name
          argo.cc.uic.edu   <-----------------------+
                                                    |
   cat /usr/common/maui/maui.cfg                    |
          SERVERHOST            argo.cc.uic.edu     |
          RMHOST[0] argo.cc.uic.edu  <--------------+
          RMTYPE[0] PBS

Permissions on the server_name file are 544, and those on the Torque spool
directory itself are 755. Early on, I disabled iptables and confirmed that
the hostname is in DNS.

Regarding the maui.cfg, I was using the following but changed it as part
of my search for a solution:

   RMCFG[ARGO.CC.UIC.EDU] TYPE=PBS

I'm out of new ideas and am just repeating things I've already tried. If
someone could point me in an untried direction, that would be great.

Michael Homa
Operating Systems Support and Database Group
Academic Computing and Communication Center
University of Illinois at Chicago
email:  mhoma at uic.edu


