[Mauiusers] Problem with TORQUE 2.3.0 and maui maui-3.2.6p19... checksum does not match/cannot read client packet

Filippo Spiga spiga.filippo at gmail.com
Fri May 23 04:15:38 MDT 2008


Hi all,
   this is the first time I have tried to integrate Maui and TORQUE on my HPC
cluster (http://scilx.disco.unimib.it/), following these instructions:
http://www.clusterresources.com/products/maui/docs/pbsintegration.shtml

After compiling the sources and starting the maui executable, the client
commands fail with these errors:
$ showq
ERROR:    lost connection to server
ERROR:    cannot request service (status)
$ showconfig
ERROR:    lost connection to server
ERROR:    cannot request service (status)

TORQUE itself seems to work fine, as the following output shows:

$ qstat -Q
Queue              Max   Tot   Ena   Str   Que   Run   Hld   Wat   Trn   Ext T
----------------   ---   ---   ---   ---   ---   ---   ---   ---   ---   --- -
short                0     0   yes   yes     0     0     0     0     0     0 E
long                 0     0   yes   yes     0     0     0     0     0     0 E
default              0     0   yes   yes     0     0     0     0     0     0 R
devel                0     0   yes   yes     0     0     0     0     0     0 E
medium               0     0   yes   yes     0     0     0     0     0     0 E

$ pbsnodes -a
node01
     state = free
     np = 2
     properties = safe,sci2
     ntype = cluster
     status = opsys=linux,uname=Linux node01 2.6.22-3-686 #1 SMP Sun Feb 10 20:20:49 UTC 2008 i686,sessions=? 0,nsessions=? 0,nusers=0,idletime=790726,totmem=5132348kb,availmem=5024088kb,physmem=1035852kb,ncpus=? 0,loadave=0.00,netload=866968120,state=free,jobs=,varattr=,rectime=1211537417

node02
     state = free
     np = 2
     properties = safe,sci2
     ntype = cluster
     status = opsys=linux,uname=Linux node02 2.6.22-3-686 #1 SMP Sun Feb 10 20:20:49 UTC 2008 i686,sessions=? 15201,nsessions=? 15201,nusers=0,idletime=791855,totmem=5132348kb,availmem=5087924kb,physmem=1035852kb,ncpus=? 15201,loadave=0.00,netload=3945631823,state=free,jobs=,varattr=,rectime=1211537434

node03
     state = free
     np = 2
     properties = safe
     ntype = cluster
     status = opsys=linux,uname=Linux node03 2.6.22-3-686 #1 SMP Sun Feb 10 20:20:49 UTC 2008 i686,sessions=? 15201,nsessions=? 15201,nusers=0,idletime=792963,totmem=5132348kb,availmem=5089988kb,physmem=1035852kb,ncpus=? 15201,loadave=0.00,netload=3961890349,state=free,jobs=,varattr=,rectime=1211537435

node04
     state = free
     np = 2
     properties = safe,sci2
     ntype = cluster
     status = opsys=linux,uname=Linux node04 2.6.22-3-686 #1 SMP Sun Feb 10 20:20:49 UTC 2008 i686,sessions=? 15201,nsessions=? 15201,nusers=0,idletime=792869,totmem=5132348kb,availmem=5045880kb,physmem=1035852kb,ncpus=? 15201,loadave=0.00,netload=3201185645,state=free,jobs=,varattr=,rectime=1211537435
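
A quick way to confirm that every node reports the same state is to filter
the pbsnodes output. This is only a minimal sketch: the function name
node_states is made up for illustration, and it assumes the "state = ..."
line layout shown in the listing.

```shell
# Hypothetical helper: count nodes per state from `pbsnodes -a` output on stdin.
# Only the per-node "state = <value>" lines are counted; the "state=" entry
# embedded in the long status line is ignored because its first field is "status".
node_states() {
  awk '$1 == "state" && $2 == "=" { count[$3]++ }
       END { for (s in count) print s, count[s] }' | sort
}

# Example usage on the server host:
#   pbsnodes -a | node_states
```

With the four nodes listed here, all in state free, this should print a
single line for that state.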


I built TORQUE 2.3.0 from source as follows:
$ ./configure --with-rcp=rcp
$ make
$ make packages
$ make install

I built Maui 3.2.6p19 from source as follows:
$ ./configure --with-pbs
$ make
$ make install

Here is an excerpt from the log file /usr/local/maui/log/maui.log:
05/23 12:02:43 INFO:     connect request from 10.0.1.1
05/23 12:02:43 INFO:     received service request from host 'scilx.disco.unimib.it'
05/23 12:02:43 INFO:     client socket from 'scilx.disco.unimib.it' accepted
05/23 12:02:43 UIProcessCommand(S)
05/23 12:02:43 MSURecvData(S,5000000,TRUE,SC,EMsg)
05/23 12:02:43 MSURecvPacket(7,BufP,9,NULL,5000000,SC)
05/23 12:02:43 MSURecvPacket(7,BufP,89,NULL,5000000,SC)
05/23 12:02:43 ALERT:    checksum does not match (e4ce95d86901effd:5cb9b1121a647424)  request 'TS=1211536963 AUTH=root DT=CMD=diagnose AUTH=root ARG=6 0 ALL [NONE]'
05/23 12:02:43 ALERT:    cannot read client packet
05/23 12:02:43 MSUDisconnect(S)
05/23 12:02:50 INFO:     connect request from 10.0.1.1
05/23 12:02:50 INFO:     received service request from host 'scilx.disco.unimib.it'
05/23 12:02:50 INFO:     client socket from 'scilx.disco.unimib.it' accepted
05/23 12:02:50 UIProcessCommand(S)
05/23 12:02:50 MSURecvData(S,5000000,TRUE,SC,EMsg)
05/23 12:02:50 MSURecvPacket(7,BufP,9,NULL,5000000,SC)
05/23 12:02:50 MSURecvPacket(7,BufP,77,NULL,5000000,SC)
05/23 12:02:50 ALERT:    checksum does not match (9639adad8f21204a:ab2d4857aea9a410)  request 'TS=1211536970 AUTH=root DT=CMD=showconfig AUTH=root ARG='
05/23 12:02:50 ALERT:    cannot read client packet
05/23 12:02:50 MSUDisconnect(S)
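
To see at a glance which client requests are being rejected, the checksum
alerts can be pulled out of the log. This is a rough sketch, not a Maui tool:
the function name failed_requests is hypothetical, and it assumes each alert
occupies a single line in the log file and carries a CMD=<name> field as in
the excerpt.

```shell
# Hypothetical helper: list the distinct client commands that were rejected
# with "checksum does not match" in a Maui log file.
# Assumption: one alert per line, containing "CMD=<name>".
failed_requests() {
  grep 'checksum does not match' "$1" |
    sed -n 's/.*CMD=\([A-Za-z]*\).*/\1/p' |
    sort -u
}

# Example usage with the log path from this message:
#   failed_requests /usr/local/maui/log/maui.log
```

For the two alerts in the excerpt, the rejected commands are diagnose and
showconfig.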


And this is the configuration of pbs_server ...
set server scheduling = True
set server acl_hosts = scilx.disco.unimib.it
set server managers = root@scilx.disco.unimib.it
set server operators = root@scilx.disco.unimib.it
set server default_queue = default
set server log_events = 511
set server mail_from = torque
set server query_other_jobs = True
set server scheduler_iteration = 30
set server node_check_rate = 60
set server tcp_timeout = 6
set server node_pack = True
set server next_job_number = 10689


How can I resolve this problem?

Thanks a lot. Regards.

-- 
Filippo Spiga
DISCo - FISLAB - Computational Physics and Complex Systems Laboratory
Student Representative at the Faculty of Mathematical, Physical and
Natural Sciences
Università degli Studi di Milano-Bicocca
mobile: +393408387735
Skype: filippo.spiga

There is only one way to forget time: to use it.
-- Baudelaire, "Intimate Journals"