[Mauiusers] Problems with maui/gold/torque integration on 64 bit IRIX

Carlson, Timothy S Timothy.Carlson at pnl.gov
Mon Jan 29 17:50:53 MST 2007


Here is my setup.

I would like to integrate Maui/Gold/Torque on an IRIX64 running IRIX
6.5.30. I've got everything compiled and running with the IRIX compilers
and I can submit a job without a 

#PBS -A myaccount

line and things run fine. However, when I add a #PBS -A line, my job
exits without running and I get the following in my Torque output file.

mom_close_poll: entered

And I get an email message of

PBS Job Id: 4.nwvisus
Job Name:   test
Aborted by PBS Server
Job cannot be executed
See job standard error file

The maui logs seem to indicate that gold was queried and in fact a
charge was made after the job has run.

INFO:     response received from server
INFO:     response received: '<?xml version="1.0" encoding="UTF-8"?>
<Envelope><Body><Response actor="root"><Status><Value>Success</Value><Code>000</Code><Message>Successfully charged job 4 for 62 credits 1 reservations were removed</Message></Status><Count>62</Count><Data><Charge><Amount>62</Amount><Job>174840</Job></Charge></Data></Response></Body></Envelope>
'
MSUDisconnect(S)
INFO:     command response '<?xml version="1.0" encoding="UTF-8"?>
<Envelope><Body><Response actor="root"><Status><Value>Success</Value><Code>000</Code><Message>Successfully charged job 4 for 62 credits
1 reservations were removed</Message></Status><Count>62</Count><Data><Charge><Amount>62</Amount><Job>174840</Job></Charge></Data></Response></Body></Envelope>
'
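
For anyone comparing their own logs, here is a minimal Python sketch for
pulling the status and charge out of a response like the one above. It is
not part of Maui or Gold; the function name summarize_gold_response is
made up for illustration, and the element layout is assumed only from the
logged XML, not from any Gold schema.

```python
import xml.etree.ElementTree as ET

# Response body copied from the Maui log above, with the mail line wraps removed.
RESPONSE = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<Envelope><Body><Response actor="root"><Status>'
    b'<Value>Success</Value><Code>000</Code>'
    b'<Message>Successfully charged job 4 for 62 credits '
    b'1 reservations were removed</Message></Status>'
    b'<Count>62</Count><Data><Charge><Amount>62</Amount>'
    b'<Job>174840</Job></Charge></Data></Response></Body></Envelope>'
)


def summarize_gold_response(xml_bytes):
    """Return status value, code, message and charged amount (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)  # root element is <Envelope>
    status = root.find("./Body/Response/Status")
    charge = root.find("./Body/Response/Data/Charge")
    return {
        "value": status.findtext("Value") if status is not None else None,
        "code": status.findtext("Code") if status is not None else None,
        "message": status.findtext("Message") if status is not None else None,
        # <Charge> is only expected on charge responses, so it may be absent.
        "amount": charge.findtext("Amount") if charge is not None else None,
    }


summary = summarize_gold_response(RESPONSE)
print(summary["value"], summary["code"], summary["amount"])
```

All values come back as strings, so the code "000" keeps its leading zeros.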

However, earlier in the logs I see problems contacting the allocation
manager. I'm not sure whether this is normal.

ERROR:    cannot receive response from allocation-manager server
'dbserver':7112
MSysRegEvent(FAILURE:  cannot receive response from allocation-manager
server dbserver:7112 (cmd: '<XML>')
,0,0,1)
MSysLaunchAction(ASList,1)
INFO:     command response 'NULL'
ALERT:    no job data available
ALERT:    cannot extract status
ALERT:    cannot reserve allocation for job
WARNING:  cannot reserve allocation for job '4', reason: BankFailure
MRMJobStart(4,Msg,SC)
MPBSJobStart4,nwvisus,Msg,SC)
MPBSJobModify4,Resource_List,Resource,nwvisus:ppn=2)
MPBSJobModify(4,Resource_List,Resource,1:ppn=2)
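
One quick thing to rule out for the error above is plain TCP reachability
of the allocation manager from the Maui host. The following is a minimal
Python sketch; can_connect is a made-up helper name, and the host, port
and timeout in the usage comment are taken from the AMCFG line in this
message. It only tests that a connection can be opened. It does not speak
the Gold protocol, so a successful connect does not explain a missing
response by itself.

```python
import socket


def can_connect(host, port, timeout=15.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example usage with the values from the AMCFG line (run on the Maui host):
#   print(can_connect("dbserver", 7112, timeout=15))
```

If this returns False on the Maui host while the gold commands work from
the same machine, it would be worth checking whether the two use the same
host name and port.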



I built both Torque-2.1.6 and Maui-2.6.18 in 64-bit mode. I fixed the
configure problem in Torque so that the linker links 64-bit, and added
-D__M64 to the OSCCFLAGS of Maui. I've also tried Torque-2.0.0pXX and a
snapshot version of Maui, all with gold-2.0.0.7.

I can query my gold database with basic gold commands from this machine,
and I'm fairly sure I have configured my maui.cfg and maui-private.cfg
files correctly.

maui.cfg

---snip-----
RMCFG[nwvisus] TYPE=PBS

# Allocation Manager Definition

AMCFG[bank]  TYPE=GOLD HOST=dbserver PORT=7112 SOCKETPROTOCOL=HTTP
WIREPROTOCOL=XML CHARGEPOLICY=DEBITALLWC JOBFAILUREACTION=NON TIMEOUT=15
---snip---


And maui-private.cfg

CLIENTCFG[AM:bank] CSKEY=my_super_secret_key CSALGO=HMAC

Is there something interesting I should be looking for in either my
Torque or Maui log files? 

Thanks

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
Pacific Northwest National Laboratory
HPCaNS: High Performance Computing and Networking Services
