[Mauiusers] Problems with maui/gold/torque integration on 64 bit IRIX

Kevin Van Workum vanw at tticluster.com
Tue Jan 30 07:34:36 MST 2007


Did you do this?

su -c "gmkuser -d \"Maui Scheduler\" maui" gold
su -c "goldsh RoleUser Create Role=Scheduler Name=maui" gold

But it looks like you're running maui as root, so s/maui/root/ in the above.
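
To make that substitution explicit, here is a small sketch that only prints the two commands for a given user instead of running them, so they can be checked before being pasted into a shell on the Gold host. The function name gold_setup_cmds and the SCHED_USER variable are made up for illustration; the gmkuser and goldsh invocations are the ones quoted above, unchanged apart from the user name.

```shell
#!/bin/sh
# Account the maui daemon runs as (root in this thread; adjust if yours differs).
SCHED_USER=root

# Hypothetical helper: print (do not execute) the two Gold setup commands
# with the given user name substituted in.
gold_setup_cmds() {
  user=$1
  cat <<EOF
su -c "gmkuser -d \"Maui Scheduler\" $user" gold
su -c "goldsh RoleUser Create Role=Scheduler Name=$user" gold
EOF
}

gold_setup_cmds "$SCHED_USER"
```

Printing first keeps the sketch harmless on a machine without Gold installed; the printed lines still have to be run by hand.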

Doubt it makes a difference, but you have a typo, "JOBFAILUREACTION=NON". I
think you meant NONE.
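
If you want to fix it without opening an editor, something like the following sed filter should do. It is only a sketch: fix_jfa is a made-up name, and it assumes the value is spelled exactly NON and followed by whitespace or end of line, as in the maui.cfg excerpt quoted below. A line that already says NONE is left alone.

```shell
#!/bin/sh
# Hypothetical filter: rewrite JOBFAILUREACTION=NON to JOBFAILUREACTION=NONE,
# but only when NON is the complete value.
fix_jfa() {
  sed -E 's/(JOBFAILUREACTION=)NON([[:space:]]|$)/\1NONE\2/'
}

# Try it on the second line of the AMCFG entry from the quoted maui.cfg.
printf '%s\n' 'WIREPROTOCOL=XML CHARGEPOLICY=DEBITALLWC JOBFAILUREACTION=NON TIMEOUT=15' | fix_jfa
```

To apply it to the real file, run something like fix_jfa < maui.cfg > maui.cfg.new, compare the two files, and then restart maui so the change is picked up.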

Kevin

On 1/29/07, Carlson, Timothy S <Timothy.Carlson at pnl.gov> wrote:
>
>  Here is my setup.
>
> I would like to integrate Maui/Gold/Torque on an IRIX64 system running IRIX
> 6.5.30. I've got everything compiled and running with the IRIX compilers,
> and I can submit a job without a
>
> #PBS -A myaccount
>
> line and things run fine. However, when I add a #PBS -A line, my job
> exits without running and I get the following in my Torque output file.
>
> mom_close_poll: entered
>
> And I get an email message of
>
> PBS Job Id: 4.nwvisus
> Job Name:   test
> Aborted by PBS Server
> Job cannot be executed
> See job standard error file
>
> The Maui logs seem to indicate that Gold was queried and that a charge
> was in fact made after the job ran.
>
> INFO:     response received from server
> INFO:     response received: '<?xml version="1.0" encoding="UTF-8"?>
> <Envelope><Body><Response
> actor="root"><Status><Value>Success</Value><Code>000</Code><Message>Successfully
> charged job 4 for 62 credits 1 reservations were
> removed</Message></Status><Count>62</Count><Data><Charge><Amount>62</Amount><Job>174840</Job></Charge></Data></Response></Body></Envelope>
>
> '
> MSUDisconnect(S)
> INFO:     command response '<?xml version="1.0" encoding="UTF-8"?>
> <Envelope><Body><Response
> actor="root"><Status><Value>Success</Value><Code>000</Code><Message>Successfully
> charged job 4 for 62 credits
>
> 1 reservations were
> removed</Message></Status><Count>62</Count><Data><Charge><Amount>62</Amount><Job>174840</Job></Charge></Data></Response></Body></Envelope>
>
> '
>
> However, earlier in the logs I see that there were problems
> contacting the allocation manager. I'm not sure whether this is normal.
>
> ERROR:    cannot receive response from allocation-manager server
> 'dbserver':7112
> MSysRegEvent(FAILURE:  cannot receive response from allocation-manager
> server dbserver:7112 (cmd: '<XML>')
> ,0,0,1)
> MSysLaunchAction(ASList,1)
> INFO:     command response 'NULL'
> ALERT:    no job data available
> ALERT:    cannot extract status
> ALERT:    cannot reserve allocation for job
> WARNING:  cannot reserve allocation for job '4', reason: BankFailure
> MRMJobStart(4,Msg,SC)
> MPBSJobStart4,nwvisus,Msg,SC)
> MPBSJobModify4,Resource_List,Resource,nwvisus:ppn=2)
> MPBSJobModify(4,Resource_List,Resource,1:ppn=2)
>
>
> I built both Torque-2.1.6 and Maui-2.6.18 in 64-bit mode. I fixed the
> configure problem in Torque so that the linker links in 64-bit mode, and
> added -D__M64 to the OSCCFLAGS of Maui. I've also tried Torque-2.0.0pXX and
> a snapshot version of Maui. All of this is with gold-2.0.0.7.
>
> I can query my Gold database with basic Gold commands from this machine,
> and I'm fairly sure I have configured my maui.cfg and maui-private.cfg
> files correctly.
>
> maui.cfg
>
> ---snip-----
> RMCFG[nwvisus] TYPE=PBS
>
> # Allocation Manager Definition
>
> AMCFG[bank]  TYPE=GOLD HOST=dbserver PORT=7112 SOCKETPROTOCOL=HTTP
> WIREPROTOCOL=XML CHARGEPOLICY=DEBITALLWC JOBFAILUREACTION=NON TIMEOUT=15
>
> ---snip---
>
> And maui-private.cfg
>
> CLIENTCFG[AM:bank] CSKEY=my_super_secret_key CSALGO=HMAC
>
> Is there something interesting I should be looking for in either my Torque
> or Maui log files?
>
> Thanks
>
> Tim Carlson
> Voice: (509) 376 3423
> Email: Tim.Carlson at pnl.gov
> Pacific Northwest National Laboratory
> HPCaNS: High Performance Computing and Networking Services
>


-- 
Kevin Van Workum, Ph.D.
Vice President
Senior System Administrator
www.clusterondemand.com
ONLINE COMPUTER CLUSTERS