[Mauiusers] maui with gold problems...

Aquarijen aquarijen at gmail.com
Fri Dec 16 12:51:10 MST 2005


Hi All,

I am not sure whether this is a Gold question or a Maui question, so I
am posting to both lists.  I hope that is OK.
Sorry for so many questions lately!  I have made sure that no users on
the test cluster have usernames beginning with a number.  Gold is
running, and I have accounts, projects, machines and users set up, with
100000000 deposited to each Gold account.
When I configure Maui to use Gold as its allocation manager (AM), Maui
dies almost immediately.  I am using Maui 3.2.6p13 and Gold version
2.0.0.4.  I cleared out the checkpoint file, shut everything down and
cleared the queue.  I then started Gold, then Maui, then pbs_server and
then the pbs_moms.  Maui dies.  I have tried starting them in different
orders, too.  Maui dies whenever the AMCFG line is included.

Here is my simple maui.cfg:

# maui.cfg 3.2
SERVERHOST              b05l02
ADMIN1                root tippensjl
RMCFG[base]  TYPE=PBS
JOBAGGREGATIONTIME      00:00:10
RMPOLLINTERVAL  00:00:30
DOWNNODEDELAYTIME       72:00:00
SERVERPORT            42559
SERVERMODE            NORMAL
LOGFILE               maui.log
LOGFILEMAXSIZE        100000000
LOGLEVEL              9
QUEUETIMEWEIGHT[0]       10
FSPOLICY              DEDICATEDPS
FSDEPTH               7
FSINTERVAL            24:00:00
FSWEIGHT              1
FSDECAY               0.80
BACKFILLPOLICY  ON
BACKFILLTYPE    BESTFIT
RESERVATIONPOLICY     CURRENTHIGHEST
NODEACCESSPOLICY        SHARED
JOBMAXSTARTTIME         2:00:00
JOBMAXOVERRUN           0:30:00
AMCFG[bank] TYPE=GOLD HOST=b05l02 PORT=7112 SOCKETPROTOCOL=HTTP WIRE-PROTOCOL=XML CHARGEPOLICY=DEBITALLWC JOBFAILUREACTION=NONE FLUSHINTERVAL=12:00:00 TIMEOUT=15
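
One thing worth ruling out is whether the AMCFG entry is split across several physical lines in the file itself. Mail clients wrap long lines, so what appears in a message may not match what is on disk. Below is a minimal Python sketch of such a check. It assumes Maui expects each parameter on a single physical line, which is an assumption here and not something confirmed in this thread, and it does not handle backslash continuations. The function name `find_wrapped_lines` and the sample text are hypothetical and exist only for illustration.

```python
import re

# Assumption: a Maui parameter line starts with an upper-case parameter name,
# optionally followed by an [index], and then whitespace,
# e.g. "AMCFG[bank] ..." or "LOGLEVEL 9".
PARAM_START = re.compile(r"^[A-Z][A-Z0-9_]*(\[[^\]]*\])?(\s|$)")


def find_wrapped_lines(cfg_text):
    """Return (line_number, text) pairs for lines that look like continuations."""
    suspects = []
    for number, raw in enumerate(cfg_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not PARAM_START.match(line):
            # e.g. a line that begins with KEY=VALUE instead of a parameter name
            suspects.append((number, line))
    return suspects


# Hypothetical sample: an AMCFG entry that was wrapped onto three lines.
SAMPLE = """\
SERVERHOST              b05l02
AMCFG[bank] TYPE=GOLD HOST=b05l02 PORT=7112 SOCKETPROTOCOL=HTTP
WIRE-PROTOCOL=XML CHARGEPOLICY=DEBITALLWC JOBFAILUREACTION=NONE
FLUSHINTERVAL=12:00:00 TIMEOUT=15
"""

for number, line in find_wrapped_lines(SAMPLE):
    print(f"line {number} looks like a continuation: {line}")
```

To check a real file, read its contents and pass the text to `find_wrapped_lines`. An empty result only means that no line starts with something other than a parameter name. It says nothing about whether the values themselves are valid.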

And here is my maui-private.cfg:
CLIENTCFG[AM:bank] CSKEY=sss CSALGO=HMAC

And here is the last little bit of my maui.log.  I have loglevel turned up to 9.

12/16 14:32:42 MUserAdd(UName,UP)
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO:     hash 'tippensjl' --> 550228005
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO:     hash 'tippensjl' --> 550228005
12/16 14:32:42 MCPRestore(USER,tippensjl,Optr)
12/16 14:32:42 INFO:     no checkpoint entry for object 'USER                tippensjl '
12/16 14:32:42 INFO:     user tippensjl added
12/16 14:32:42 INFO:     PBS attribute 'job_state'  value: 'Q'  (r: NULL)
12/16 14:32:42 INFO:     PBS attribute 'queue'  value: 'workq'  (r: NULL)
12/16 14:32:42 MReqSetAttr(44,RQ,ReqClass,Value,1,2)
12/16 14:32:42 INFO:     job flags for job 44: 0
12/16 14:32:42 MJobSetAttr(44,GAttr,Value,1,5)
12/16 14:32:42 MUMAGetBM(JFeature,PREEMPTEE,3)
12/16 14:32:42 INFO:     attribute 'PREEMPTEE' cleared for job 44
12/16 14:32:42 MJobGetPAL(44,RPAL,PAL,NULL)
12/16 14:32:42 INFO:     PBS attribute 'server'  value: 'b05l02'  (r: NULL)
12/16 14:32:42 INFO:     PBS attribute 'Checkpoint'  value: 'u'  (r: NULL)
12/16 14:32:42 INFO:     PBS attribute 'ctime'  value: '1134761206'  (r: NULL)
12/16 14:32:42 INFO:     PBS attribute 'Error_Path'  value: 'b05l02:/home/2vt/jenstests/schulth/Science/dms/GaN/Cube2x2x2_1Mn_NCC/sic_11111/jen-b5.e44'  (r: NULL)
12/16 14:32:42 INFO:     PBS attribute 'Hold_Types'  value: 'n'  (r: NULL)
12/16 14:32:42 INFO:     PBS attribute 'Join_Path'  value: 'n'  (r: NULL)
12/16 14:32:42 INFO:     PBS attribute 'Keep_Files'  value: 'n'  (r: NULL)
12/16 14:32:42 INFO:     PBS attribute 'Mail_Points'  value: 'ae'  (r: NULL)
12/16 14:32:42 INFO:     PBS attribute 'Mail_Users'  value: 'tippensjl at ornl.gov'  (r: NULL)
12/16 14:32:42 INFO:     PBS attribute 'mtime'  value: '1134761206'  (r: NULL)
12/16 14:32:42 INFO:     PBS attribute 'Output_Path'  value: 'b05l02:/home/2vt/jenstests/schulth/Science/dms/GaN/Cube2x2x2_1Mn_NCC/sic_11111/jen-b5.o44'  (r: NULL)
12/16 14:32:42 INFO:     PBS attribute 'Priority'  value: '0'  (r: NULL)
12/16 14:32:42 INFO:     PBS attribute 'qtime'  value: '1134761206'  (r: NULL)
12/16 14:32:42 INFO:     PBS attribute 'Rerunable'  value: 'True'  (r: NULL)
12/16 14:32:42 INFO:     PBS attribute 'Resource_List'  value: '10000:00:00'  (r: cput)
12/16 14:32:42 INFO:     PBS attribute 'Resource_List'  value: '1'  (r: ncpus)
12/16 14:32:42 INFO:     PBS attribute 'Resource_List'  value: '30:ppn=2'  (r: neednodes)
12/16 14:32:42 __MPBSGetTaskList(44,30:ppn=2,NULL,0)
12/16 14:32:42 MReqSetAttr(44,RQ,ReqNodeFeature,Value,1,2)
12/16 14:32:42 INFO:     0 host task(s) located for job
12/16 14:32:42 INFO:     PBS attribute 'Resource_List'  value: '30'  (r: nodect)
12/16 14:32:42 INFO:     PBS attribute 'Resource_List'  value: '30:ppn=2'  (r: nodes)
12/16 14:32:42 INFO:     processing node request line '30:ppn=2'
12/16 14:32:42 __MPBSGetTaskList(44,30:ppn=2,NULL,0)
12/16 14:32:42 MReqSetAttr(44,RQ,ReqNodeFeature,Value,1,2)
12/16 14:32:42 INFO:     0 host task(s) located for job
12/16 14:32:42 INFO:     PBS attribute 'Resource_List'  value: '10000:00:00'  (r: walltime)
12/16 14:32:42 INFO:     PBS attribute 'Shell_Path_List'  value: '/bin/bash'  (r: NULL)
12/16 14:32:42 INFO:     PBS attribute 'substate'  value: '10'  (r: NULL)
12/16 14:32:42 INFO:     PBS attribute 'Variable_List'  value:
'PBS_O_HOME=/home/2vt,PBS_O_LANG=en_US.UTF-8,PBS_O_LOGNAME=tippensjl,PBS_O_PATH=/opt/intel/cce/9.0/bin:/opt/intel/fce/9.0/bin:/usr/kerberos/bin:/opt/mpich-ch_p4-icc-1.2.7/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/env-switcher/bin:/opt/kernel_picker/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/usr/local/apitest:/opt/c3-4/:/home/2vt/bin,PBS_O_MAIL=/var/spool/mail/tippensjl,PBS_O_SHELL=/bin/bash,PBS_O_HOST=b05l02,PBS_O_WORKDIR=/home/2vt/jenstests/schulth/Science/dms/GaN/Cube2x2x2_1Mn_NCC/sic_11111,MODULE_VERSION_STACK=3.1.6,MANPATH=/opt/intel/cce/9.0/man:/opt/intel/fce/9.0/man:/opt/mpich-ch_p4-icc-1.2.7/man:/opt/modules/default/man:/usr/share/man:/usr/man:/usr/local/share/man:/usr/local/man:/usr/X11R6/man:/opt/pbs/man:/opt/env-switcher/man:/opt/kernel_picker/man:/opt/pvm3/man,HOSTNAME=b05l02,PVM_RSH=ssh,_MODULESBEGINENV_=/home/2vt/.modulesbeginenv,SHELL=/bin/bash,TERM=xterm,HISTSIZE=1000,TMPDIR=/home/2vt/.tmpdir,MODULE_SHELL=sh,OLDPWD=/home/2vt,MODULE_OSCAR_USER=tippensjl,USER=tippensjl,LD_LIBRARY_PATH=/opt/intel/mkl72/lib/em64t:/opt/intel/cce/9.0/lib:/opt/intel/fce/9.0/lib,LS_COLORS=no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:,ENV=/home/2vt/.bashrc,OSCAR_HOME=/opt/oscar,PVM_ROOT=/opt/pvm3,PVM_ARCH=LINUX,MODULE_VERSION=3.1.6,MAIL=/var/spool/mail/tippensjl,PATH=/opt/intel/cce/9.0/bin:/opt/intel/fce/9.0/bin:/usr/kerberos/bin:/opt/mpich-ch_p4-icc-1.2.7/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/env-switcher/bin:/opt/kernel_picker/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin
/LINUX:/usr/local/apitest:/opt/c3-4/:/home/2vt/bin,INPUTRC=/etc/inputrc,PWD=/home/2vt/jenstests/schulth/Science/dms/GaN/Cube2x2x2_1Mn_NCC/sic_11111,_LMFILES_=/opt/modules/oscar-modulefiles/default-manpath/1.0.1:/opt/modules/oscar-modulefiles/torque/1.2.0p5:/opt/env-switcher/share/env-switcher/mpi/mpich-ch_p4-icc-1.2.7:/opt/modules/oscar-modulefiles/switcher/1.0.13:/opt/modules/oscar-modulefiles/kernel_picker/1.4.1.3:/opt/modules/oscar-modulefiles/pvm/3.4.5+4:/opt/modules/modulefiles/oscar-modules/1.0.5:/opt/modules/modulefiles/iforte/9.0:/opt/modules/modulefiles/icce/9.0:/opt/modules/modulefiles/mkl-em64t/7.2,LANG=en_US.UTF-8,MODULEPATH=/opt/env-switcher/share/env-switcher:/opt/modules/oscar-modulefiles:/opt/modules/version:/opt/modules/$MODULE_VERSION/modulefiles:/opt/modules/modulefiles:,LOADEDMODULES=default-manpath/1.0.1:torque/1.2.0p5:mpi/mpich-ch_p4-icc-1.2.7:switcher/1.0.13:kernel_picker/1.4.1.3:pvm/3.4.5+4:oscar-modules/1.0.5:iforte/9.0:icce/9.0:mkl-em64t/7.2,SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass,SHLVL=1,HOME=/home/2vt,LOGNAME=tippensjl,MODULESHOME=/opt/modules/3.1.6,LESSOPEN=|/usr/bin/lesspipe.sh
%s,G_BROKEN_FILENAMES=1,_=/opt/pbs/bin/qsub,PBS_O_QUEUE=workq'  (r:
NULL)
12/16 14:32:42 INFO:     PBS attribute 'euser'  value: 'tippensjl'  (r: NULL)
12/16 14:32:42 MUserAdd(UName,UP)
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO:     hash 'tippensjl' --> 550228005
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO:     hash 'tippensjl' --> 550228005
12/16 14:32:42 INFO:     PBS attribute 'egroup'  value: 'tippensjl'  (r: NULL)
12/16 14:32:42 MGroupAdd(GName,GP)
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO:     hash 'tippensjl' --> 550228005
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO:     hash 'tippensjl' --> 550228005
12/16 14:32:42 MCPRestore(GROUP,tippensjl,Optr)
12/16 14:32:42 INFO:     no checkpoint entry for object 'GROUP               tippensjl '
12/16 14:32:42 INFO:     group tippensjl added
12/16 14:32:42 INFO:     PBS attribute 'queue_rank'  value: '41'  (r: NULL)
12/16 14:32:42 INFO:     PBS attribute 'queue_type'  value: 'E'  (r: NULL)
12/16 14:32:42 INFO:     PBS attribute 'etime'  value: '1134761206'  (r: NULL)
12/16 14:32:42 MJobSetCreds(44,tippensjl,tippensjl,)
12/16 14:32:42 MUserAdd(UName,UP)
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO:     hash 'tippensjl' --> 550228005
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO:     hash 'tippensjl' --> 550228005
12/16 14:32:42 MGroupAdd(GName,GP)
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO:     hash 'tippensjl' --> 550228005
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO:     hash 'tippensjl' --> 550228005
12/16 14:32:42 MJobGetAccount(44,A)
12/16 14:32:42 MAMAccountGetDefault(tippensjl,AName,RIndex)
12/16 14:32:42 MSSSDoCommand(allocation-manager,NULL,OBuf,ODE,SC,EMsg)
12/16 14:32:42 MSysEMSubmit(EM,scheduler,comcom,scheduler,allocation-manager;)
12/16 14:32:42 INFO:     EM disabled
12/16 14:32:42 MSUConnect(S,TRUE,EMsg)
12/16 14:32:42 INFO:     trying to connect to 192.168.79.231 (Port: 7112)
12/16 14:32:42 INFO:     successful connect to TCP server (sd: 10)
12/16 14:32:42 MSUSendData(S,15000000,FALSE,FALSE)
12/16 14:32:42 MSecGetChecksum(Buf,185,Checksum,HMAC64,CSKey)
12/16 14:32:42 MSecHMACGetDigest(sss,3,<Body actor="root"><Request action="Query" actor="root"><Object>User</Object><Where name="Special">False</Where><Get name="Name"></Get><Get name="DefaultProject"></Get></Request></Body>,185,CSString,20,DigestString,TRUE,TRUE)
12/16 14:32:42 __MSecSHA1Init(context)
12/16 14:32:42 __MSecSHA1Transform(context)
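
The last entries show Maui inside its SHA-1 routine while signing the request for Gold, so it may help to reproduce that step outside of Maui. Here is a minimal Python sketch that takes the request body from the MSecHMACGetDigest entry above and signs it with the same placeholder key. It assumes a plain HMAC-SHA1 over the body, keyed directly by the CSKEY. Maui and Gold may derive the key or encode the digest differently, so the printed value is not necessarily what goes over the wire. The function name `sign` is hypothetical.

```python
import hashlib
import hmac

# Request body as it appears in the MSecHMACGetDigest log entry.
BODY = (
    b'<Body actor="root"><Request action="Query" actor="root">'
    b'<Object>User</Object><Where name="Special">False</Where>'
    b'<Get name="Name"></Get><Get name="DefaultProject"></Get>'
    b'</Request></Body>'
)


def sign(body, key):
    """Return the raw HMAC-SHA1 digest of body (assumed signing scheme)."""
    return hmac.new(key, body, hashlib.sha1).digest()


# "sss" is the placeholder CSKEY shown in maui-private.cfg above.
digest = sign(BODY, b"sss")

print(len(BODY))     # body length in bytes; the log entry reports 185
print(len(digest))   # digest length in bytes; the log entry reports 20
print(digest.hex())  # hex is used here only for readability
```

The two lengths can be compared with the 185 and 20 in the log entry. If they match, the request body and key length are at least what Maui intended to sign, which points toward the digest code or the build rather than the request itself.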

And that's it: the log ends there and Maui just dies.  I have the
feeling that this is something fairly simple that I didn't set up
correctly, but I can't seem to find what it is (I'm pretty new at
this).  Also, I am using Torque 2.0.0p2, if that makes a difference.

Thank you for any help you can give - I'm pulling my hair out. :-O :)

-Jen

Jennifer Tippens
Unix Admin, ORNL Institutional Cluster
Oak Ridge National Lab

