[Mauiusers] maui with gold problems...
Aquarijen
aquarijen at gmail.com
Fri Dec 16 12:51:10 MST 2005
Hi All,
I am not sure if this is a gold question or a maui question - so I am
posting to both - I hope that is ok...
Sorry for so many questions lately! So, I made sure that no users on
the test cluster have usernames beginning with a number. I have gold
running and I have accounts, projects, machines and users set up with
100000000 deposited to each gold account.
If I configure maui to use gold as its AM, maui pretty much instantly
dies. I am using maui 3.2.6p13 and gold version 2.0.0.4. I cleared
out the checkpoint file. I shut everything down and cleared the
queue. I then started gold, then maui, then pbs_server and then the
pbs_moms. Maui dies. I've tried this in different orders, too. Maui
dies if I have the AMCFG line included.
Here is my simple maui.cfg:
# maui.cfg 3.2
SERVERHOST b05l02
ADMIN1 root tippensjl
RMCFG[base] TYPE=PBS
JOBAGGREGATIONTIME 00:00:10
RMPOLLINTERVAL 00:00:30
DOWNNODEDELAYTIME 72:00:00
SERVERPORT 42559
SERVERMODE NORMAL
LOGFILE maui.log
LOGFILEMAXSIZE 100000000
LOGLEVEL 9
QUEUETIMEWEIGHT[0] 10
FSPOLICY DEDICATEDPS
FSDEPTH 7
FSINTERVAL 24:00:00
FSWEIGHT 1
FSDECAY 0.80
BACKFILLPOLICY ON
BACKFILLTYPE BESTFIT
RESERVATIONPOLICY CURRENTHIGHEST
NODEACCESSPOLICY SHARED
JOBMAXSTARTTIME 2:00:00
JOBMAXOVERRUN 0:30:00
AMCFG[bank] TYPE=GOLD HOST=b05l02 PORT=7112 SOCKETPROTOCOL=HTTP WIRE-PROTOCOL=XML CHARGEPOLICY=DEBITALLWC JOBFAILUREACTION=NONE FLUSHINTERVAL=12:00:00 TIMEOUT=15
And here is my maui-private.cfg:
CLIENTCFG[AM:bank] CSKEY=sss CSALGO=HMAC
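
A quick cross-check of the two files above, as a minimal sketch only. It assumes the AMCFG entry is one logical line in maui.cfg and that the name in CLIENTCFG[AM:...] is meant to match the AMCFG index. The helper parse_entry is made up for this check and is not part of maui or gold.

```python
import re

# Config text copied from this post; the AMCFG entry is joined into
# one logical line (assumption: the wrapping came from the mail client).
MAUI_CFG = (
    "AMCFG[bank] TYPE=GOLD HOST=b05l02 PORT=7112 SOCKETPROTOCOL=HTTP "
    "WIRE-PROTOCOL=XML CHARGEPOLICY=DEBITALLWC JOBFAILUREACTION=NONE "
    "FLUSHINTERVAL=12:00:00 TIMEOUT=15"
)
MAUI_PRIVATE_CFG = "CLIENTCFG[AM:bank] CSKEY=sss CSALGO=HMAC"


def parse_entry(line):
    """Split 'PARAM[index] KEY=VALUE ...' into (param, index, attrs)."""
    match = re.match(r"^(\w+)\[([^\]]+)\]\s+(.*)$", line.strip())
    if match is None:
        raise ValueError(f"not an indexed parameter line: {line!r}")
    param, index, rest = match.groups()
    # Every remaining token is expected to look like KEY=VALUE.
    attrs = dict(token.split("=", 1) for token in rest.split())
    return param, index, attrs


am_param, am_name, am_attrs = parse_entry(MAUI_CFG)
cl_param, cl_index, cl_attrs = parse_entry(MAUI_PRIVATE_CFG)

# CLIENTCFG is indexed as "AM:<name>"; compare <name> with the AMCFG index.
peer_type, _, peer_name = cl_index.partition(":")
names_match = (peer_type == "AM" and peer_name == am_name)

print(am_name, am_attrs["HOST"], am_attrs["PORT"], names_match)
# prints: bank b05l02 7112 True
```

So the names line up ("bank" in both places), and the host and port are the same ones maui reports connecting to in the log.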
And here is the last little bit of my maui.log. I have loglevel turned up to 9.
12/16 14:32:42 MUserAdd(UName,UP)
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
12/16 14:32:42 MCPRestore(USER,tippensjl,Optr)
12/16 14:32:42 INFO: no checkpoint entry for object 'USER tippensjl '
12/16 14:32:42 INFO: user tippensjl added
12/16 14:32:42 INFO: PBS attribute 'job_state' value: 'Q' (r: NULL)
12/16 14:32:42 INFO: PBS attribute 'queue' value: 'workq' (r: NULL)
12/16 14:32:42 MReqSetAttr(44,RQ,ReqClass,Value,1,2)
12/16 14:32:42 INFO: job flags for job 44: 0
12/16 14:32:42 MJobSetAttr(44,GAttr,Value,1,5)
12/16 14:32:42 MUMAGetBM(JFeature,PREEMPTEE,3)
12/16 14:32:42 INFO: attribute 'PREEMPTEE' cleared for job 44
12/16 14:32:42 MJobGetPAL(44,RPAL,PAL,NULL)
12/16 14:32:42 INFO: PBS attribute 'server' value: 'b05l02' (r: NULL)
12/16 14:32:42 INFO: PBS attribute 'Checkpoint' value: 'u' (r: NULL)
12/16 14:32:42 INFO: PBS attribute 'ctime' value: '1134761206' (r: NULL)
12/16 14:32:42 INFO: PBS attribute 'Error_Path' value: 'b05l02:/home/2vt/jenstests/schulth/Science/dms/GaN/Cube2x2x2_1Mn_NCC/sic_11111/jen-b5.e44' (r: NULL)
12/16 14:32:42 INFO: PBS attribute 'Hold_Types' value: 'n' (r: NULL)
12/16 14:32:42 INFO: PBS attribute 'Join_Path' value: 'n' (r: NULL)
12/16 14:32:42 INFO: PBS attribute 'Keep_Files' value: 'n' (r: NULL)
12/16 14:32:42 INFO: PBS attribute 'Mail_Points' value: 'ae' (r: NULL)
12/16 14:32:42 INFO: PBS attribute 'Mail_Users' value: 'tippensjl at ornl.gov' (r: NULL)
12/16 14:32:42 INFO: PBS attribute 'mtime' value: '1134761206' (r: NULL)
12/16 14:32:42 INFO: PBS attribute 'Output_Path' value: 'b05l02:/home/2vt/jenstests/schulth/Science/dms/GaN/Cube2x2x2_1Mn_NCC/sic_11111/jen-b5.o44' (r: NULL)
12/16 14:32:42 INFO: PBS attribute 'Priority' value: '0' (r: NULL)
12/16 14:32:42 INFO: PBS attribute 'qtime' value: '1134761206' (r: NULL)
12/16 14:32:42 INFO: PBS attribute 'Rerunable' value: 'True' (r: NULL)
12/16 14:32:42 INFO: PBS attribute 'Resource_List' value: '10000:00:00' (r: cput)
12/16 14:32:42 INFO: PBS attribute 'Resource_List' value: '1' (r: ncpus)
12/16 14:32:42 INFO: PBS attribute 'Resource_List' value: '30:ppn=2' (r: neednodes)
12/16 14:32:42 __MPBSGetTaskList(44,30:ppn=2,NULL,0)
12/16 14:32:42 MReqSetAttr(44,RQ,ReqNodeFeature,Value,1,2)
12/16 14:32:42 INFO: 0 host task(s) located for job
12/16 14:32:42 INFO: PBS attribute 'Resource_List' value: '30' (r: nodect)
12/16 14:32:42 INFO: PBS attribute 'Resource_List' value: '30:ppn=2' (r: nodes)
12/16 14:32:42 INFO: processing node request line '30:ppn=2'
12/16 14:32:42 __MPBSGetTaskList(44,30:ppn=2,NULL,0)
12/16 14:32:42 MReqSetAttr(44,RQ,ReqNodeFeature,Value,1,2)
12/16 14:32:42 INFO: 0 host task(s) located for job
12/16 14:32:42 INFO: PBS attribute 'Resource_List' value: '10000:00:00' (r: walltime)
12/16 14:32:42 INFO: PBS attribute 'Shell_Path_List' value: '/bin/bash' (r: NULL)
12/16 14:32:42 INFO: PBS attribute 'substate' value: '10' (r: NULL)
12/16 14:32:42 INFO: PBS attribute 'Variable_List' value:
'PBS_O_HOME=/home/2vt,PBS_O_LANG=en_US.UTF-8,PBS_O_LOGNAME=tippensjl,PBS_O_PATH=/opt/intel/cce/9.0/bin:/opt/intel/fce/9.0/bin:/usr/kerberos/bin:/opt/mpich-ch_p4-icc-1.2.7/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/env-switcher/bin:/opt/kernel_picker/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/usr/local/apitest:/opt/c3-4/:/home/2vt/bin,PBS_O_MAIL=/var/spool/mail/tippensjl,PBS_O_SHELL=/bin/bash,PBS_O_HOST=b05l02,PBS_O_WORKDIR=/home/2vt/jenstests/schulth/Science/dms/GaN/Cube2x2x2_1Mn_NCC/sic_11111,MODULE_VERSION_STACK=3.1.6,MANPATH=/opt/intel/cce/9.0/man:/opt/intel/fce/9.0/man:/opt/mpich-ch_p4-icc-1.2.7/man:/opt/modules/default/man:/usr/share/man:/usr/man:/usr/local/share/man:/usr/local/man:/usr/X11R6/man:/opt/pbs/man:/opt/env-switcher/man:/opt/kernel_picker/man:/opt/pvm3/man,HOSTNAME=b05l02,PVM_RSH=ssh,_MODULESBEGINENV_=/home/2vt/.modulesbeginenv,SHELL=/bin/bash,TERM=xterm,HISTSIZE=1000,TMPDIR=/home/2vt/.tmpdir,MODULE_SHELL=sh,OLDPWD=/home/2vt,MODULE_OSCAR_USER=tippensjl,USER=tippensjl,LD_LIBRARY_PATH=/opt/intel/mkl72/lib/em64t:/opt/intel/cce/9.0/lib:/opt/intel/fce/9.0/lib,LS_COLORS=no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:,ENV=/home/2vt/.bashrc,OSCAR_HOME=/opt/oscar,PVM_ROOT=/opt/pvm3,PVM_ARCH=LINUX,MODULE_VERSION=3.1.6,MAIL=/var/spool/mail/tippensjl,PATH=/opt/intel/cce/9.0/bin:/opt/intel/fce/9.0/bin:/usr/kerberos/bin:/opt/mpich-ch_p4-icc-1.2.7/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/env-switcher/bin:/opt/kernel_picker/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin
/LINUX:/usr/local/apitest:/opt/c3-4/:/home/2vt/bin,INPUTRC=/etc/inputrc,PWD=/home/2vt/jenstests/schulth/Science/dms/GaN/Cube2x2x2_1Mn_NCC/sic_11111,_LMFILES_=/opt/modules/oscar-modulefiles/default-manpath/1.0.1:/opt/modules/oscar-modulefiles/torque/1.2.0p5:/opt/env-switcher/share/env-switcher/mpi/mpich-ch_p4-icc-1.2.7:/opt/modules/oscar-modulefiles/switcher/1.0.13:/opt/modules/oscar-modulefiles/kernel_picker/1.4.1.3:/opt/modules/oscar-modulefiles/pvm/3.4.5+4:/opt/modules/modulefiles/oscar-modules/1.0.5:/opt/modules/modulefiles/iforte/9.0:/opt/modules/modulefiles/icce/9.0:/opt/modules/modulefiles/mkl-em64t/7.2,LANG=en_US.UTF-8,MODULEPATH=/opt/env-switcher/share/env-switcher:/opt/modules/oscar-modulefiles:/opt/modules/version:/opt/modules/$MODULE_VERSION/modulefiles:/opt/modules/modulefiles:,LOADEDMODULES=default-manpath/1.0.1:torque/1.2.0p5:mpi/mpich-ch_p4-icc-1.2.7:switcher/1.0.13:kernel_picker/1.4.1.3:pvm/3.4.5+4:oscar-modules/1.0.5:iforte/9.0:icce/9.0:mkl-em64t/7.2,SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass,SHLVL=1,HOME=/home/2vt,LOGNAME=tippensjl,MODULESHOME=/opt/modules/3.1.6,LESSOPEN=|/usr/bin/lesspipe.sh
%s,G_BROKEN_FILENAMES=1,_=/opt/pbs/bin/qsub,PBS_O_QUEUE=workq' (r:
NULL)
12/16 14:32:42 INFO: PBS attribute 'euser' value: 'tippensjl' (r: NULL)
12/16 14:32:42 MUserAdd(UName,UP)
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
12/16 14:32:42 INFO: PBS attribute 'egroup' value: 'tippensjl' (r: NULL)
12/16 14:32:42 MGroupAdd(GName,GP)
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
12/16 14:32:42 MCPRestore(GROUP,tippensjl,Optr)
12/16 14:32:42 INFO: no checkpoint entry for object 'GROUP tippensjl '
12/16 14:32:42 INFO: group tippensjl added
12/16 14:32:42 INFO: PBS attribute 'queue_rank' value: '41' (r: NULL)
12/16 14:32:42 INFO: PBS attribute 'queue_type' value: 'E' (r: NULL)
12/16 14:32:42 INFO: PBS attribute 'etime' value: '1134761206' (r: NULL)
12/16 14:32:42 MJobSetCreds(44,tippensjl,tippensjl,)
12/16 14:32:42 MUserAdd(UName,UP)
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
12/16 14:32:42 MGroupAdd(GName,GP)
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
12/16 14:32:42 MUGetHash(tippensjl)
12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
12/16 14:32:42 MJobGetAccount(44,A)
12/16 14:32:42 MAMAccountGetDefault(tippensjl,AName,RIndex)
12/16 14:32:42 MSSSDoCommand(allocation-manager,NULL,OBuf,ODE,SC,EMsg)
12/16 14:32:42 MSysEMSubmit(EM,scheduler,comcom,scheduler,allocation-manager;)
12/16 14:32:42 INFO: EM disabled
12/16 14:32:42 MSUConnect(S,TRUE,EMsg)
12/16 14:32:42 INFO: trying to connect to 192.168.79.231 (Port: 7112)
12/16 14:32:42 INFO: successful connect to TCP server (sd: 10)
12/16 14:32:42 MSUSendData(S,15000000,FALSE,FALSE)
12/16 14:32:42 MSecGetChecksum(Buf,185,Checksum,HMAC64,CSKey)
12/16 14:32:42 MSecHMACGetDigest(sss,3,<Body actor="root"><Request action="Query" actor="root"><Object>User</Object><Where name="Special">False</Where><Get name="Name"></Get><Get name="DefaultProject"></Get></Request></Body>,185,CSString,20,DigestString,TRUE,TRUE)
12/16 14:32:42 __MSecSHA1Init(context)
12/16 14:32:42 __MSecSHA1Transform(context)
And that's it - it just dies. I have the feeling that this is
something fairly easy that I didn't set up correctly... Just can't
seem to find what it is - I'm pretty new at this... Oh, yeah, I am
using torque 2.0.0p2 if that makes a difference.
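
For what it's worth, the numbers in that last MSecHMACGetDigest call can be checked outside of maui. The sketch below is only an illustration. It assumes the digest is a plain HMAC-SHA1 over the request body with the CSKEY as the key, which is a guess based on the function names in the log and may not match exactly how maui or gold sign the message. It only checks that the lengths in the log (3, 185 and 20) are consistent with the key, the body, and a SHA-1 digest.

```python
import hashlib
import hmac

# Request body copied from the MSecHMACGetDigest log line, with the
# mail client's line wraps turned back into single spaces.
BODY = (
    '<Body actor="root"><Request action="Query" actor="root">'
    '<Object>User</Object><Where name="Special">False</Where>'
    '<Get name="Name"></Get><Get name="DefaultProject"></Get>'
    '</Request></Body>'
)
KEY = "sss"  # CSKEY from maui-private.cfg

body_bytes = BODY.encode("ascii")
key_bytes = KEY.encode("ascii")

# Assumption: plain HMAC-SHA1 over the body; the real signing steps
# inside maui and gold may differ.
digest = hmac.new(key_bytes, body_bytes, hashlib.sha1).digest()

print(len(key_bytes), len(body_bytes), len(digest))
# prints: 3 185 20
```

The lengths agree with the arguments in the log, so the key and the request body handed to the digest routine look sane right up to the point where the log stops inside __MSecSHA1Transform.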
Thank you for any help you can give - I'm pulling my hair out. :-O :)
-Jen
Jennifer Tippens
Unix Admin, ORNL Institutional Cluster
Oak Ridge National Lab