[Mauiusers] Maui vs pbs_sched

Andrus, Mr. Brian (Contractor) brian.andrus at nrlmry.navy.mil
Mon Oct 29 12:29:27 MDT 2007


I am still having great trouble with this.

My maui.cfg:
---------------------
# MAUI configuration example
# @(#)maui.cfg David Groep 20031015.1
# for MAUI version 3.2.5
#
SERVERHOST              cluster0
TYPE=PBS
PORT=15001
EPORT=15004
# Set PBS server polling interval. Since we have many short jobs
# and want fast turn-around, set this to 10 seconds (default: 2 minutes)
RMPOLLINTERVAL          00:00:10

# a max. 10 MByte log file in a logical location
LOGFILE                 /var/log/maui.log
LOGFILEMAXSIZE          10000000
LOGLEVEL                3
JOBNODEMATCHPOLICY      EXACTNODE
NODEALLOCATIONPOLICY    PRIORITY
NODECFG[DEFAULT] PRIORITY='- JOBCOUNT'
SCHEDCFG[base]          SERVER=cluster0:42559
ADMINCFG[1]             USERS=root
RMCFG[base]             TYPE=PBS
RMCFG[base]             SBINDIR=/opt/torque/sbin
--------------------- 

My script:
-------------------
#!/bin/bash
#PBS -j oe
#PBS -l nodes=8:ppn=1
#PBS -W x=NACCESSPOLICY:SINGLEJOB
#PBS -N TestJob
#PBS -q medium
#PBS -o output.txt
date
mpiexec --bynode /data/andrus/hello
sleep 10
---------------------

My qstat -f output:
-------------------------
Job Id: 2623.cluster0.default.domain
    Job_Name = TestJob
    Job_Owner = andrus at cluster0.default.domain
    job_state = R
    queue = medium
    server = cluster0.default.domain
    Checkpoint = u
    ctime = Mon Oct 29 11:27:45 2007
    Error_Path = cluster0.default.domain:/users/andrus/data/TestJob.e2623
    exec_host = n1/1+n1/0
    Hold_Types = n
    Join_Path = oe
    Keep_Files = n
    Mail_Points = a
    mtime = Mon Oct 29 11:27:46 2007
    Output_Path = cluster0:/users/andrus/data/output.txt
    Priority = 0
    qtime = Mon Oct 29 11:27:45 2007
    Rerunable = True
    Resource_List.mem = 768mb
    Resource_List.ncpus = 2
    Resource_List.nodect = 8
    Resource_List.nodes = 8:ppn=1
    Resource_List.walltime = 04:00:00
    session_id = 25489
    Variable_List = PBS_O_HOME=/users/andrus,PBS_O_LANG=en_US.UTF-8,
        PBS_O_LOGNAME=andrus,
        PBS_O_PATH=/opt/cwx/bin:/usr/totalview/bin/:/opt/torque/bin:/opt/pgi/
        linux86-64/7.0-7/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/u
        sr/X11R6/bin:/opt/gm/bin:/usr/local/ncarg/bin:/usr/openv/netbackup/bin
        :/users/andrus/bin,PBS_O_MAIL=/var/spool/mail/andrus,
        PBS_O_SHELL=/bin/bash,PBS_O_HOST=cluster0.default.domain,
        PBS_O_WORKDIR=/users/andrus/data,PBS_O_QUEUE=medium
    etime = Mon Oct 29 11:27:45 2007
    x = NACCESSPOLICY:SINGLEJOB
-------------------------

Notice that the job is executing only on n1/1+n1/0. I requested 8 nodes
with 1 processor each (nodes=8:ppn=1), and Resource_List.nodect shows 8,
but exec_host lists two CPU slots on a single node, n1. If I use the
default Torque scheduler (pbs_sched), it works as expected.
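
To make the mismatch easy to check on other jobs, here is a minimal
sketch that counts the distinct hosts in an exec_host value. It assumes
the entries have the form host/cpu_index joined by "+", as in the qstat
output above; the function name count_exec_hosts is only illustrative.

```shell
#!/bin/bash
# Count the distinct hosts named in a Torque exec_host string.
# Assumption: entries look like "host/cpu_index" and are joined by "+".
count_exec_hosts() {
    local exec_host="$1"
    # One entry per line, drop the "/cpu_index" suffix, de-duplicate, count.
    printf '%s\n' "$exec_host" | tr '+' '\n' | cut -d/ -f1 | sort -u | wc -l | tr -d ' '
}

# The value from the job above: two slots, both on n1.
count_exec_hosts "n1/1+n1/0"    # prints 1
```

For a job that really ran on 8 nodes, the same call should print 8.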

Does anyone have any ideas?


Brian Andrus perotsystems 
Site Manager | Sr. Computer Scientist 
Naval Research Lab
7 Grace Hopper Ave, Monterey, CA  93943
Phone (831) 656-4839 | Fax (831) 656-4866 


