[Mauiusers] Newbie question: problems with getting any jobs to run with PBSpro 5.4.2 and Maui 3.2.6

Nicko Acks nacks at nccs.nasa.gov
Wed Jul 20 14:42:19 MDT 2005


I am relatively new to Maui, but I have been using PBS for quite some
time.  My test environment is a 32-processor Origin 3800 running IRIX 6.5.24.

PBSpro 5.4.2:

Only the pbs_server and pbs_mom processes are being started.

mom_priv/config:
$logevent 0x1ff
$clienthost gmao-test

output of qmgr -c "p s":

create queue gmao-test
set queue gmao-test queue_type = Execution
set queue gmao-test resources_max.mem = 16384mb
set queue gmao-test resources_max.ncpus = 32
set queue gmao-test resources_max.walltime = 18:00:00
set queue gmao-test resources_default.mem = 2004mb
set queue gmao-test resources_default.ncpus = 4
set queue gmao-test resources_default.nodect = 1
set queue gmao-test resources_default.walltime = 00:05:00
set queue gmao-test acl_group_enable = False
set queue gmao-test enabled = True
set queue gmao-test started = True

set server scheduling = True
set server acl_user_enable = False
set server managers = root@gmao-test.gsfc.nasa.gov
set server default_queue = gmao-test
set server log_events = 511
set server mail_from = pbs-gmao-test
set server query_other_jobs = True
set server resources_default.ncpus = 1
set server resources_default.nodect = 1
set server resources_default.walltime = 00:15:00
set server resources_max.mem = 16384mb
set server resources_max.ncpus = 32
set server scheduler_iteration = 90
set server default_node = gmao-test.gsfc.nasa.gov
set server resv_enable = True
set server node_fail_requeue = 310

Maui 3.2.6

maui.conf:

SERVERHOST            gmao-test.gsfc.nasa.gov
ADMIN1                root
RMCFG[GMAO-TEST] TYPE=PBS
AMCFG[bank]  TYPE=NONE
RMPOLLINTERVAL        00:00:30
SERVERPORT            42559
SERVERMODE            NORMAL
LOGFILE               maui.log
LOGFILEMAXSIZE        10000000
LOGLEVEL              4
QUEUETIMEWEIGHT       1 
QUEUETIMEWEIGHT       1 
BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST
NODEALLOCATIONPOLICY  MINRESOURCE
DEFERTIME 0
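As a quick sanity check on the maui.conf above, here is a minimal Python sketch that flags parameter names appearing more than once. It treats the first whitespace-separated token on each line as the parameter name, which is my assumption about the file layout rather than Maui's actual parsing rules, and the function name repeated_parameters is made up for illustration. I have not checked how Maui itself handles a repeated line.

```python
from collections import Counter

# The maui.conf contents exactly as posted above.
MAUI_CONF = """\
SERVERHOST            gmao-test.gsfc.nasa.gov
ADMIN1                root
RMCFG[GMAO-TEST] TYPE=PBS
AMCFG[bank]  TYPE=NONE
RMPOLLINTERVAL        00:00:30
SERVERPORT            42559
SERVERMODE            NORMAL
LOGFILE               maui.log
LOGFILEMAXSIZE        10000000
LOGLEVEL              4
QUEUETIMEWEIGHT       1
QUEUETIMEWEIGHT       1
BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST
NODEALLOCATIONPOLICY  MINRESOURCE
DEFERTIME 0
"""


def repeated_parameters(conf_text):
    """Return parameter names that occur on more than one line.

    Assumption: the first whitespace-separated token is the parameter
    name, and lines starting with '#' are comments.
    """
    names = []
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        names.append(line.split()[0])
    counts = Counter(names)
    return [name for name, count in counts.items() if count > 1]


print(repeated_parameters(MAUI_CONF))
```

On the file as posted this reports only QUEUETIMEWEIGHT, which is listed twice with the same value.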

My test job is pretty simple:

#PBS -N mpitest.job
#PBS -A k3003
#PBS -W group_list=g931
#PBS -l ncpus=4
#PBS -l walltime=00:10:00
#PBS -l mem=4g
#PBS -j oe

echo "job ran"
sleep 30

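To rule out a simple limit violation, the request in the job script can be compared against the resources_max values of the gmao-test queue shown earlier. This is only a rough Python sketch: the helper names are hypothetical, and it assumes "4g" means 4 gigabytes and that one gigabyte is 1024 megabytes.

```python
def mem_mb(text):
    """Convert a size such as '4g' or '16384mb' to megabytes.

    Assumption: only 'm'/'mb' and 'g'/'gb' suffixes are handled here.
    """
    s = text.lower().rstrip("b")
    units = {"m": 1, "g": 1024}
    if not s or s[-1] not in units:
        raise ValueError("unsupported size: %r" % text)
    return int(s[:-1]) * units[s[-1]]


def walltime_s(text):
    """Convert an HH:MM:SS walltime to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# resources_max values from the queue definition above.
queue_max = {"ncpus": 32, "mem": "16384mb", "walltime": "18:00:00"}
# Values requested by the #PBS -l lines in the test job.
job = {"ncpus": 4, "mem": "4g", "walltime": "00:10:00"}


def over_limit(request, limits):
    """Return the names of resources whose request exceeds the limit."""
    problems = []
    if request["ncpus"] > limits["ncpus"]:
        problems.append("ncpus")
    if mem_mb(request["mem"]) > mem_mb(limits["mem"]):
        problems.append("mem")
    if walltime_s(request["walltime"]) > walltime_s(limits["walltime"]):
        problems.append("walltime")
    return problems


print(over_limit(job, queue_max) or "request fits within the queue limits")
```

Under those assumptions the request stays inside every queue maximum, so the limits themselves do not look like the cause.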

What I see is that Maui tries to schedule the job, but the pbs mom
never actually starts it.  Maui attempts to start the job some number
of times before the job is held indefinitely (and I can't figure out a
way to get it to restart).

However, if I stop maui and start the standard PBSpro scheduler (which
defaults to fair share, etc) the job is started and it runs fine.

The message in the pbs mom log that I think is key:

07/19/2005 23:03:10;0008;pbs_mom;Job;17.gmao-test;set jobid 0x55da000000000393
07/19/2005 23:03:10;0001;pbs_mom;Svr;pbs_mom;assign_cpuset, cannot find "ssinodes" resource
07/19/2005 23:03:10;0001;pbs_mom;Job;17.gmao-test;Cannot assign cpuset to 17.gmao-test: Does not exist
07/19/2005 23:03:10;0008;pbs_mom;Job;17.gmao-test;job not started, Retry -3
07/19/2005 23:03:10;0008;pbs_mom;Job;17.gmao-test;kill_job
07/19/2005 23:03:10;0100;pbs_mom;Job;17.gmao-test;Obit sent
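For anyone sifting through a longer mom log for the same failure, here is a minimal Python sketch that splits each line into fields and pulls out the error entries. It assumes the layout seen in the excerpt above (six semicolon-separated fields: timestamp, event class, daemon, object type, object name, message) and that event class 0001 marks errors, which matches how the two cpuset lines are tagged. The names MomLogEntry and parse_mom_line are made up for illustration.

```python
from collections import namedtuple

MomLogEntry = namedtuple(
    "MomLogEntry", "timestamp event daemon obj_type obj_name message"
)


def parse_mom_line(line):
    """Split one pbs_mom log line into its six fields, or return None."""
    # maxsplit=5 keeps any ';' inside the message text intact.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return MomLogEntry(*parts)


# The log excerpt from this post, with the mail line wrapping undone.
SAMPLE = [
    '07/19/2005 23:03:10;0008;pbs_mom;Job;17.gmao-test;set jobid 0x55da000000000393',
    '07/19/2005 23:03:10;0001;pbs_mom;Svr;pbs_mom;assign_cpuset, cannot find "ssinodes" resource',
    '07/19/2005 23:03:10;0001;pbs_mom;Job;17.gmao-test;Cannot assign cpuset to 17.gmao-test: Does not exist',
    '07/19/2005 23:03:10;0008;pbs_mom;Job;17.gmao-test;job not started, Retry -3',
    '07/19/2005 23:03:10;0008;pbs_mom;Job;17.gmao-test;kill_job',
    '07/19/2005 23:03:10;0100;pbs_mom;Job;17.gmao-test;Obit sent',
]

entries = [e for e in map(parse_mom_line, SAMPLE) if e is not None]
# Assumption: event class "0001" is the error class.
errors = [e for e in entries if e.event == "0001"]

for e in errors:
    print(e.timestamp, e.obj_name, e.message)
```

Run against the excerpt, this leaves just the two cpuset lines: the missing "ssinodes" resource and the failed cpuset assignment for job 17.gmao-test.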

Has anyone seen this issue before?  It is probably something really
simple that I am missing, but after looking at the PBS integration
guide I am not sure what it could be.

My eventual goal is to get PBS+Maui working on some Altix systems, but I
would like to try to get this to work under IRIX, where I am more
familiar with the behavior of PBS.

Any help would be appreciated.

thanks
Nick

-- 
Nicko Acks                              301-286-2333 voice
NASA / Goddard Space Flight Center      nacks at nccs.nasa.gov
Computer Sciences Corporation           Building 28, Room S228
Code 606.2/High-Performance Computing