[Mauiusers] Partition problem

luxun luxun6 at gmail.com
Mon Apr 3 00:41:37 MDT 2006


Thanks for your help.

I tried setting node properties in $PBS_HOME/server_priv/nodes.
Then I submitted some parallel jobs: some ran on the parallel queue's nodes,
others ran on the serial queue's nodes. Serial jobs show the same behavior.
Accounting log:
04/03/2006 09:35:56;E;3.i159.ascc;user=wzlu group=wzlu jobname=cpi
queue=parallel ctime=1144028142 qtime=1144028142 etime=1144028142
start=1144028142 exec_host=i153.ascc/0+i152.ascc/0 Resource_List.neednodes=2
Resource_List.nodect=2 Resource_List.nodes=2 session=0 end=1144028156
Exit_status=271 resources_used.cput=00:00:00 resources_used.mem=0kb
resources_used.vmem=0kb resources_used.walltime=00:00:14
(This parallel job ran on i153.ascc and i152.ascc, which are both
defined as serial queue nodes.)
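
A check like the one above can be scripted instead of read off by eye. The
following is only a minimal Python sketch, not something I run in production:
it takes one accounting record, pulls out the queue and exec_host fields, and
reports the hosts whose node property differs from the queue name. The
function names and the hand-copied node map are hypothetical, and it assumes
the property name equals the queue name and that no field value contains a
space.

```python
# Node -> property map copied by hand from $PBS_HOME/server_priv/nodes.
# Hypothetical helper data; a real script would read the file itself.
NODE_PROPERTIES = {
    "i151.ascc": "serial",
    "i152.ascc": "serial",
    "i153.ascc": "serial",
    "i154.ascc": "serial",
    "i155.ascc": "parallel",
    "i156.ascc": "parallel",
    "i157.ascc": "parallel",
}


def parse_record(record):
    """Split one accounting record into a dict of its key=value fields."""
    # Layout: "date time;record type;job id;key=value key=value ..."
    message = record.split(";", 3)[3]
    return dict(f.split("=", 1) for f in message.split() if "=" in f)


def exec_hosts(record):
    """Return the host names in exec_host, e.g. 'a/0+b/0' -> ['a', 'b']."""
    entries = parse_record(record)["exec_host"].split("+")
    return [entry.split("/")[0] for entry in entries]


def misplaced_hosts(record, properties=NODE_PROPERTIES):
    """Return hosts whose node property is not the job's queue name."""
    queue = parse_record(record)["queue"]
    return [h for h in exec_hosts(record) if properties.get(h) != queue]


record = (
    "04/03/2006 09:35:56;E;3.i159.ascc;user=wzlu group=wzlu jobname=cpi "
    "queue=parallel exec_host=i153.ascc/0+i152.ascc/0 Resource_List.nodes=2"
)
print(misplaced_hosts(record))  # ['i153.ascc', 'i152.ascc']
```

An empty list would mean every host the job ran on carries the property of
the queue it was submitted to.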

maui.log has the following messages:
04/03 14:14:51 MPBSNodeUpdate(i154.ascc,i154.ascc,Idle,base)
04/03 14:14:51 MPBSLoadQueueInfo(base,i154.ascc,SC)
04/03 14:14:51 INFO:     queue 'batch' started state set to True
04/03 14:14:51 INFO:     class to node not mapping enabled for queue 'batch' adding class to all nodes
04/03 14:14:51 INFO:     queue 'serial' started state set to True
04/03 14:14:51 INFO:     class to node not mapping enabled for queue 'serial' adding class to all nodes
04/03 14:14:51 INFO:     queue 'parallel' started state set to True
04/03 14:14:51 INFO:     class to node not mapping enabled for queue 'parallel' adding class to all nodes

When I add "#PBS -l nodes=2:parallel" to the job script, all the parallel
jobs run on the parallel queue's nodes.
Accounting log:
04/03/2006 09:55:40;E;4.i159.ascc;user=wzlu group=wzlu jobname=cpi
queue=parallel ctime=1144029318 qtime=1144029318 etime=1144029318
start=1144029319 exec_host=i156.ascc/0+i155.ascc/0
Resource_List.neednodes=2:parallel Resource_List.nodect=2
Resource_List.nodes=2:parallel session=0 end=1144029340 Exit_status=0
resources_used.cput=00:00:00 resources_used.mem=616kb
resources_used.vmem=5276kb resources_used.walltime=00:00:22
(This parallel job ran on i156.ascc and i155.ascc, which are both
defined as parallel queue nodes.)

Having to add "#PBS -l nodes=2:parallel" to every job script is
inconvenient for most users.
I think something is missing from my configuration.

Any ideas? Thanks.

My environment is:
OS - RHEL 4 WS 64 bit
torque - 2.0.0p8
maui - 3.2.6p14
serial queue - i151.ascc i152.ascc i153.ascc i154.ascc
parallel queue - i155.ascc i156.ascc

The torque configuration is as follows:
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Route
set queue batch route_destinations = serial
set queue batch route_destinations += parallel
set queue batch enabled = True
set queue batch started = True
#
# Create and define queue serial
#
create queue serial
set queue serial queue_type = Execution
set queue serial resources_max.nodect = 1
set queue serial resources_default.nodect = 1
set queue serial resources_default.nodes = 1:ppn=1
set queue serial enabled = True
set queue serial started = True
#
# Create and define queue parallel
#
create queue parallel
set queue parallel queue_type = Execution
set queue parallel resources_max.nodect = 64
set queue parallel resources_min.nodect = 2
set queue parallel resources_default.nodect = 2
set queue parallel resources_default.nodes = 2:ppn=1
set queue parallel enabled = True
set queue parallel started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = False
set server acl_user_enable = False
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.neednodes = 1
set server resources_default.nodes = 1:ppn=1
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server default_node = 1
set server pbs_version = 2.0.0p8-1cri

nodes
i151.ascc serial
i152.ascc serial
i153.ascc serial
i154.ascc serial
i155.ascc parallel
i156.ascc parallel
i157.ascc parallel

maui.cfg
# maui.cfg 3.2.6p14

SERVERHOST            i159.ascc
# primary admin must be first in list
ADMIN1                root

# Resource Manager Definition

RMCFG[base] TYPE=PBS

# Allocation Manager Definition

#AMCFG[bank]  TYPE=NONE

# full parameter docs at http://clusterresources.com/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration

RMPOLLINTERVAL        00:00:30

SERVERPORT            42559
SERVERMODE            NORMAL

# Admin: http://clusterresources.com/mauidocs/a.esecurity.html

LOGFILE               maui.log
LOGFILEMAXSIZE        10000000
LOGLEVEL              3

# Job Priority: http://clusterresources.com/mauidocs/5.1jobprioritization.html

QUEUETIMEWEIGHT       1

# FairShare: http://clusterresources.com/mauidocs/6.3fairshare.html

#FSPOLICY              PSDEDICATED
#FSDEPTH               7
#FSINTERVAL            86400
#FSDECAY               0.80

# Throttling Policies: http://clusterresources.com/mauidocs/6.2throttlingpolicies.html

# NONE SPECIFIED

# Backfill: http://clusterresources.com/mauidocs/8.2backfill.html

BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST

#NODEALLOCATIONPOLICY  MINRESOURCE
NODEALLOCATIONPOLICY  CPULOAD

DEFERTIME             0

NODECFG[i151.ascc] PARTITION=SERIAL
NODECFG[i152.ascc] PARTITION=SERIAL
NODECFG[i153.ascc] PARTITION=SERIAL
NODECFG[i154.ascc] PARTITION=SERIAL
NODECFG[i155.ascc] PARTITION=PARALLEL
NODECFG[i156.ascc] PARTITION=PARALLEL

CLASSCFG[serial]     MAXJOBPERUSER=4
CLASSCFG[parallel]   MAXJOBPERUSER=4
CLASSCFG[parallel]   MAXPROCPERUSER=16
USERCFG[DEFAULT]     MAXJOB=6 MAXPROC=20

SRPARTITION[serial]  SERIAL
SRTASKCOUNT[serial]  4
SRRESOURCES[serial]  PROCS=-1
SRCLASSLIST[serial]  serial
SRPERIOD[serial]     INFINITY

SRPARTITION[parallel]  PARALLEL
SRTASKCOUNT[parallel]  2
SRRESOURCES[parallel]  PROCS=-1
SRCLASSLIST[parallel]  parallel
SRPERIOD[parallel]     INFINITY
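
The nodes file above lists seven hosts, while maui.cfg assigns a partition to
only six of them. A cross-check between the two files could be sketched as
follows. This is a rough Python sketch under stated assumptions, not a tested
tool: the two texts are pasted inline rather than read from disk, the helper
names are hypothetical, and it assumes one "PARTITION=" setting per NODECFG
line.

```python
import re

# Contents pasted from the nodes file and maui.cfg shown in this message.
NODES_FILE = """\
i151.ascc serial
i152.ascc serial
i153.ascc serial
i154.ascc serial
i155.ascc parallel
i156.ascc parallel
i157.ascc parallel
"""

MAUI_CFG = """\
NODECFG[i151.ascc] PARTITION=SERIAL
NODECFG[i152.ascc] PARTITION=SERIAL
NODECFG[i153.ascc] PARTITION=SERIAL
NODECFG[i154.ascc] PARTITION=SERIAL
NODECFG[i155.ascc] PARTITION=PARALLEL
NODECFG[i156.ascc] PARTITION=PARALLEL
"""


def node_properties(text):
    """Map each host in a nodes file to its list of properties."""
    props = {}
    for line in text.splitlines():
        parts = line.split()
        if parts and not parts[0].startswith("#"):
            props[parts[0]] = parts[1:]
    return props


def node_partitions(text):
    """Map each host with an uncommented NODECFG line to its partition."""
    pattern = re.compile(
        r"^NODECFG\[([^\]]+)\]\s+.*\bPARTITION=(\S+)", re.MULTILINE
    )
    return dict(pattern.findall(text))


def unpartitioned(nodes_text, cfg_text):
    """Return hosts present in the nodes file but without a partition."""
    missing = set(node_properties(nodes_text)) - set(node_partitions(cfg_text))
    return sorted(missing)


print(unpartitioned(NODES_FILE, MAUI_CFG))  # ['i157.ascc']
```

Whether a node without a NODECFG entry matters for this problem is an open
question; the sketch only reports the difference.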


2006/3/31, Bas van der Vlies <basv at sara.nl>:
>
>
> I do not use PARTITIONS, but I solved the problem by setting node
> properties, e.g.:
> node1 serial
> node2 serial
> node3 parallel
> node4 parallel
>
> In torque to create queue:
>    parallel...
>    set queue q_parallel resources_default.neednodes = parallel
>    set queue q_parallel resources_default.nodect = 2
>    ...
>
>    serial...
>    set queue q_serial resources_default.neednodes = serial
>    set queue q_serial resources_max.nodect = 1
>    set queue q_serial resources_default.ncpus = 1
>    set queue q_serial resources_default.nodect = 1
>    set queue q_serial resources_default.nodes = 1
>
> --
> ********************************************************************
> *                                                                  *
> *  Bas van der Vlies                     e-mail: basv at sara.nl      *
> *  SARA - Academic Computing Services    phone:  +31 20 592 8012   *
> *  Kruislaan 415                         fax:    +31 20 6683167    *
> *  1098 SJ Amsterdam                                               *
> *                                                                  *
> ********************************************************************
>
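
Applied to the queue names used in this message (serial and parallel), the
suggestion quoted above amounts to one extra qmgr line per queue. The
following is a minimal Python sketch that only prints those lines, assuming
each queue should be tied to the node property of the same name. The mapping
and the function name are hypothetical, and the output has not been tried on
this cluster.

```python
# Queue name -> node property it should be restricted to.
# Assumption for illustration: the property has the same name as the queue.
QUEUE_PROPERTIES = {"serial": "serial", "parallel": "parallel"}


def qmgr_commands(queue_properties):
    """Build one 'set queue ... neednodes' line per queue."""
    return [
        f"set queue {queue} resources_default.neednodes = {prop}"
        for queue, prop in queue_properties.items()
    ]


for command in qmgr_commands(QUEUE_PROPERTIES):
    print(command)
```

The printed lines follow the form of the "set queue q_parallel
resources_default.neednodes = parallel" example in the quoted reply and are
meant to be reviewed before being passed to qmgr.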