[torqueusers] torque launches more jobs than number of virtualproc per node

Sam Rash srash at yahoo-inc.com
Fri Jun 16 13:37:57 MDT 2006


I don't know if this applies here, but I found that when I wanted jobs to
properly inherit a ppn value (say a host is set to np=8 and each job from a
queue should get ppn=2), I needed to set both nodes and neednodes:

set queue batch resources_default.nodes=1:ppn=2
set queue batch resources_default.neednodes=1:ppn=2

...
set node <hostname> np=8

would limit that host to 4 jobs (np=8 / ppn=2).  (Of course you could use
ppn=1 with np=4 instead, unless you have other queues where you want to use
ppn=3, etc.)

Setting only nodes (without neednodes) resulted in a single host getting an
unlimited number of jobs...
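
As a quick check of the arithmetic above: a host's np divided by a job's ppn
(rounded down) gives the number of such jobs that fit on the host at once.
The sketch below is only that division; jobs_per_host is a hypothetical
helper name, not a Torque command, and it assumes every job on the host
requests the same ppn.

```shell
# Hypothetical helper, not part of Torque: how many jobs that each
# request `ppn` virtual processors fit on a host configured with `np`.
jobs_per_host() {
  np=$1
  ppn=$2
  echo $(( np / ppn ))   # integer division; leftover processors stay idle
}

jobs_per_host 8 2   # np=8, ppn=2 -> prints 4
jobs_per_host 8 3   # np=8, ppn=3 -> prints 2 (2 processors left over)
```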

Hope this helps,

Regards,
Sam Rash
srash at yahoo-inc.com
408-349-7312
vertigosr37

-----Original Message-----
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org] On Behalf Of Adrian Wu
Sent: Friday, June 16, 2006 12:28 PM
To: jscoggins at lbl.gov
Cc: torqueusers at supercluster.org
Subject: RE: [torqueusers] torque launches more jobs than number
of virtualproc per node

Hi Jackie,

here is my qmgr -c 'p s':

#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch max_running = 80
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server operators = root at mumag2.com
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server resources_default.nodes = 1
set server scheduler_iteration = 60
set server node_check_rate = 150
set server tcp_timeout = 6
set server node_pack = False
set server pbs_version = 2.1.0p0

my setup is simple, as you can see: one queue, 20 nodes with 4 VP per node.
All I want is to have no more than 4 jobs launched per node at any given
time.

I just reset my database, and applied the above settings; it does not seem
to help or change the behavior. 
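
For the setup described here (np=4 per node, single-process jobs), the
nodes/neednodes pair suggested elsewhere in this thread would use ppn=1.
The sketch below only prints the two directives so they can be reviewed
first; print_ppn_defaults is a hypothetical helper name, and whether this
cures the over-subscription on this particular cluster is untested.

```shell
# Hypothetical helper: print (do not apply) the two queue defaults that
# were reported to be needed together, for a given queue and ppn.
print_ppn_defaults() {
  queue=$1
  ppn=$2
  printf 'set queue %s resources_default.nodes=1:ppn=%s\n' "$queue" "$ppn"
  printf 'set queue %s resources_default.neednodes=1:ppn=%s\n' "$queue" "$ppn"
}

# One-core jobs on the "batch" queue from the configuration above.
print_ppn_defaults batch 1
```

The printed lines can be typed at the qmgr prompt, or, since qmgr reads
directives from standard input, piped in on the server host with
print_ppn_defaults batch 1 | qmgr.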

How did you fix the problem that you feel is similar to mine?

thanks!
adrian

-----Original Message-----
From: Jacqueline Scoggins [mailto:jscoggins at lbl.gov]
Sent: Friday, June 16, 2006 11:04 AM
To: Adrian Wu
Cc: torqueusers at supercluster.org
Subject: Re: [torqueusers] torque launches more jobs than number of
virtualproc per node


What does your qmgr output look like?  Please send the output of:

qmgr -c 'p s'

and then we can determine why it behaves this way.

I have a system with both dual-core and dual-processor machines, and my
nodes file is similar, except that I created 2 classes - shared and
dualcore - so the users have to specify which type of node to run on.  But I
found that the scheduler parameters in the database were causing
me problems similar to this.  So send that information and maybe the
answer will pop out.
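
One way to look for the kind of database parameters mentioned above is to
save the qmgr -c 'p s' output to a file and list only the resources_default
lines. This is a sketch: the file name torque-config.txt and the helper
name show_resource_defaults are assumptions made for illustration.

```shell
# Hypothetical helper: list the resources_default settings found in a
# saved copy of the `qmgr -c 'p s'` output.
show_resource_defaults() {
  grep 'resources_default' "$1"
}

# On the server host (the file name is an arbitrary choice):
#   qmgr -c 'p s' > torque-config.txt
#   show_resource_defaults torque-config.txt
```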

Jackie

On Thu, 2006-06-15 at 08:36, Adrian Wu wrote:
> Hi all,
>
> I have installed torque 2.1.0p0 on 20 dual-socket, dual-core nodes, and
> am using pbs_sched. In my nodes file I have specified:
>
> node1 np=4
> node2 np=4
> .
> .
> node20 np=4
>
> All my jobs are single-process jobs that need to run on one core/virtual
> processor, and they tend to finish at about the same time. I can't get
> torque to stop at launching just 4 jobs per node. If my queue is not full,
> this seems to work; but if I have, say, 300 jobs in the queue, with the
> majority of the jobs queued up behind the first "wave" of jobs, the 2nd
> "wave" would put as many as 8 jobs on a single node, substantially
> slowing down all the jobs on that node. When I try to set $max_load in
> mom_priv/config (I tried 3.5), the nodes get the job-exclusive,busy
> state, but still continue to take on jobs. It seems that, once there are
> jobs queued up, torque no longer checks each node's state before
> launching more jobs to it...
>
> I've read posts describing similar (not exactly the same) behavior, where
> a recompile of torque without optimization helped. I just ran ./configure
> and make - where should I take out the optimization?
>
> Would using the maui scheduler (instead of pbs_sched) help?
>
> Any suggestions from the list would be helpful. Thanks in advance!
>
> adrian
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
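
On the question above about where to take out the optimization: configure
scripts generated by autoconf normally honor a CFLAGS value from the
environment, so one common approach is to set it before configuring. The
sketch below assumes torque's configure follows that convention and that it
is run from the top of the source tree; whether an unoptimized build changes
the scheduling behavior is not established in this thread.

```shell
# Assumption: torque's ./configure honors CFLAGS like a standard
# autoconf script. -O0 disables optimization; -g keeps debug symbols.
CFLAGS="-g -O0"
export CFLAGS

if [ -x ./configure ]; then
  ./configure && make
else
  echo "run this from the top of the torque source tree" >&2
fi
```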
