[torqueusers] Re: Scalability issues with pbs_sched_cc

Eric D. Blom ebx at cypress.com
Mon Nov 29 11:01:33 MST 2004


On Nov 29, 2004, at 9:09 AM, Dwight Kelly wrote:
>> We have experienced problems when submitting large numbers of small
>> jobs to our system. We have about 35 nodes, and when we submit, say,
>> 10,000 jobs that average 5 minutes each, the system struggles to keep
>> all nodes busy. I haven't had time to investigate, though.
>
> We have also noticed this behavior with the FIFO scheduler. I adjusted 
> the "scheduler_iteration" variable from the default of 600 to 200 and 
> got some improvement. I can also force the scheduler to run jobs by 
> submitting a new job.

I should clarify that we are also using the default FIFO scheduler.
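
For anyone who wants to try the scheduler_iteration change quoted above, 
here is a minimal sketch of how the qmgr call could be built from a 
script. The helper name qmgr_set_iteration is made up for illustration, 
and the value 200 is simply the one Dwight mentioned, not a 
recommendation. It only prints the command; actually running it would 
need manager privileges on the pbs_server host.

```python
import shlex

def qmgr_set_iteration(seconds):
    """Build the qmgr argv that sets the server's scheduler_iteration.

    Hypothetical helper: it only constructs the command, it does not run it.
    """
    if seconds <= 0:
        raise ValueError("scheduler_iteration must be a positive number of seconds")
    return ["qmgr", "-c", f"set server scheduler_iteration = {int(seconds)}"]

cmd = qmgr_set_iteration(200)
print(shlex.join(cmd))
# To apply it for real (assumption: qmgr is on PATH and you are a manager):
#   import subprocess; subprocess.run(cmd, check=True)
```

The current value can be checked afterwards with qmgr -c "list server".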


> It appears that the scheduler will iterate over the queued jobs trying 
> to submit them. After a certain number of passes it gives up and waits 
> some amount of time before trying to schedule new jobs. However, if a 
> new job is submitted it immediately tries to schedule pending jobs. 
> This behavior is most apparent if you have a lot of short-runtime jobs 
> queued.

What you describe is almost exactly what my experience has been as 
well. Sometimes I try to kick-start the scheduler by using the qrun 
command.
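
To put rough numbers on why the interval matters for short jobs, here 
is a toy model. It is not how pbs_sched is implemented; it just assumes 
the worst case described above, where jobs only get started when the 
scheduler wakes up every scheduler_iteration seconds and nothing else 
triggers a pass. The function name node_utilization is hypothetical.

```python
def node_utilization(job_seconds, scheduler_iteration):
    """Fraction of time one node is busy in the toy model.

    Assumption: a job starts only on a scheduler wake-up, so after it
    finishes the node idles until the next multiple of the interval.
    """
    # Integer ceiling: number of wake-up intervals one job occupies.
    intervals = -(-job_seconds // scheduler_iteration)
    return job_seconds / (intervals * scheduler_iteration)

# 5-minute jobs at the default interval, the adjusted one, and a short one.
for interval in (600, 200, 60):
    print(interval, round(node_utilization(300, interval), 2))
# 600 0.5
# 200 0.75
# 60 1.0
```

Under those assumptions, going from 600 to 200 raises the busy fraction 
from one half to three quarters, which would fit the "some improvement" 
reported above, though the real scheduler may well behave differently.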

Eric


> ---
> Dwight Kelly
> Apago, Inc.  4080 McGinnis Ferry Rd  Suite 601 Alpharetta, GA 30005
> voice:(770) 619-1884  fax:(770) 619-1885
> email: dkelly at apago.com web: http://www.apago.com


