[torqueusers] Re: Scalability issues with pbs_sched_cc
Eric D. Blom
ebx at cypress.com
Mon Nov 29 11:01:33 MST 2004
On Nov 29, 2004, at 9:09 AM, Dwight Kelly wrote:
>> We have experienced problems when submitting large numbers of small
>> jobs to our system. We have about 35 nodes and when we submit say
>> 10,000 jobs that average 5 minutes each the system struggles to keep
>> all nodes busy. I haven't had time to investigate though.
> We have also noticed this behavior with the FIFO scheduler. I adjusted
> the "scheduler_iteration" variable from the default of 600 to 200 and
> got some improvement. I can also force the scheduler to run jobs by
> submitting a new job.
I should clarify that we are also using the default FIFO scheduler.
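
For anyone who wants to try the same adjustment, here is a minimal sketch of the change described above: setting the server's "scheduler_iteration" attribute through qmgr. The helper name scheduler_iteration_command is hypothetical and only builds the command line; it does not run anything. It assumes qmgr is on the PATH of the server host and that you have manager privileges. The value 200 is simply the one quoted above, not a recommendation.

```python
import shlex


def scheduler_iteration_command(seconds):
    """Build the qmgr argument list that sets scheduler_iteration.

    Hypothetical helper for illustration: it only constructs the command,
    so it can be inspected before anyone runs it on a real server.
    """
    if not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("scheduler_iteration must be a positive integer (seconds)")
    # qmgr -c takes a single directive string as its argument.
    return ["qmgr", "-c", f"set server scheduler_iteration = {seconds}"]


if __name__ == "__main__":
    # Print the command in copy-and-paste form instead of executing it.
    print(shlex.join(scheduler_iteration_command(200)))
```

To apply the change for real, the returned list could be handed to subprocess.run(cmd, check=True) on the server host. The current value should show up in the output of qmgr -c "print server".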
> It appears that the scheduler will iterate over the queued jobs trying
> to submit them. After a certain number of passes it gives up and waits
> some amount of time before trying to schedule new jobs. However, if a
> new job is submitted it immediately tries to schedule pending jobs.
> This behavior is most apparent if you have a lot of short-runtime jobs.
What you describe is almost exactly what my experience has been as
well. Sometimes I try to kick-start the scheduler by using the qrun
> Dwight Kelly
> Apago, Inc. 4080 McGinnis Ferry Rd Suite 601 Alpharetta, GA 30005
> voice:(770) 619-1884 fax:(770) 619-1885
> email: dkelly at apago.com web: http://www.apago.com