[torqueusers] Scalability issues with pbs_sched_cc
Eric D. Blom
ebx at cypress.com
Sat Nov 27 15:46:47 MST 2004
We have experienced problems when submitting large numbers of small
jobs to our system. We have about 35 nodes, and when we submit, say,
10,000 jobs that average 5 minutes each, the system struggles to keep
all nodes busy. I haven't had time to investigate, though.
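To put rough numbers on that workload, here is a back-of-the-envelope sketch. It assumes each node runs one job at a time, which I have not stated above, and the function names are made up for illustration.

```python
# Rough workload arithmetic for ~35 nodes and 10,000 jobs of ~5 minutes each.
# Assumption (not from the cluster config): one job per node at a time.

NODES = 35
JOBS = 10_000
AVG_JOB_SECONDS = 5 * 60


def required_start_interval(nodes, avg_job_seconds):
    """Average seconds between job starts needed to keep every node busy."""
    return avg_job_seconds / nodes


def ideal_makespan_hours(jobs, nodes, avg_job_seconds):
    """Best-case wall time in hours if no node ever sits idle."""
    return jobs * avg_job_seconds / nodes / 3600


if __name__ == "__main__":
    interval = required_start_interval(NODES, AVG_JOB_SECONDS)
    hours = ideal_makespan_hours(JOBS, NODES, AVG_JOB_SECONDS)
    print(f"a job must start every {interval:.1f} s on average")
    print(f"ideal makespan: {hours:.1f} hours")
```

Under that assumption the scheduler has to start a job roughly every 8.6 seconds on average, so anything that slows the scheduling cycle will show up quickly as idle nodes.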
On Nov 27, 2004, at 5:42 AM, Ronny T. Lampert wrote:
> I noticed pbs_sched quitting again today, for the 10th time, because of
> "too long" *).
> I even set the delay via "-a 400", and now 800, to see if this helps
> (it does not).
> The pbs_server was instructed to run the scheduler every 480s, now 900s.
> (The server/sched is on a node for the queue)
> The queue currently holds around 760 jobs.
> When tracing pbs_sched via strace, I noticed that it does the
> following cycle:
> select() -> read() -> write()
> and it seems to do it for one job at a time; the timespan is around
> 1s per cycle (which means we have >= 700 seconds for 700 jobs, right?)
> Could we remedy the problem by bursting a set (100, even 500 or more) of
> job descriptions, then having the scheduler sort it (this shouldn't really
> take long) and then bursting the job set back to the server?
> Does anybody else have these problems?
> If you need more info, I will happily supply it.
> Kind regards,
> *) because I have set up a private dir (/usr/local/torque-1.1.0), where
> the whole installation is isolated, the scheduler couldn't restart
> itself and so I really noticed.
> It resets the environment (also $PATH) to the contents of
> The pbs_server and _sched are started via
> PATH="...:/usr/local/torque-1.1.0/sbin" pbs_sched
> and as such can't execv argv, because it is not the full path.
> This is not a problem, as I patched it in 5 minutes.
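The estimate in the quoted message (about 1 s per select()/read()/write() cycle, one job per cycle) and the batching idea can be compared with a small sketch. This is only a timing model, not anything pbs_sched actually does: the function names are hypothetical, and the batched case assumes a round trip costs the same however many jobs it carries, which is optimistic.

```python
import math


def per_job_cycle_seconds(queued_jobs, seconds_per_job=1.0):
    """Scheduling-cycle time when each job costs one server round trip."""
    return queued_jobs * seconds_per_job


def batched_cycle_seconds(queued_jobs, batch_size, seconds_per_round_trip=1.0):
    """Cycle time if job descriptions were fetched batch_size at a time.

    Assumption: a round trip takes the same time regardless of batch size.
    """
    round_trips = math.ceil(queued_jobs / batch_size)
    return round_trips * seconds_per_round_trip


if __name__ == "__main__":
    queued = 760  # queue length reported in the quoted message
    print("one job per round trip:", per_job_cycle_seconds(queued), "s")
    for batch in (100, 500):
        print(f"batches of {batch}:", batched_cycle_seconds(queued, batch), "s")
```

With 760 queued jobs the one-at-a-time model gives 760 s, which is in the same range as the 400 s and 800 s delays that were tried, while the batched model drops to a handful of round trips. Whether the real protocol could be changed that way is a separate question.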
Eric D. Blom Toll Free: 800-669-0557
Senior Staff Design Engineer Tel: 425-787-4825
Cypress MicroSystems Fax: 425-787-4641
ebx at cypress.com www.cypressmicro.com