[torqueusers] Scalability issues with pbs_sched_cc

Eric D. Blom ebx at cypress.com
Sat Nov 27 15:46:47 MST 2004

We have experienced problems when submitting large numbers of small 
jobs to our system. We have about 35 nodes, and when we submit, say, 
10,000 jobs that average 5 minutes each, the system struggles to keep 
all nodes busy. I haven't had time to investigate yet, though.
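Eric's figures translate into a rough throughput requirement for the scheduler. The minimal Python sketch below assumes perfect packing (no startup overhead, every node always busy). The function names are illustrative, not part of TORQUE.

```python
# Back-of-envelope load figures for the workload described above:
# 35 nodes, 10,000 jobs, about 5 minutes per job on average.
# Assumes perfect packing: no startup overhead, no idle gaps.

def dispatch_interval_s(nodes: int, mean_runtime_s: float) -> float:
    """Average seconds between job completions in steady state, which is
    also how often a new job must start to keep every node busy."""
    return mean_runtime_s / nodes

def ideal_makespan_h(jobs: int, nodes: int, mean_runtime_s: float) -> float:
    """Hours needed to drain the whole queue if no node is ever idle."""
    return jobs * mean_runtime_s / nodes / 3600

nodes, jobs, runtime_s = 35, 10_000, 5 * 60
print(f"one job start needed every {dispatch_interval_s(nodes, runtime_s):.1f} s")
print(f"ideal makespan {ideal_makespan_h(jobs, nodes, runtime_s):.1f} h")
# prints:
# one job start needed every 8.6 s
# ideal makespan 23.8 h
```

For comparison, if a scheduling pass cost roughly 1 s per queued job, as Ronny measured on his own system, a single pass over a 10,000-job queue would take about 10,000 s. Whether Eric's scheduler behaves the same way is an assumption, not something either message establishes.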


On Nov 27, 2004, at 5:42 AM, Ronny T. Lampert wrote:

> Hi,
> I noticed pbs_sched quit for the 10th time today because of 
> "too long" *).
> I even set the delay via "-a 400", and now 800, to see if this helps 
> (it does not).
> The pbs_server was instructed to run the scheduler every 480 s, now 900 s.
> (The server and scheduler run on one of the queue's nodes.)
> The queue currently holds around 760 jobs.
> When tracing pbs_sched with strace, I noticed that it runs the 
> following cycle:
> select() -> read() -> write()
> and it seems to do this for one job at a time; each cycle takes 
> around 1 s (which means >= 700 s for 700 jobs, right?).
> Could we remedy the problem by sending a burst of job descriptions 
> (100, or even 500 or more) at once, having the scheduler sort them 
> (this shouldn't take long), and then sending the whole job set back 
> to the server in one burst?
> Does anybody else have these problems?
> If you need more info, I will happily supply it.
> Kind regards,
> Ronny
> *) Because I have set up a private directory (/usr/local/torque-1.1.0) 
> in which the whole installation is isolated, the scheduler couldn't 
> restart itself, which is how I noticed.
> It resets the environment (including $PATH) to the contents of 
> "pbs_environment".
> pbs_server and pbs_sched are started via, e.g.,
> PATH="...:/usr/local/torque-1.1.0/sbin" pbs_sched
> so argv[0] is not a full path and the scheduler can't execv() it.
> This is not a real problem; I patched it in 5 minutes.
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://supercluster.org/mailman/listinfo/torqueusers
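Ronny's batching proposal can be put into rough numbers. The following is a hypothetical Python cost model, not TORQUE code. It follows his strace observation in treating the current behaviour as one exchange per job, and compares that with fetching job descriptions in batches and returning the decisions in one exchange per batch. It assumes the ~1 s cost is per exchange rather than per job, which the trace alone does not establish. `Job`, `schedule_batch`, and the round-trip functions are illustrative names only.

```python
import math
from dataclasses import dataclass

@dataclass
class Job:
    job_id: str
    priority: int

def per_job_round_trips(n_jobs: int) -> int:
    # One select()/read()/write() exchange per job, as the trace suggests.
    return n_jobs

def batched_round_trips(n_jobs: int, batch: int) -> int:
    # One exchange to fetch a batch of descriptions, one to send decisions back.
    return 2 * math.ceil(n_jobs / batch)

def schedule_batch(jobs: list[Job]) -> list[str]:
    # Local sort: cheap compared with a round trip to the server.
    # Highest priority first; job_id breaks ties deterministically.
    return [j.job_id for j in sorted(jobs, key=lambda j: (-j.priority, j.job_id))]

print(per_job_round_trips(760), batched_round_trips(760, 100))  # 760 16
```

With 760 queued jobs and batches of 100, the model gives 16 exchanges instead of 760. If each exchange really cost about 1 s, that would be well under the 400 s and 800 s settings Ronny tried. Larger messages take longer to transfer and parse, though, so any real saving would be smaller.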
Eric D. Blom            Toll Free: 800-669-0557
Senior Staff Design Engineer  Tel: 425-787-4825
Cypress MicroSystems          Fax: 425-787-4641
ebx at cypress.com            www.cypressmicro.com
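The restart failure in Ronny's footnote comes from the fact that execv() does not search PATH. A bare argv[0] such as "pbs_sched" is therefore not a usable path, and after the environment is reset from "pbs_environment" even a PATH search would miss the private sbin directory. His actual patch is not shown. What follows is a hypothetical Python illustration of one approach: resolve an absolute path at startup, while the original PATH is still in effect, and re-execute that path later. `resolve_self_path` is an illustrative name. pbs_sched itself is written in C, where the same idea could use realpath(3) or, on Linux, /proc/self/exe.

```python
import os
import shutil
from typing import Optional

def resolve_self_path(argv0: str, path: Optional[str] = None) -> Optional[str]:
    """Return an absolute path for argv0, or None if it cannot be found.

    Call this at startup, before the environment is replaced, so that a
    bare name like "pbs_sched" is looked up in the original PATH.
    """
    if "/" in argv0:
        # Started with an explicit (absolute or relative) path.
        return os.path.realpath(argv0)
    # Search PATH the way the shell did when it launched us;
    # path=None means "use the current PATH environment variable".
    found = shutil.which(argv0, path=path)
    return os.path.realpath(found) if found else None

# Hypothetical usage inside a daemon:
#   SELF_PATH = resolve_self_path(sys.argv[0])   # at startup
#   ... environment reset from pbs_environment ...
#   os.execv(SELF_PATH, sys.argv)                # when restarting itself
```

Resolving early avoids depending on whatever PATH the reset environment happens to contain.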
