[torqueusers] Submission number limits?
garrick at usc.edu
Wed May 7 17:32:23 MDT 2008
On Wed, May 07, 2008 at 10:29:57PM +0200, Steve Traylen alleged:
> 2008/5/7 Garrick Staples <garrick at usc.edu>:
> > On Wed, May 07, 2008 at 02:53:04PM -0500, Jeremy Mann alleged:
> > > Good afternoon all, I have one user that wants to submit roughly 140,000
> > > jobs to our queue. We tried it last week and it never worked. It took
> > > nearly an hour to submit all of them, then the PBS scheduler would stop
> > > responding and give:
> > >
> > > 05/02/2008 14:39:50;0100; pbs_sched;Req;;Leaving schedule
> > >
> > > 05/02/2008 14:39:50;0080; pbs_sched;Svr;main;brk point 760373248
> > > 05/02/2008 14:39:53;0100; pbs_sched;Req;;Entering Schedule
> > > 05/02/2008 14:42:53;0002; pbs_sched;Svr;toolong;alarm call
> > >
> > > The jobs are quite small and they run for about a minute. Now we're
> > > thinking about breaking them up into 100 or 1000 job chunks.
> > >
> > > I'm curious if the number of jobs being submitted, in our case 140,000, is
> > > too large for PBS/Torque to handle.
> > >
> > > Torque 2.1.2 x86_64 and the built in scheduler (not MAUI)
> > The trick is to limit the number of jobs visible to the scheduler by using a
> > routing queue to spool jobs into the execution queue.
> > So you do something like this:
> > create queue spoolq queue_type = Route, route_destinations = execq
> > create queue execq queue_type = E, max_queuable=1000
> At the MAUI level, would
> USERCFG[DEFAULT] MAXIJOB=100
> do the same thing, and give other users a look-in while the big user's jobs
> are being submitted in batches of 100?
No, for two reasons: he's not using Maui, and that setting doesn't reduce the
number of jobs visible to the scheduler. The problem is that it takes too long
to transfer the job data for several tens of thousands of jobs.
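
For reference, the routing-queue setup quoted above could be spelled out
roughly as follows. Treat it as a sketch, not a tested configuration: the queue
names spoolq and execq and the limit of 1000 are the example values from the
earlier message, while the enabled/started settings and making spoolq the
default queue are assumptions about a typical site. TORQUE spells the limit
attribute max_queuable.

```shell
# Execution queue: at most 1000 jobs may sit here at once, so the
# scheduler only ever sees that many jobs from the big submission.
qmgr -c "create queue execq"
qmgr -c "set queue execq queue_type = Execution"
qmgr -c "set queue execq max_queuable = 1000"
qmgr -c "set queue execq enabled = True"
qmgr -c "set queue execq started = True"

# Routing queue: users submit here; jobs wait in spoolq until execq
# has room under its max_queuable limit.
qmgr -c "create queue spoolq"
qmgr -c "set queue spoolq queue_type = Route"
qmgr -c "set queue spoolq route_destinations = execq"
qmgr -c "set queue spoolq enabled = True"
qmgr -c "set queue spoolq started = True"

# Assumption: make the routing queue the default so a plain qsub lands there.
qmgr -c "set server default_queue = spoolq"
```

Users would then submit with "qsub -q spoolq" (or with no -q at all if spoolq
is the default), and the server moves jobs into execq as slots free up.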
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California
Please avoid sending me Word or PowerPoint attachments.