[torqueusers] Submission number limits?

Jeremy Mann jeremy at biochem.uthscsa.edu
Thu May 8 12:19:09 MDT 2008


I spoke too soon: now the scheduler is dying with general protection
faults when jobs start to run. I thought everything was OK because jobs
were being queued and routed to execq.

pbs_sched[31850] general protection rip:404de5 rsp:7fbffff980 error:0
pbs_sched[9692] general protection rip:404de5 rsp:7fbfffe8a0 error:0
pbs_sched[15210] general protection rip:404de5 rsp:7fbfffe880 error:0

What can I do to fix this?


Garrick Staples wrote:
>
>> >  > The jobs are quite small and each runs for about a minute. Now we're
>> >  > thinking about breaking them up into chunks of 100 or 1000 jobs.
>> >  >
>> >  > I'm curious whether the number of jobs being submitted, in our case
>> >  > 140,000, is too large for PBS/Torque to handle.
>> >  >
>> >  > Torque 2.1.2 x86_64 and the built-in scheduler (not Maui)
>> >
>> >  The trick is to limit the number of jobs visible to the scheduler by
>> >  using a routing queue to spool jobs into the execution queue.
>> >
>> >  So you do something like this:
>> >
>> >  create queue spoolq queue_type = Route, route_destinations = execq
>> >  create queue execq  queue_type = E, max_queuable=1000
>> >
>>
>> Would setting this at the Maui level
>>
>> USERCFG[DEFAULT]    MAXIJOB=100
>>
>> do the same thing, and give other users a look-in while the big user's
>> jobs are submitted in batches of 100?
>
> No, for two reasons: he's not using Maui, and that setting doesn't reduce
> the number of jobs visible to the scheduler. The problem is that it takes
> too long to transfer the job data for several tens of thousands of jobs.
>
> --
> Garrick Staples, GNU/Linux HPCC SysAdmin
> University of Southern California
>
> Please avoid sending me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
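For reference, the two-line routing-queue setup quoted above can be spelled out as qmgr input (for example saved to a file and fed with `qmgr < queues.qmgr`). This is a sketch, not a tested configuration: the queue names spoolq and execq and the limit of 1000 come from the quoted reply, while the enabled/started settings and making spoolq the server's default queue (so plain qsub submissions land in the routing queue) are added assumptions.

```
# Routing queue that holds the bulk of the submitted jobs
create queue spoolq
set queue spoolq queue_type = route
set queue spoolq route_destinations = execq
set queue spoolq enabled = true
set queue spoolq started = true

# Execution queue capped at 1000 queued jobs; spoolq keeps the rest
# and routes more in as space frees up
create queue execq
set queue execq queue_type = execution
set queue execq max_queuable = 1000
set queue execq enabled = true
set queue execq started = true

# Assumption: send submissions without an explicit -q to the routing queue
set server default_queue = spoolq
```

With this layout, jobs in excess of the cap wait in spoolq, which keeps the execution queue, and so the job data the scheduler has to pull, bounded at roughly 1000 jobs.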


-- 
Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672


