[torqueusers] HELP: scheduling out of order

Garrick Staples garrick at usc.edu
Wed Oct 17 14:23:39 MDT 2007

On Wed, Oct 17, 2007 at 01:10:44PM -0700, scoggins alleged:
> On Oct 17, 2007, at 1:02 PM, Garrick Staples wrote:
> >On Wed, Oct 17, 2007 at 10:33:34AM -0700, scoggins alleged:
> >>I have a user who is submitting several jobs - one at a time.  The
> >>starttimes are showing up with 99:07:30:06 using
> >>the showstart command.  Some of the other jobs are running but they
> >>were submitted afterwards.
> >
> >You've got 2 things going on here.  The first is that the weird starttime is
> >because users aren't requesting a walltime and/or your torque config doesn't
> >have a default or max walltime.  This causes Maui to assume a maximum possible
> >walltime.  Each job will then appear to reserve resources for 100 days or until
> >it exits (whichever comes first).
> >
> >The second thing is that jobs aren't starting the way you expect.  See if
> >'checkjob' gives a reason.
> >
> Checked that and it does not.
> I was reading somewhere that it might have something to do with
> fairshare.  Jobs won't start because their priority changes
> depending on when they are checked.
> "The most likely reason for this to happen is that your priority is  
> lower than other users based on the 7 day historical usage of our  
> fairshare policy.  To find out when a job is predicted to run, use  
> the showstart <jobID> command where <jobID> is the job ID number for  
> your job."
> That is why I ran showstart and saw the weird time.  The users do not  
> want time limits set and they don't want to have to specify walltime  
> on their jobs.  What could I do then if it is the same user competing
> for the resource?

Is the cluster fully in use?  And the one greedy user isn't getting his jobs to
run?  Sounds like fairshare is working correctly.  Of course, you can always
adjust the fairshare algo, or just disable fairshare.

If you don't want walltimes, then I think there is little that Maui can do for
you.  The main source of improved utilization is backfill, which will never
work without max walltimes.
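
A rough sketch of why, as a toy model and not Maui's actual code: a job can
only be backfilled into idle nodes if the scheduler knows it will finish
before the next reservation starts.  The names below (Job, can_backfill,
UNLIMITED_WALLTIME) are made up for illustration, and the 100-day value is
just a stand-in for the "maximum possible walltime" mentioned above.

```python
# Toy model of a backfill decision; illustrative only, not Maui's algorithm.
from dataclasses import dataclass
from typing import Optional

# Assumption: stand-in for the scheduler's "maximum possible" walltime (~100 days).
UNLIMITED_WALLTIME = 100 * 24 * 3600  # seconds


@dataclass
class Job:
    name: str
    nodes: int
    walltime: Optional[int] = None  # seconds; None means nothing was requested

    def effective_walltime(self) -> int:
        # With no requested, default, or max walltime, assume the worst case.
        return UNLIMITED_WALLTIME if self.walltime is None else self.walltime


def can_backfill(job: Job, idle_nodes: int, seconds_until_reservation: int) -> bool:
    """A job may jump the queue only if it fits on the idle nodes and is
    guaranteed to finish before the next reservation starts."""
    return (job.nodes <= idle_nodes
            and job.effective_walltime() <= seconds_until_reservation)


# 4 nodes sit idle for the next 2 hours, until a reserved job starts.
idle, gap = 4, 2 * 3600
short_job = Job("short", nodes=2, walltime=1800)  # asked for 30 minutes
no_limit_job = Job("nolimit", nodes=2)            # asked for nothing

print(can_backfill(short_job, idle, gap))     # fits in the gap
print(can_backfill(no_limit_job, idle, gap))  # assumed to run ~100 days
```

In this model the job with no walltime never fits into any realistic gap, so
the idle nodes stay idle until the reserved job starts.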

Might as well just use pbs_sched fifo?
