[torqueusers] qsub: Job rejected by all possible destinations

Gareth.Williams at csiro.au Gareth.Williams at csiro.au
Tue Jan 20 14:59:46 MST 2009


Hi Weiguang Chen,

I can't see a direct link, but I'd drop all references to ncpus from both the queue definitions and the job requirements.  Torque primarily uses the -l nodes=N:ppn=M syntax to request N nodes with M cpus each, N x M cpus in total. I think ncpus is a poorly supported legacy option aimed mainly at SMP systems; on distributed clusters the nodes/ppn syntax is a better match.
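
A minimal sketch of that suggestion, not a tested fix: the script below only prints the qmgr commands that would clear the ncpus limits, so nothing changes until someone reviews and runs them. The queue names are taken from the quoted qmgr output; the helper names print_unset_ncpus and total_cpus are made up for illustration, and total_cpus handles only the plain nodes=N:ppn=M form.

```shell
#!/bin/sh
# Sketch only: print, but do not run, the qmgr commands that would remove
# the ncpus limits from each execution queue named in this thread.
QUEUES="tiny verysmall small medium huge train special"

print_unset_ncpus() {
    for q in $QUEUES; do
        for bound in resources_min resources_max; do
            # "unset queue <name> <attribute>" clears a queue attribute in qmgr
            printf 'qmgr -c "unset queue %s %s.ncpus"\n' "$q" "$bound"
        done
    done
}

# Total cpus implied by a nodes=N:ppn=M request (N nodes times M cpus each).
# Hypothetical helper; it does not handle node names, properties or "+" lists.
total_cpus() {
    n=${1#nodes=}     # strip the leading "nodes="
    n=${n%%:*}        # keep only N
    m=${1##*ppn=}     # keep only M
    echo $((n * m))
}

print_unset_ncpus
total_cpus "nodes=16:ppn=2"
```

For the job in this thread, nodes=16:ppn=2 works out to 32 cpus. Whether the ncpus limits are what actually rejects the job is not established here; clearing them just leaves the nodes, nodect and walltime limits as the routing criteria.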

Gareth Williams
CSIRO IM&T - ASC
http://intra.hpsc.csiro.au


> -----Original Message-----
> From: Weiguang Chen [mailto:chenweiguang82 at gmail.com]
> Sent: Tuesday, 20 January 2009 12:02 PM
> To: Steve Young
> Cc: torqueusers maillist
> Subject: Re: [torqueusers] qsub: Job rejected by all possible destinations
>
> Hi,
> Thank you for your fast reply.
> In fact, the line "###PBS -q huge" was not in my submission script
> at first. I expected the job to be routed automatically from the
> route queue (default, which is the server's default queue) to a
> suitable execution queue (such as huge). When the submission failed,
> I added that line to test whether the job would go directly to the
> huge queue, but the following message appeared:
>
> qsub: Job exceeds queue resource limits MSG=cannot satisfy queue min
> nodes requirement
>
> So I commented it out again.
> That message confused me, because I requested 16 nodes, and for the
> huge queue the min nodes is set to 8 (in fact it should be 9: every
> node has 2 cpus, and I set the min ncpus to 17) and the max nodes is
> set to 16 (the max ncpus is 32). I thought the huge queue should be
> suitable for my job.
> I should explain that the queues are mutually exclusive, except for
> some special queues. Below are the complete settings printed by the
> command 'qmgr -c "p s"' (for some reasons I removed some information
> about acl_users); I hope they help in understanding my problem.
>
> #
> # Create queues and set their attributes.
> #
> #
> # Create and define queue huge
> #
> create queue huge
> set queue huge queue_type = Execution
> set queue huge Priority = 40
> set queue huge max_queuable = 2
> set queue huge max_user_queuable = 1
> set queue huge max_running = 1
> set queue huge acl_user_enable = True
> set queue huge acl_users = xxx at node1
> set queue huge resources_max.ncpus = 32
> set queue huge resources_max.nodect = 16
> set queue huge resources_max.nodes = 16
> set queue huge resources_max.walltime = 160:00:00
> set queue huge resources_min.ncpus = 17
> set queue huge resources_min.nodect = 8
> set queue huge resources_min.nodes = 8
> set queue huge resources_min.walltime = 00:00:01
> set queue huge resources_default.walltime = 36:00:00
> set queue huge max_user_run = 1
> set queue huge enabled = True
> set queue huge started = True
> #
> # Create and define queue default
> #
> create queue default
> set queue default queue_type = Route
> set queue default max_running = 15
> set queue default route_destinations = tiny
> set queue default route_destinations += verysmall
> set queue default route_destinations += small
> set queue default route_destinations += medium
> set queue default route_destinations += huge
> set queue default route_destinations += train
> set queue default route_destinations += special
> set queue default enabled = True
> set queue default started = True
> #
> # Create and define queue verysmall
> #
> create queue verysmall
> set queue verysmall queue_type = Execution
> set queue verysmall Priority = 120
> set queue verysmall max_queuable = 9
> set queue verysmall max_user_queuable = 3
> set queue verysmall max_running = 7
> set queue verysmall acl_user_enable = True
> set queue verysmall acl_users = xxx at node1
> set queue verysmall resources_max.ncpus = 4
> set queue verysmall resources_max.nodect = 2
> set queue verysmall resources_max.nodes = 2
> set queue verysmall resources_max.walltime = 36:00:00
> set queue verysmall resources_min.ncpus = 3
> set queue verysmall resources_min.nodect = 2
> set queue verysmall resources_min.nodes = 2
> set queue verysmall resources_min.walltime = 00:00:01
> set queue verysmall resources_default.walltime = 24:00:00
> set queue verysmall max_user_run = 2
> set queue verysmall enabled = True
> set queue verysmall started = True
> #
> # Create and define queue tiny
> #
> create queue tiny
> set queue tiny queue_type = Execution
> set queue tiny Priority = 140
> set queue tiny max_queuable = 13
> set queue tiny max_user_queuable = 3
> set queue tiny max_running = 10
> set queue tiny acl_user_enable = True
> set queue tiny acl_users = xxx at node1
> set queue tiny resources_max.ncpus = 2
> set queue tiny resources_max.nodect = 1
> set queue tiny resources_max.nodes = 1
> set queue tiny resources_max.walltime = 36:00:00
> set queue tiny resources_min.ncpus = 1
> set queue tiny resources_min.nodect = 1
> set queue tiny resources_min.nodes = 1
> set queue tiny resources_min.walltime = 00:00:01
> set queue tiny resources_default.walltime = 24:00:00
> set queue tiny max_user_run = 2
> set queue tiny enabled = True
> set queue tiny started = True
> #
> # Create and define queue medium
> #
> create queue medium
> set queue medium queue_type = Execution
> set queue medium Priority = 80
> set queue medium max_queuable = 5
> set queue medium max_user_queuable = 2
> set queue medium max_running = 3
> set queue medium acl_user_enable = True
> set queue medium acl_users = xxx at node1
> set queue medium resources_max.ncpus = 16
> set queue medium resources_max.nodect = 8
> set queue medium resources_max.nodes = 8
> set queue medium resources_max.walltime = 168:00:00
> set queue medium resources_min.ncpus = 9
> set queue medium resources_min.nodect = 5
> set queue medium resources_min.nodes = 5
> set queue medium resources_min.walltime = 00:00:01
> set queue medium resources_default.walltime = 24:00:00
> set queue medium max_user_run = 1
> set queue medium enabled = True
> set queue medium started = True
> #
> # Create and define queue train
> #
> create queue train
> set queue train queue_type = Execution
> set queue train Priority = 160
> set queue train max_queuable = 3
> set queue train max_user_queuable = 3
> set queue train max_running = 2
> set queue train acl_user_enable = True
> set queue train acl_users = phy01 at node1
> set queue train resources_max.ncpus = 2
> set queue train resources_max.nodect = 1
> set queue train resources_max.nodes = 1
> set queue train resources_max.walltime = 36:00:00
> set queue train resources_min.ncpus = 1
> set queue train resources_min.nodect = 1
> set queue train resources_min.nodes = 1
> set queue train resources_min.walltime = 00:00:01
> set queue train resources_default.walltime = 24:00:00
> set queue train max_user_run = 2
> set queue train enabled = True
> set queue train started = True
> #
> # Create and define queue small
> #
> create queue small
> set queue small queue_type = Execution
> set queue small Priority = 100
> set queue small max_queuable = 7
> set queue small max_user_queuable = 3
> set queue small max_running = 5
> set queue small acl_user_enable = True
> set queue small acl_users = xxx at node1
> set queue small resources_max.ncpus = 8
> set queue small resources_max.nodect = 4
> set queue small resources_max.nodes = 4
> set queue small resources_max.walltime = 36:00:00
> set queue small resources_min.ncpus = 5
> set queue small resources_min.nodect = 3
> set queue small resources_min.nodes = 3
> set queue small resources_min.walltime = 00:00:01
> set queue small resources_default.walltime = 24:00:00
> set queue small max_user_run = 2
> set queue small enabled = True
> set queue small started = True
> #
> # Create and define queue special
> #
> create queue special
> set queue special queue_type = Execution
> set queue special Priority = 130
> set queue special max_queuable = 3
> set queue special max_user_queuable = 3
> set queue special max_running = 2
> set queue special acl_user_enable = True
> set queue special acl_users = qxli at node1
> set queue special resources_max.ncpus = 2
> set queue special resources_max.nodect = 1
> set queue special resources_max.nodes = 1
> set queue special resources_max.walltime = 96:00:00
> set queue special resources_min.ncpus = 1
> set queue special resources_min.nodect = 1
> set queue special resources_min.nodes = 1
> set queue special resources_min.walltime = 00:00:01
> set queue special resources_default.walltime = 48:00:00
> set queue special max_user_run = 2
> set queue special enabled = True
> set queue special started = True
> #
> # Set server attributes.
> #
> set server scheduling = True
> set server max_user_run = 10
> set server acl_hosts = node1
> set server default_queue = default
> set server log_events = 511
> set server mail_from = adm
> set server query_other_jobs = True
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server next_job_number = 2423
>
> Thank you
> Sincerely
>
> Chen Weiguang
>
> On Mon, Jan 19, 2009 at 8:32 PM, Steve Young <chemadm at hamilton.edu> wrote:
> > Hi,
> >        Ok, I understand better: you are using a routing queue =). After
> > your first e-mail, did you un-comment the ###PBS -q huge line and see if
> > that worked? Since it is commented out, you're going to the "default"
> > routing queue, since no queue is specified. For some reason, it thinks
> > there isn't any place to route it to. So I'd try making sure the huge
> > queue works like you expect first, then try using the routing queue.
> > Hope this helps,
> >
> > -Steve
> >
> > On Jan 19, 2009, at 12:12 PM, Weiguang Chen wrote:
> >
> >> Hi,
> >> Thank you very much for your reply.
> >> What confuses me is that the settings for huge are basically similar
> >> to those of the other queues, for example:
> >>
> >> set queue default route_destinations += medium
> >> # Create and define queue medium
> >> create queue medium
> >> set queue medium queue_type = Execution
> >> set queue medium Priority = 80
> >> set queue medium max_queuable = 5
> >> set queue medium max_user_queuable = 2
> >> set queue medium max_running = 3
> >> set queue medium acl_user_enable = True
> >> set queue medium acl_users = xxx at node1
> >> set queue medium resources_max.ncpus = 16
> >> set queue medium resources_max.nodect = 8
> >> set queue medium resources_max.nodes = 8
> >> set queue medium resources_max.walltime = 168:00:00
> >> set queue medium resources_min.ncpus = 9
> >> set queue medium resources_min.nodect = 5
> >> set queue medium resources_min.nodes = 5
> >> set queue medium resources_min.walltime = 00:00:01
> >> set queue medium resources_default.walltime = 24:00:00
> >> set queue medium max_user_run = 1
> >> set queue medium enabled = True
> >> set queue medium started = True
> >>
> >> But this queue works well. The other settings are there to route
> >> different kinds of jobs to the appropriate queue.
> >> Judging by the submission script, I thought the job conformed to the
> >> policy of the huge queue.
> >> Now the job can be submitted to the default queue, but it cannot be
> >> routed to the huge queue. Below are the settings for the default queue
> >> (if the user does not specify a queue, jobs go to the default
> >> queue):
> >> create queue default
> >> set queue default queue_type = Route
> >> set queue default max_running = 15
> >> set queue default route_destinations = tiny
> >> set queue default route_destinations += verysmall
> >> set queue default route_destinations += small
> >> set queue default route_destinations += medium
> >> set queue default route_destinations += huge
> >> set queue default route_destinations += train
> >> set queue default route_destinations += special
> >> set queue default enabled = True
> >> set queue default started = True
> >> set server default_queue = default
> >>
> >> Happy Spring Festival (Chinese New Year, Year of the Ox)
> >>
> >> ChenWeiguang
> >>
> >> On Mon, Jan 19, 2009 at 6:14 PM, Steve Young <chemadm at hamilton.edu> wrote:
> >>>
> >>> Hi,
> >>>      I'm guessing that this line is messing you up:
> >>>
> >>>> set queue default route_destinations += huge
> >>>
> >>> The queue you have defined, "huge", is not a routing queue; it is an
> >>> execution queue. I'd remove that. I might also remove a bunch of the
> >>> other settings you have, to start out with the basics, then add in the
> >>> ones you want one at a time so you can test to make sure they work.
> >>> Hope this helps,
> >>>
> >>> -Steve
> >>>
> >>>
> >>>
> >>> On Jan 17, 2009, at 10:11 AM, Weiguang Chen wrote:
> >>>
> >>>> Hi,
> >>>> I noticed that this question has been asked before, at
> >>>>
> >>>> http://www.clusterresources.com/pipermail/torqueusers/2008-January/006698.html
> >>>>
> >>>> but my problem is different from that one. I want to submit a huge job:
> >>>> #!/bin/bash
> >>>> #PBS -N N-top
> >>>> ###PBS -q huge
> >>>> #PBS -o N-top.out
> >>>> #PBS -e N-top.err
> >>>> #PBS -l nodes=16:ppn=2,walltime=160:00:00
> >>>>
> >>>> and the queue huge is defined as follows:
> >>>> # Create and define queue huge
> >>>> create queue huge
> >>>> set queue huge queue_type = Execution
> >>>> set queue huge Priority = 40
> >>>> set queue huge max_queuable = 2
> >>>> set queue huge max_user_queuable = 1
> >>>> set queue huge max_running = 1
> >>>> set queue huge acl_user_enable = True
> >>>> set queue huge acl_users = xxx at node1
> >>>> set queue huge resources_max.ncpus = 32
> >>>> set queue huge resources_max.nodect = 16
> >>>> set queue huge resources_max.nodes = 16
> >>>> set queue huge resources_max.walltime = 160:00:00
> >>>> set queue huge resources_min.ncpus = 17
> >>>> set queue huge resources_min.nodect = 8
> >>>> set queue huge resources_min.nodes = 8
> >>>> set queue huge resources_min.walltime = 00:00:01
> >>>> set queue huge resources_default.walltime = 36:00:00
> >>>> set queue huge max_user_run = 1
> >>>> set queue huge enabled = True
> >>>> set queue huge started = True
> >>>> set queue default route_destinations += huge
> >>>>
> >>>> When I submitted it, the message in the subject line appeared. I
> >>>> checked the log:
> >>>> 01/17/2009 22:40:39;0100;PBS_Server;Job;2389.node1;enqueuing into default, state 1 hop 1
> >>>> 01/17/2009 22:40:39;0008;PBS_Server;Job;2389.node1;Job rejected by all possible destinations
> >>>> 01/17/2009 22:40:39;0100;PBS_Server;Job;2389.node1;dequeuing from default, state QUEUED
> >>>> 01/17/2009 22:40:39;0080;PBS_Server;Req;req_reject;Reject reply code=15039(Job rejected by all possible destinations), aux=0, type=Commit, from xxx at node1
> >>>> 01/17/2009 22:40:39;0040;PBS_Server;Svr;node1;Scheduler sent command term
> >>>>
> >>>> This confuses me very much.
> >>>> --
> >>>> Best Wishes
> >>>> ChenWeiguang
> >>>>
> >>>> ************************************************
> >>>> #               Chen, Weiguang
> >>>> #
> >>>> #    Postgraduate,  Ph. D
> >>>> #  75 University Road, Physics Building  #  218
> >>>> #  School of Physics & Engineering
> >>>> #  Zhengzhou University
> >>>> #  Zhengzhou, Henan 450052  CHINA
> >>>> #
> >>>> #  Tel: 86-13203730117;
> >>>> #  E-mail:chenweiguang82 at gmail.com;
> >>>> #            chenweiguang82 at qq.com
> >>>> #**********************************************
> >>>> _______________________________________________
> >>>> torqueusers mailing list
> >>>> torqueusers at supercluster.org
> >>>> http://www.supercluster.org/mailman/listinfo/torqueusers
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> Best Wishes
> >> ChenWeiguang
> >
> >
>
>
>
> --
> Best Wishes
> ChenWeiguang


