[torqueusers] Odd job reject problem
Tim Miller
btmiller at helix.nih.gov
Fri Dec 29 09:40:38 MST 2006
Hi Everyone,
I'm running Torque 2.1.4. I would like all of the nodes and desktop
computers on our internal network to be able to submit jobs, but only
some of them are able to and I'm not seeing why.
My setup is simple: a single routing queue that feeds into a single
execution queue. The queues are configured as follows:
routing:
Queue entry
queue_type = Route
total_jobs = 0
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0
acl_host_enable = False
resources_default.nodes = 1:xeon306
mtime = Fri Dec 29 11:19:27 2006
route_destinations = xeon
enabled = True
started = True
exec:
Queue xeon
queue_type = Execution
total_jobs = 42
state_count = Transit:0 Queued:1 Held:0 Waiting:0 Running:41 Exiting:0
acl_host_enable = False
from_route_only = True
mtime = Fri Dec 29 11:19:21 2006
resources_assigned.nodect = 58
enabled = True
started = True
Server setup:
Server <name removed by me>
server_state = Active
scheduling = True
total_jobs = 50
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:50 Exiting:0
managers = <manager list removed>
default_queue = entry
log_events = 511
mail_from = adm
query_other_jobs = True
resources_assigned.nodect = 67
scheduler_iteration = 600
node_check_rate = 120
tcp_timeout = 6
pbs_version = 2.1.4
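For anyone who wants to compare these settings mechanically rather than by eye, here is a minimal Python sketch that reads a listing in the "name = value" form shown above into a dictionary. The function name parse_attributes is my own invention, and the sketch assumes that a line without " = " following an attribute is a wrapped continuation of the previous value; it is not part of Torque.

```python
def parse_attributes(text):
    """Parse 'name = value' lines from a queue or server listing into a dict.

    Assumption: a line without ' = ' that follows an attribute line is a
    wrapped continuation of that attribute's value. Header lines such as
    'Queue entry' that appear before any attribute are skipped.
    """
    attrs = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if " = " in line:
            key, value = line.split(" = ", 1)
            last_key = key.strip()
            attrs[last_key] = value.strip()
        elif last_key is not None:
            # Wrapped continuation of the previous value.
            attrs[last_key] += " " + line
    return attrs


if __name__ == "__main__":
    listing = """\
Queue entry
queue_type = Route
acl_host_enable = False
route_destinations = xeon
"""
    entry = parse_attributes(listing)
    print(entry["acl_host_enable"], entry["route_destinations"])
```

Running the same function over the routing queue, the execution queue, and the server listing makes it easy to diff the ACL-related attributes between them.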
As you can see, I've explicitly set acl_host_enable to false on both
queues. Nonetheless, when I try to submit a job from certain hosts I get
a "job rejected by all possible destinations" error and the following in
the server log:
12/29/2006 11:20:22;0100;PBS_Server;Req;;Type AuthenticateUser request received from tim at m3.lobos.nih.gov, sock=10
12/29/2006 11:20:22;0100;PBS_Server;Req;;Type QueueJob request received from tim at m3.lobos.nih.gov, sock=9
12/29/2006 11:20:22;0100;PBS_Server;Req;;Type ReadyToCommit request received from tim at m3.lobos.nih.gov, sock=9
12/29/2006 11:20:22;0100;PBS_Server;Req;;Type Commit request received from tim at m3.lobos.nih.gov, sock=9
12/29/2006 11:20:22;0080;PBS_Server;Req;req_reject;Reject reply code=15039(Job rejected by all possible destinations), aux=0, type=Commit, from tim at m3.lobos.nih.gov
It looks like the job is never even assigned a number and is rejected
before it reaches the routing queue.
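To pull just the rejections out of a longer server log, a rough Python sketch along these lines may help. It assumes each record sits on a single line with six semicolon-separated fields, as in the excerpt above; the field labels (timestamp, event, daemon, and so on) and the function names are my own, not official Torque terminology.

```python
import re

# Matches the message part of a req_reject record as it appears in the excerpt.
REJECT_RE = re.compile(
    r"Reject reply code=(\d+)\((.*?)\), aux=(-?\d+), type=(\w+), from (.+)$"
)


def parse_record(line):
    """Split one log line into its six semicolon-separated fields."""
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None  # not a record in the expected shape
    keys = ("timestamp", "event", "daemon", "object_type", "object_name", "message")
    return dict(zip(keys, parts))


def find_rejects(lines):
    """Yield a small dict for every req_reject record found in lines."""
    for line in lines:
        record = parse_record(line)
        if record is None or record["object_name"] != "req_reject":
            continue
        match = REJECT_RE.search(record["message"])
        if match:
            yield {
                "timestamp": record["timestamp"],
                "code": int(match.group(1)),
                "reason": match.group(2),
                "request_type": match.group(4),
                "requester": match.group(5),
            }


if __name__ == "__main__":
    sample = [
        "12/29/2006 11:20:22;0100;PBS_Server;Req;;Type QueueJob request "
        "received from tim at m3.lobos.nih.gov, sock=9",
        "12/29/2006 11:20:22;0080;PBS_Server;Req;req_reject;Reject reply "
        "code=15039(Job rejected by all possible destinations), aux=0, "
        "type=Commit, from tim at m3.lobos.nih.gov",
    ]
    for reject in find_rejects(sample):
        print(reject)
```

Grouping the results by requester should show quickly whether the rejections are confined to particular submit hosts.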
I've scratched my head over this a little and just can't see what I'm
doing wrong. Any ideas?
Thanks,
Tim
--
Tim Miller
Contractor / System Administrator -- Laboratory of Computational Biology
National Institutes of Health -- Bldg. 50 Rm. 3310 -- 301-402-0618