[torqueusers] qsub: Job rejected by all possible destinations

Sebastian Hübner seb.ffo at gmx.de
Mon Oct 25 14:52:43 MDT 2010


hi,

you probably get a lot of messages like this, but i could not find any usable suggestions in the mailing list archive, so i decided to give it a try.
my problem is the following.
setup:
2 machines, one of which is running the server (torque-2.5.2). here are the two nodes:

cray
     state = free
     np = 1
     properties = medium
     ntype = cluster
     status = opsys=linux,uname=Linux cray 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64,sessions=4676 4526 4671 4764 4847 5055 5252 5398 5064 5930 6657 11779 31507,nsessions=13,nusers=2,idletime=21916,totmem=60608512kb,availmem=56150208kb,physmem=8180396kb,ncpus=8,loadave=9.17,gres=,netload=334499160,state=free,jobs=,varattr=,rectime=1288040192

abacus
     state = free
     np = 1
     properties = huge
     ntype = cluster
     status = opsys=linux,uname=Linux abacus 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64,sessions=26612 27674 30471 30476 30477 30503 30505 32761,nsessions=8,nusers=3,idletime=31766,totmem=99799032kb,availmem=98915332kb,physmem=49459364kb,ncpus=16,loadave=8.01,gres=,netload=6611239384,state=free,jobs=25.cray.chem.uni-potsdam.de,varattr=,rectime=1288040155

ok, so you will probably say that np does not match ncpus; it does not work if i set them to match either.
so here is my server configuration:

#
# Create queues and set their attributes.
#
#
# Create and define queue MinMax
#
create queue MinMax
set queue MinMax queue_type = Execution
set queue MinMax resources_max.mem = 6gb
set queue MinMax resources_max.ncpus = 8
set queue MinMax resources_max.nodes = 1
set queue MinMax resources_default.mem = 100mb
set queue MinMax resources_default.ncpus = 1
set queue MinMax resources_default.nodes = 1
set queue MinMax enabled = True
set queue MinMax started = True
#
# Create and define queue anteroom
#
create queue anteroom
set queue anteroom queue_type = Route
set queue anteroom route_destinations = MinMax@cray.chem.uni-potsdam.de
set queue anteroom route_destinations += Abacus@cray.chem.uni-potsdam.de
set queue anteroom enabled = True
set queue anteroom started = True
#
# Create and define queue Abacus
#
create queue Abacus
set queue Abacus queue_type = Execution
set queue Abacus resources_max.mem = 16gb
set queue Abacus resources_max.ncpus = 16
set queue Abacus resources_max.nodes = 1
set queue Abacus resources_min.ncpus = 8
set queue Abacus resources_default.mem = 4gb
set queue Abacus resources_default.ncpus = 16
set queue Abacus resources_default.nodes = 1
set queue Abacus enabled = True
set queue Abacus started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = cray
set server default_queue = anteroom
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server allow_node_submit = True
set server next_job_number = 100

so far, so good. when i submit a job, it is matched against the queue defaults as intended, but the node in use is always cray. this is due to the fact that the server recognizes itself as a compute node.
when i do not use a nodes file, the job is still sent to cray.
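
as a rough illustration of the routing described above, here is a small shell sketch of how a job's ncpus request would be matched against the two destinations of anteroom, in the order they are listed. it is only a simplified model under stated assumptions: the limits are copied from the qmgr output, mem and nodes are ignored, the function name route_job is made up for this example, and the real checks in pbs_server may differ.

```shell
#!/bin/sh
# Simplified model of the anteroom routing queue: try each destination in
# order and take the first one whose ncpus limits admit the request.
# Limits are copied from the qmgr output; mem and nodes are ignored.
# route_job is a hypothetical helper, not a TORQUE command.
route_job() {
    ncpus=$1
    # MinMax: resources_max.ncpus = 8, no minimum set
    if [ "$ncpus" -le 8 ]; then
        echo "MinMax"
    # Abacus: resources_min.ncpus = 8, resources_max.ncpus = 16
    elif [ "$ncpus" -ge 8 ] && [ "$ncpus" -le 16 ]; then
        echo "Abacus"
    else
        echo "rejected"
    fi
}

route_job 1    # MinMax
route_job 8    # MinMax (the first matching destination wins)
route_job 12   # Abacus
route_job 32   # rejected
```

in this model a request only reaches Abacus when it asks for more than 8 cpus, and anything above 16 is rejected by both destinations.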

the actual problem is submitting a job to a single node:

echo "sleep 30" | qsub -l nodes=cray[abacus] <return>
qsub: Job rejected by all possible destinations

this is my /etc/hosts:

#
# hosts         This file describes a number of hostname-to-address
#               mappings for the TCP/IP subsystem.  It is mostly
#               used at boot time, when no name servers are running.
#               On small systems, this file can be used instead of a
#               "named" name server.
# Syntax:
#    
# IP-Address  Full-Qualified-Hostname  Short-Hostname
#

127.0.0.1       localhost

# special IPv6 addresses
::1             localhost ipv6-localhost ipv6-loopback

fe00::0         ipv6-localnet

ff00::0         ipv6-mcastprefix
ff02::1         ipv6-allnodes
ff02::2         ipv6-allrouters
ff02::3         ipv6-allhosts
#127.0.0.2       cray.chem.uni-potsdam.de cray
141.89.198.27   cray2.chem.uni-potsdam.de cray2
141.89.198.25   cray.chem.uni-potsdam.de cray
141.89.198.123  abacus.chem.uni-potsdam.de abacus
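
to check that both the short and the fully qualified names map to the expected addresses in a file like this, a lookup along the following lines could be run on each machine. this is only a sketch: lookup_host is a made-up helper name, it reads the hosts file directly and does not consult dns or nsswitch, and commented-out entries are skipped.

```shell
#!/bin/sh
# lookup_host FILE NAME: print the first address in a hosts-format FILE whose
# entry lists NAME as hostname or alias. Hypothetical helper for illustration.
lookup_host() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }             # skip commented-out entries
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break          # ignore a trailing comment
                if ($i == name) { print $1; exit }
            }
        }
    ' "$1"
}

# Print the mapping for each name used in the node and queue definitions.
for h in cray abacus cray.chem.uni-potsdam.de abacus.chem.uni-potsdam.de; do
    printf '%s -> %s\n' "$h" "$(lookup_host /etc/hosts "$h")"
done
```

an empty address after the arrow means the name has no active entry in the file.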

the host abacus is set up with public rsa keys for the server (cray) and vice versa.
any suggestions?

best regards seb!

