[torqueusers] qsub: Job rejected by all possible destinations

Justin Finnerty justin.finnerty at uni-oldenburg.de
Tue Oct 26 07:20:43 MDT 2010


On Mon, 2010-10-25 at 22:52 +0200, Sebastian Hübner wrote:
> hi,
> 
> you probably get a lot of messages like this, but I could not find usable suggestions in the mailing list archive, so I decided to give it a try.
> my problem is the following:
> setup:
> 2 machines, one is running the server (torque-2.5.2)
> 
> cray
>      state = free
>      np = 1
>      properties = medium
>      ntype = cluster
>      status = opsys=linux,uname=Linux cray 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64,sessions=4676 4526 4671 4764 4847 5055 5252 5398 5064 5930 6657 11779 31507,nsessions=13,nusers=2,idletime=21916,totmem=60608512kb,availmem=56150208kb,physmem=8180396kb,ncpus=8,loadave=9.17,gres=,netload=334499160,state=free,jobs=,varattr=,rectime=1288040192
> 
> abacus
>      state = free
>      np = 1
>      properties = huge
>      ntype = cluster
>      status = opsys=linux,uname=Linux abacus 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64,sessions=26612 27674 30471 30476 30477 30503 30505 32761,nsessions=8,nusers=3,idletime=31766,totmem=99799032kb,availmem=98915332kb,physmem=49459364kb,ncpus=16,loadave=8.01,gres=,netload=6611239384,state=free,jobs=25.cray.chem.uni-potsdam.de,varattr=,rectime=1288040155

(1) Something is wrong here!  np should be the number of cores, but only
one is reported for each node, even though the status lines show ncpus=8
on cray and ncpus=16 on abacus.

Your (...)/server_priv/nodes file should contain something like

cray np=8 medium
abacus np=16 huge
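
A quick way to spot this kind of mismatch is to compare the configured np
with the ncpus value that each node reports in its status line.  Below is a
minimal Python sketch, assuming pbsnodes-style text like the output quoted
above (the status lines are shortened here).  The helper names parse_nodes
and suggest_nodes_file are made up for this illustration and are not part
of Torque.

```python
import re

def parse_nodes(text):
    """Parse pbsnodes-style text into {node: {"np", "ncpus", "properties"}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new node block.
            current = nodes.setdefault(line.strip(), {})
            continue
        if current is None:
            continue
        key, sep, value = line.strip().partition(" = ")
        if not sep:
            continue
        if key == "np":
            current["np"] = int(value)
        elif key == "properties":
            current["properties"] = value
        elif key == "status":
            # ncpus is one of the comma-separated key=value status fields.
            match = re.search(r"(?:^|,)ncpus=(\d+)", value)
            if match:
                current["ncpus"] = int(match.group(1))
    return nodes

def suggest_nodes_file(nodes):
    """Return nodes-file lines for every node whose np differs from ncpus."""
    lines = []
    for name, info in nodes.items():
        if "ncpus" in info and info.get("np") != info["ncpus"]:
            props = info.get("properties", "").replace(",", " ")
            lines.append(f"{name} np={info['ncpus']} {props}".rstrip())
    return lines

# Shortened copy of the quoted output (assumption: real status lines are longer).
SAMPLE = """\
cray
     state = free
     np = 1
     properties = medium
     ntype = cluster
     status = opsys=linux,ncpus=8,loadave=9.17,state=free

abacus
     state = free
     np = 1
     properties = huge
     ntype = cluster
     status = opsys=linux,ncpus=16,loadave=8.01,state=free
"""

print("\n".join(suggest_nodes_file(parse_nodes(SAMPLE))))
```

For the sample text this prints the two nodes-file lines given above.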

Torque can be very particular about the matching of hosts to hostnames.
If the above does not work then you may need to play around with the
hostnames in /etc/hosts and the nodes file until np is correct.  You
have probably also seen that nodes with two network adapters must have
the compute-node IP listed before the external IP in /etc/hosts.  You
might also need to use the full hostname everywhere in Torque, and not
just the first part.
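
To illustrate why the order in /etc/hosts matters: a lookup through the
hosts file normally returns the first entry that lists the name, so
whichever address appears first wins.  A minimal Python sketch, assuming
made-up addresses (10.0.0.x for the compute network, 192.0.2.x for the
external one), a placeholder domain, and a hypothetical helper
first_address.  It only models a first-match lookup and does not call the
real resolver.

```python
def first_address(hosts_text, name):
    """Return the address of the first hosts-file entry that lists `name`."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if name in names:
            return address
    return None

# Hypothetical hosts file: the compute-network entry comes first.
HOSTS = """\
127.0.0.1   localhost
10.0.0.2    abacus.example.org abacus   # compute-network adapter
192.0.2.12  abacus.example.org abacus   # external adapter
"""

print(first_address(HOSTS, "abacus"))
print(first_address(HOSTS, "abacus.example.org"))
```

Both lookups give the compute-network address here.  Swapping the two
abacus lines would make both return the external one.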

(2) Selecting queues by resources_max doesn't seem to work; try
inverting the logic, i.e. say that all jobs needing more than 8 CPUs go
onto abacus, and everything else goes onto cray.

You also did not show how you mapped the queues onto the hosts.  As
there are only two hosts, I would use the following if you are not doing
the mapping elsewhere.

set queue Abacus resources_min.ncpus = 9
set queue Abacus from_route_only = True
set queue Abacus acl_host_enable = False
set queue Abacus acl_hosts = abacus

set queue MinMax resources_min.ncpus = 1
set queue MinMax from_route_only = True
set queue MinMax acl_host_enable = False
set queue MinMax acl_hosts = cray

# If you want to allow jobs with 8 or fewer CPUs to use abacus as well, then:
# set queue MinMax acl_hosts = cray+abacus

set queue anteroom route_destinations = Abacus
set queue anteroom route_destinations += MinMax
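
The intended effect of these settings can be sketched as a small model:
the routing queue offers each job to its destinations in order, and the
first queue whose limits accept the job takes it.  This is a simplified
Python illustration and not Torque's actual implementation.  It only looks
at resources_min.ncpus (plus an optional resources_max.ncpus), and the
route function and dictionary layout are invented for this example.

```python
# Queue limits taken from the qmgr settings above (ncpus only).
QUEUES = {
    "Abacus": {"min_ncpus": 9},
    "MinMax": {"min_ncpus": 1},
}
# Same order as the route_destinations of the anteroom queue.
ROUTE_DESTINATIONS = ["Abacus", "MinMax"]

def route(ncpus, destinations=ROUTE_DESTINATIONS, queues=QUEUES):
    """Return the first destination queue whose ncpus limits accept the job."""
    for name in destinations:
        limits = queues[name]
        if ncpus < limits.get("min_ncpus", 0):
            continue  # job is too small for this queue
        max_ncpus = limits.get("max_ncpus")
        if max_ncpus is not None and ncpus > max_ncpus:
            continue  # job is too large for this queue
        return name
    raise ValueError("Job rejected by all possible destinations")

for ncpus in (1, 8, 9, 16):
    print(ncpus, route(ncpus))
```

In this model jobs asking for 1 to 8 CPUs end up in MinMax and jobs asking
for 9 or more end up in Abacus.  A job that no destination accepts raises
the error from the subject line.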

Cheers
Justin Finnerty

-- 
Justin Finnerty <justin.finnerty at uni-oldenburg.de>
Carl von Ossietzky Universität, Oldenburg


