[torqueusers] qsub: Job rejected by all possible destinations
Justin Finnerty
justin.finnerty at uni-oldenburg.de
Tue Oct 26 07:20:43 MDT 2010
On Mon, 2010-10-25 at 22:52 +0200, Sebastian Hübner wrote:
> hi,
>
> you probably get a lot of messages like this, but i could not find useable suggestions in mailinglist archive, so i decided to give it a try.
> my problem is the following:
> setup:
> 2 machines, one is running the server (torques-2.5.2)
>
> cray
> state = free
> np = 1
> properties = medium
> ntype = cluster
> status = opsys=linux,uname=Linux cray 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64,sessions=4676 4526 4671 4764 4847 5055 5252 5398 5064 5930 6657 11779 31507,nsessions=13,nusers=2,idletime=21916,totmem=60608512kb,availmem=56150208kb,physmem=8180396kb,ncpus=8,loadave=9.17,gres=,netload=334499160,state=free,jobs=,varattr=,rectime=1288040192
>
> abacus
> state = free
> np = 1
> properties = huge
> ntype = cluster
> status = opsys=linux,uname=Linux abacus 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64,sessions=26612 27674 30471 30476 30477 30503 30505 32761,nsessions=8,nusers=3,idletime=31766,totmem=99799032kb,availmem=98915332kb,physmem=49459364kb,ncpus=16,loadave=8.01,gres=,netload=6611239384,state=free,jobs=25.cray.chem.uni-potsdam.de,varattr=,rectime=1288040155
(1) Something is wrong here! np should be the number of cores, but only
one is reported for each node, even though the status lines show ncpus=8
and ncpus=16.
Your (...)/server_priv/nodes file should have something like
cray np=8 medium
abacus np=16 huge
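A quick way to confirm whether the nodes file change took effect is to compare np against the ncpus value in each node's status line. The following is only a sketch: it assumes the text layout shown in the listing above (a node name on its own line, then "key = value" attribute lines), and the function name np_mismatches is made up for illustration. The sample values are copied from the quoted listing, trimmed to the relevant fields.

```python
import re

# Trimmed sample in the layout of the node listing quoted above.
SAMPLE = """\
cray
     state = free
     np = 1
     status = opsys=linux,ncpus=8,loadave=9.17,state=free

abacus
     state = free
     np = 1
     status = opsys=linux,ncpus=16,loadave=8.01,state=free
"""


def np_mismatches(text):
    """Return {node: (np, ncpus)} for nodes whose np differs from the reported ncpus."""
    mismatches = {}
    node = None
    np_value = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(" = ")
        if not sep:
            # A line without " = " is taken to be a node name (assumption).
            node, np_value = line, None
        elif key == "np":
            np_value = int(value)
        elif key == "status":
            match = re.search(r"(?:^|,)ncpus=(\d+)", value)
            if match and np_value is not None and int(match.group(1)) != np_value:
                mismatches[node] = (np_value, int(match.group(1)))
    return mismatches


if __name__ == "__main__":
    for name, (np_value, ncpus) in np_mismatches(SAMPLE).items():
        print(f"{name}: np={np_value} but ncpus={ncpus}")
```

Once the nodes file is picked up correctly, the same check on fresh output should report nothing.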
Torque can be very particular about the matching of hosts to hostnames.
If the nodes file entries above do not work then you may need to play
around with hostnames in /etc/hosts and the nodes file until the np is
correct. You have probably also seen that nodes with two network adapters
must have the name of the compute-node IP listed before the external IP
in /etc/hosts. You might also need to use the full hostname everywhere
in torque and not just the first part.
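The ordering point can be illustrated with a small sketch. It assumes that a hosts-file lookup takes the first line that lists the name, which is the behaviour the ordering advice relies on. The addresses below are hypothetical placeholders (10.0.0.2 for the compute-node interface, 192.0.2.10 for the external one), and first_address is an illustrative helper, not part of Torque.

```python
def first_address(hosts_text, name):
    """Return the address on the first hosts-file line that lists `name`, or None."""
    for line in hosts_text.splitlines():
        # Drop comments, then split into address followed by names/aliases.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and name in fields[1:]:
            return fields[0]
    return None


# Hypothetical addresses, for illustration only.
GOOD = """\
127.0.0.1   localhost
10.0.0.2    abacus   # compute-node interface listed first
192.0.2.10  abacus   # external interface
"""

BAD = """\
127.0.0.1   localhost
192.0.2.10  abacus   # external interface listed first
10.0.0.2    abacus   # compute-node interface
"""

if __name__ == "__main__":
    print("good order ->", first_address(GOOD, "abacus"))
    print("bad order  ->", first_address(BAD, "abacus"))
```

With the second ordering the name maps to the external address, which is the situation to avoid.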
(2) Selecting queues by resources_max doesn't seem to work, try
inverting the logic, i.e. say all jobs needing more than 8 cpus go
onto abacus, otherwise put them on cray.
You also did not show how you mapped the queues onto the hosts. As
there are only two hosts I would use the following if you are not doing
the mapping elsewhere.
set queue Abacus resources_min.ncpus = 9
set queue Abacus from_route_only = True
set queue Abacus acl_host_enable = False
set queue Abacus acl_hosts = abacus
set queue MinMax resources_min.ncpus = 1
set queue MinMax from_route_only = True
set queue MinMax acl_host_enable = False
set queue MinMax acl_hosts = cray
# If you want to allow <=8 cpu jobs to use abacus then:
# set queue MinMax acl_hosts = cray+abacus
set queue anteroom route_destinations = Abacus
set queue anteroom route_destinations += MinMax
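The order of the two route_destinations lines matters for this scheme. Here is a simplified model of why, assuming the routing queue tries its destinations in the listed order and a job lands in the first queue whose resources_min.ncpus it meets. This is a sketch of the intended logic only, not of how Torque implements routing, and the route function is hypothetical.

```python
# Simplified model: only resources_min.ncpus is considered for each queue.
QUEUES = {"Abacus": 9, "MinMax": 1}

# Order in which the destinations were added to the anteroom queue.
ROUTE_DESTINATIONS = ["Abacus", "MinMax"]


def route(ncpus, destinations=ROUTE_DESTINATIONS, queues=QUEUES):
    """Return the first destination whose minimum ncpus the job meets, or None."""
    for name in destinations:
        if ncpus >= queues[name]:
            return name
    return None  # no destination accepted the job


if __name__ == "__main__":
    for ncpus in (1, 8, 9, 16):
        print(f"{ncpus:2d} cpus -> {route(ncpus)}")
    # With the order reversed, MinMax accepts everything first.
    print("reversed, 16 cpus ->", route(16, ["MinMax", "Abacus"]))
```

Under this model, listing MinMax first would send every job to cray's queue, since its minimum of 1 accepts any job. That is why Abacus is added first.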
Cheers
Justin Finnerty
--
Justin Finnerty <justin.finnerty at uni-oldenburg.de>
Carl von Ossietzky Universitat, Oldenburg