[torqueusers] torqueusers Digest, Vol 75, Issue 35

Sebastian Hübner seb.ffo at gmx.de
Thu Oct 28 13:00:35 MDT 2010


Hello,

thank you very much for your help. I finally got everything working, and we have started testing.

best regards, seb!

On Tue, 26 Oct 2010 12:00:01 -0600
torqueusers-request at supercluster.org wrote:

> Send torqueusers mailing list submissions to
> 	torqueusers at supercluster.org
> 
> To subscribe or unsubscribe via the World Wide Web, visit
> 	http://www.supercluster.org/mailman/listinfo/torqueusers
> or, via email, send a message with subject or body 'help' to
> 	torqueusers-request at supercluster.org
> 
> You can reach the person managing the list at
> 	torqueusers-owner at supercluster.org
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of torqueusers digest..."
> 
> 
> Today's Topics:
> 
>    1. qsub: Job rejected by all possible destinations (Sebastian Hübner)
>    2. Re: qsub: Job rejected by all possible destinations
>       (Justin Finnerty)
>    3. New 2.5.3 release candidate available (Ken Nielson)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Mon, 25 Oct 2010 22:52:43 +0200
> From: Sebastian Hübner <seb.ffo at gmx.de>
> Subject: [torqueusers] qsub: Job rejected by all possible destinations
> To: torqueusers at supercluster.org
> Message-ID: <20101025225243.cc8f2403.seb.ffo at gmx.de>
> Content-Type: text/plain; charset=US-ASCII
> 
> hi,
> 
> you probably get a lot of messages like this, but I could not find usable suggestions in the mailing list archive, so I decided to give it a try.
> My problem is the following:
> setup:
> 2 machines, one is running the server (torque-2.5.2)
> 
> cray
>      state = free
>      np = 1
>      properties = medium
>      ntype = cluster
>      status = opsys=linux,uname=Linux cray 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64,sessions=4676 4526 4671 4764 4847 5055 5252 5398 5064 5930 6657 11779 31507,nsessions=13,nusers=2,idletime=21916,totmem=60608512kb,availmem=56150208kb,physmem=8180396kb,ncpus=8,loadave=9.17,gres=,netload=334499160,state=free,jobs=,varattr=,rectime=1288040192
> 
> abacus
>      state = free
>      np = 1
>      properties = huge
>      ntype = cluster
>      status = opsys=linux,uname=Linux abacus 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64,sessions=26612 27674 30471 30476 30477 30503 30505 32761,nsessions=8,nusers=3,idletime=31766,totmem=99799032kb,availmem=98915332kb,physmem=49459364kb,ncpus=16,loadave=8.01,gres=,netload=6611239384,state=free,jobs=25.cray.chem.uni-potsdam.de,varattr=,rectime=1288040155
> 
> ok, so you will probably say np does not match ncpus, but it does not work if I set them to match.
> So here is my server configuration:
> 
> #
> # Create queues and set their attributes.
> #
> #
> # Create and define queue MinMax
> #
> create queue MinMax
> set queue MinMax queue_type = Execution
> set queue MinMax resources_max.mem = 6gb
> set queue MinMax resources_max.ncpus = 8
> set queue MinMax resources_max.nodes = 1
> set queue MinMax resources_default.mem = 100mb
> set queue MinMax resources_default.ncpus = 1
> set queue MinMax resources_default.nodes = 1
> set queue MinMax enabled = True
> set queue MinMax started = True
> #
> # Create and define queue anteroom
> #
> create queue anteroom
> set queue anteroom queue_type = Route
> set queue anteroom route_destinations = MinMax at cray.chem.uni-potsdam.de
> set queue anteroom route_destinations += Abacus at cray.chem.uni-potsdam.de
> set queue anteroom enabled = True
> set queue anteroom started = True
> #
> # Create and define queue Abacus
> #
> create queue Abacus
> set queue Abacus queue_type = Execution
> set queue Abacus resources_max.mem = 16gb
> set queue Abacus resources_max.ncpus = 16
> set queue Abacus resources_max.nodes = 1
> set queue Abacus resources_min.ncpus = 8
> set queue Abacus resources_default.mem = 4gb
> set queue Abacus resources_default.ncpus = 16
> set queue Abacus resources_default.nodes = 1
> set queue Abacus enabled = True
> set queue Abacus started = True
> #
> # Set server attributes.
> #
> set server scheduling = True
> set server acl_hosts = cray
> set server default_queue = anteroom
> set server log_events = 511
> set server mail_from = adm
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server allow_node_submit = True
> set server next_job_number = 100
> 
> so far, so good. When I submit a job it is matched against the queue defaults as intended, but the node in use will always be cray. This is due to the fact that the server recognizes itself as a compute node.
> When I do not use a nodes file, the job will still be sent to cray.
> 
> The actual problem is submitting a job to a single node:
> 
> echo "sleep 30" | qsub -l nodes=cray[abacus] <return>
> qsub: Job rejected by all possible destinations
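[Editor's note: for reference, a sketch of the usual Torque syntax for pinning a job to one specific execution host. These are command fragments, not a runnable script: they assume the hostnames cray and abacus from the nodes file and require a live pbs_server/pbs_mom, so outputs will vary.]

```shell
# Request a specific execution host by name (the host must appear in
# the server's nodes file). ppn selects processors per node.
echo "sleep 30" | qsub -l nodes=cray
echo "sleep 30" | qsub -l nodes=abacus:ppn=8

# Check which host the job was assigned to.
qstat -an
```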
> 
> this is my /etc/hosts:
> 
> #
> # hosts         This file describes a number of hostname-to-address
> #               mappings for the TCP/IP subsystem.  It is mostly
> #               used at boot time, when no name servers are running.
> #               On small systems, this file can be used instead of a
> #               "named" name server.
> # Syntax:
> #    
> # IP-Address  Full-Qualified-Hostname  Short-Hostname
> #
> 
> 127.0.0.1       localhost
> 
> # special IPv6 addresses
> ::1             localhost ipv6-localhost ipv6-loopback
> 
> fe00::0         ipv6-localnet
> 
> ff00::0         ipv6-mcastprefix
> ff02::1         ipv6-allnodes
> ff02::2         ipv6-allrouters
> ff02::3         ipv6-allhosts
> #127.0.0.2       cray.chem.uni-potsdam.de cray
> 141.89.198.27   cray2.chem.uni-potsdam.de cray2
> 141.89.198.25   cray.chem.uni-potsdam.de cray
> 141.89.198.123  abacus.chem.uni-potsdam.de abacus
> 
> The host abacus is set up with public RSA keys to the server (cray) and vice versa.
> Any suggestions?
> 
> best regards seb!
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Tue, 26 Oct 2010 15:20:43 +0200
> From: Justin Finnerty <justin.finnerty at uni-oldenburg.de>
> Subject: Re: [torqueusers] qsub: Job rejected by all possible
> 	destinations
> To: Torque Users Mailing List <torqueusers at supercluster.org>
> Message-ID: <1288099243.4882.27.camel at carbon>
> Content-Type: text/plain; charset="UTF-8"
> 
> On Mon, 2010-10-25 at 22:52 +0200, Sebastian Hübner wrote:
> > hi,
> > 
> > you probably get a lot of messages like this, but I could not find usable suggestions in the mailing list archive, so I decided to give it a try.
> > My problem is the following:
> > setup:
> > 2 machines, one is running the server (torques-2.5.2)
> > 
> > [node listing snipped; it is identical to the listing in the original message above]
> 
> (1) Something is wrong here!  np should be the number of cores, but only
> one is reported!
> 
> Your (...)/server_priv/nodes file should have something like:
> 
> cray np=8 medium
> abacus np=16 huge
> 
> Torque can be very particular about matching hosts to hostnames.
> If the above does not work, you may need to play around with the
> hostnames in /etc/hosts and the nodes file until np is correct.  You
> have probably also seen that on nodes with two network adapters, the
> compute-network IP must be listed before the external IP
> in /etc/hosts.  You might also need to use the full hostname everywhere
> in Torque, not just the short hostname.
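[Editor's note: a minimal sketch of applying the nodes-file suggestion above. The file is written to a temporary directory here purely for illustration; on a real server it lives under server_priv/ (the exact path varies per install), and pbs_server must be restarted, e.g. with qterm -t quick followed by pbs_server, before it rereads the file. Verify afterwards with pbsnodes -a.]

```shell
# Write the two-node definition Justin suggests. Using a temp dir for
# illustration; the real file is $TORQUE_HOME/server_priv/nodes.
NODESDIR=$(mktemp -d)
cat > "$NODESDIR/nodes" <<'EOF'
cray np=8 medium
abacus np=16 huge
EOF

# Show the result; np should match the core count of each host.
cat "$NODESDIR/nodes"
```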
> 
> (2) Selecting queues by resources_max doesn't seem to work; try
> inverting the logic, i.e. say all jobs needing more than 8 CPUs go
> onto abacus, and put the rest on cray.
> 
> You also did not show how you mapped the queues onto the hosts.  As
> there are only two hosts, I would use the following if you are not
> doing the mapping elsewhere.
> 
> set queue Abacus resources_min.ncpus = 9
> set queue Abacus from_route_only = True
> set queue Abacus acl_host_enable = False
> set queue Abacus acl_hosts = abacus
> 
> set queue MinMax resources_min.ncpus = 1
> set queue MinMax from_route_only = True
> set queue MinMax acl_host_enable = False
> set queue MinMax acl_hosts = cray
> 
> # If you want to allow <8 cpu jobs to use abacus then:
> # set queue MinMax acl_hosts = cray+abacus
> 
> set queue anteroom route_destinations = Abacus
> set queue anteroom route_destinations += MinMax
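[Editor's note: the queue attributes above are qmgr input. A sketch of feeding them to qmgr and verifying the result; this is a config fragment that requires a running pbs_server and suitable privileges, so it is not runnable standalone.]

```shell
# Apply the routing setup via qmgr (run on the server host as a
# Torque manager/operator).
qmgr <<'EOF'
set queue Abacus resources_min.ncpus = 9
set queue Abacus from_route_only = True
set queue MinMax resources_min.ncpus = 1
set queue MinMax from_route_only = True
set queue anteroom route_destinations = Abacus
set queue anteroom route_destinations += MinMax
EOF

# Dump the full server and queue configuration to check the result.
qmgr -c 'print server'
```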
> 
> Cheers
> Justin Finnerty
> 
> -- 
> Justin Finnerty <justin.finnerty at uni-oldenburg.de>
> Carl von Ossietzky Universität, Oldenburg 
> 
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Tue, 26 Oct 2010 09:51:18 -0600
> From: Ken Nielson <knielson at adaptivecomputing.com>
> Subject: [torqueusers] New 2.5.3 release candidate available
> To: torquedev <torquedev at supercluster.org>, 	torqueusers
> 	<torqueusers at supercluster.org>
> Message-ID: <4CC6F8F6.1030903 at adaptivecomputing.com>
> Content-Type: text/plain; charset="iso-8859-1"
> 
> There is a new 2.5.3 release candidate snapshot. This build only has 
> some bug fixes and one other minor change.
> 
> We changed the server startup to ignore hosts in the nodes file that 
> cannot be resolved. Previously, pbs_server would terminate if it could 
> not resolve a host name.
> 
> We fixed a bug where a new job log was not getting created at the start 
> of a new day.
> 
> If there are no more issues this will become the official 2.5.3 release.
> 
> Thanks for downloading and giving it a try. The tarball can be found at 
> http://www.clusterresources.com/downloads/torque/snapshots/torque-2.5.3-snap.201010251614.tar.gz
> 
> 
> Ken Nielson
> Adaptive Computing
> 
> ------------------------------
> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
> 
> 
> End of torqueusers Digest, Vol 75, Issue 35
> *******************************************
