[torqueusers] Help Queue Priority

Juno Kim | AGM redes03 at agm.com.br
Mon Nov 18 06:21:57 MST 2013


Hello everyone,

How do I set the priorities of my queues in Torque?
I have the following queues:
batch
sapda
user1
user2
user3

I want queues user1, user2, and user3 to have the same priority, and
queues sapda and batch to have a higher priority than the user queues.

All of the queues should compete with each other according to their
priority, and when users submit jobs to queues user1, user2, and user3,
those jobs should also compete with each other, so that each user gets a
job running.

The same applies to the batch and sapda queues.
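
For concreteness, here is a rough sketch of what I have in mind with qmgr
(the priority values are only illustrative, and I am assuming the
scheduler in use, e.g. pbs_sched or Maui, honours the queue priority
attribute):

# higher priority value = more important queue (my assumption)
qmgr -c "set queue batch priority = 100"
qmgr -c "set queue sapda priority = 100"
qmgr -c "set queue user1 priority = 10"
qmgr -c "set queue user2 priority = 10"
qmgr -c "set queue user3 priority = 10"

If each user should also have only one running job at a time in a user
queue, I assume the max_user_run queue attribute is the relevant knob,
e.g.:

qmgr -c "set queue user1 max_user_run = 1"
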
====================
Best regards,

Juno Costa Kim
Departamento de Redes
AGM Telecom
====================
IP Phone: +55 (48) 3221-0100
Fax : +55 (48) 3222-7747
Email : redes03 at agm.com.br
Website: www.agm.com.br
Rua Joe Collaço, 163
88037-010 - Santa Mônica - Florianópolis - SC

On 15-11-2013 18:11, Jagga Soorma wrote:
> So, this is a brand new install of torque without anything running on 
> the server/client except the torque processes.  I checked and I don't 
> think the server is running into any process limits.
>
> I set up the server & sched processes on the client itself and am now 
> running everything on the client host to rule out external components. 
> I still see the same problem with connections to port 15002. 
> I had a 1 Gig copper connection on this server as well and migrated the 
> network to a completely different NIC, but that did not help either.
>
> This is really a bizarre one that I can't seem to find the cause for. 
> Is there anything else you think might help me troubleshoot this problem?
>
> Thanks,
> -J
>
>
> On Fri, Nov 15, 2013 at 4:05 AM, Jonathan Barber 
> <jonathan.barber at gmail.com> wrote:
>
>     On 15 November 2013 03:18, Jagga Soorma <jagga13 at gmail.com> wrote:
>
>         I changed the log level and here is what I see on the server:
>
>         Looks like it is intermittently having issues connecting to
>         port 15002 on the client.  This client was just fine under the
>         2.5.9 torque production environment that we have but seems to
>         be intermittently having issues in the 2.5.13 test environment
>         that is set up with GPU support.
>
>     [snip]
>
>
>         11/14/2013 19:15:20;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 7352.server1.xxx.com state from QUEUED-QUEUED to RUNNING-PRERUN (4-40)
>         11/14/2013 19:15:20;0008;PBS_Server;Job;7352.server1.xxx.com;forking in send_job
>         11/14/2013 19:15:20;0004;PBS_Server;Svr;svr_connect;attempting connect to host 72.34.135.64 port 15002
>         11/14/2013 19:15:20;0004;PBS_Server;Svr;svr_connect;cannot connect to host port 15002 - cannot establish connection () - time=0 seconds
>         11/14/2013 19:15:22;0004;PBS_Server;Svr;svr_connect;attempting connect to host 72.34.135.64 port 15002
>         11/14/2013 19:15:22;0004;PBS_Server;Svr;svr_connect;cannot connect to host port 15002 - cannot establish connection () - time=0 seconds
>         11/14/2013 19:15:22;0008;PBS_Server;Job;7352.server1.xxx.com;entering post_sendmom
>
>
>     You might be running up against limits on the number of file
>     descriptors the pbs_server process or the OS is allowed to have
>     open. You can use tools such as lsof to see how many files the
>     pbs_server has open:
>     $ sudo lsof -c pbs_server
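>
>     To compare that count against the per-process limit, something like
>     this should work (a sketch; it assumes a Linux host with /proc, a
>     single pbs_server process, and that pidof is installed):
>     $ cat /proc/$(pidof pbs_server)/limits | grep -i 'open files'
>     $ sudo lsof -c pbs_server | wc -l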
>
>     It's also possible that you're running out of ports to bind to.
>     Running lsof/netstat and looking to see if there are massive
>     numbers of connections/files open will reveal this.
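>
>     For example, a rough check (a sketch; assumes ss is installed,
>     netstat -ant shows the same information):
>     $ ss -ant | grep -c 15002
>     $ ss -ant | grep -c TIME-WAIT
>     Thousands of sockets stuck in TIME-WAIT would point towards
>     ephemeral port exhaustion.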
>
>     Although you say there is no firewall configured on the servers,
>     do you know if there is a firewall between the pbs_server and the nodes?
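>
>     If both ends run iptables, dumping the rules and their counters is a
>     quick sanity check (a sketch; assumes iptables rather than some other
>     firewall):
>     $ sudo iptables -L -n -v
>     A DROP or REJECT rule that could match port 15002 would be worth a
>     closer look.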
>
>     You can do a simple TCP connect to the mom to see if it's listening:
>     $ nmap -p 15002 ava01.grid.fe.up.pt -oG -
>     # Nmap 6.40 scan initiated Fri Nov 15 11:52:17 2013 as: nmap -p 15002 -oG - ava01.grid.fe.up.pt
>     Host: 192.168.147.1 (ava01.grid.fe.up.pt)  Status: Up
>     Host: 192.168.147.1 (ava01.grid.fe.up.pt)  Ports: 15002/open/tcp//unknown///
>     # Nmap done at Fri Nov 15 11:52:17 2013 -- 1 IP address (1 host up) scanned in 0.04 seconds
>     $
>
>     Or continuously with hping3 (I'm sure there are other tools that
>     will do this as well):
>     $ sudo hping3 -S -p 15002 ava01.grid.fe.up.pt
>     HPING ava01.grid.fe.up.pt (em1 192.168.147.1): S set, 40 headers + 0 data bytes
>     len=46 ip=192.168.147.1 ttl=61 DF id=0 sport=15002 flags=SA seq=0 win=14600 rtt=1.5 ms
>     len=46 ip=192.168.147.1 ttl=61 DF id=0 sport=15002 flags=SA seq=1 win=14600 rtt=0.8 ms
>     len=46 ip=192.168.147.1 ttl=61 DF id=0 sport=15002 flags=SA seq=2 win=14600 rtt=0.6 ms
>     len=46 ip=192.168.147.1 ttl=61 DF id=0 sport=15002 flags=SA seq=3 win=14600 rtt=1.0 ms
>     len=46 ip=192.168.147.1 ttl=61 DF id=0 sport=15002 flags=SA seq=4 win=14600 rtt=1.2 ms
>
>     (SA means it's open)
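>
>     A plain netcat loop is another option if hping3 isn't available (a
>     sketch; assumes a netcat build that supports -z):
>     $ while true; do nc -zv ava01.grid.fe.up.pt 15002; sleep 1; done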
>
>     HTH
>     -- 
>     Jonathan Barber <jonathan.barber at gmail.com>
