From siegert at sfu.ca Fri Jun 1 17:20:04 2012
From: siegert at sfu.ca (Martin Siegert)
Date: Fri, 1 Jun 2012 16:20:04 -0700
Subject: [torquedev] torque server: setting the server name
In-Reply-To: <20120529203615.GA31910@stikine.sfu.ca>
References: <20120528192954.GC20824@stikine.sfu.ca>
 <7197651.kavNGebP1U@newton.cc.uit.no>
 <20120529175055.GB25034@stikine.sfu.ca>
 <20120529203615.GA31910@stikine.sfu.ca>
Message-ID: <20120601232004.GA10915@stikine.sfu.ca>

Hi,

moving this to the dev list ...

On Tue, May 29, 2012 at 01:36:15PM -0700, Martin Siegert wrote:
> Hi David,
>
> I will definitely add --with-tcp-retry-limit=5 to my configure options,
> since we did run into exactly that situation. However, the current
> situation is due to an ip mismatch between private and public ip address
> of the torque server: svr_connect.c, line 172
>
>   if ((hostaddr == pbs_server_addr) && (port == pbs_server_port_dis))
>     {
>     return(PBS_LOCAL_CONNECTION); /* special value for local */
>     }
>
> In our case: hostaddr = 172.18.1.0 and pbs_server_addr = 206.12.24.2.
> The former ip address is the (correct) ip address on the internal
> cluster network, the latter ip address is the public ip address and
> should not be used by torque anywhere.
>
> We have in /etc/hosts
>
> 172.18.1.0 b0
>
> and then set the server name in 4 (!!) different places:
> 1) in qmgr we have
>    set server server_name = b0
> 2) /var/spool/torque/server_name contains b0
> 3) /var/spool/torque/torque.cfg contains
>    SERVERHOST b0
> 4) we configure with
>    --with-default-server=b0
>
> I always thought that it should be sufficient to set this once.
> Obviously I am wrong ... I am missing at least a fifth spot where
> I need to set this: how do I get torque server to set pbs_server_addr
> in svr_connect to 172.18.1.0?
>
> For now we used the following workaround:
> 1) in /etc/hosts set
>
> 172.18.1.0 hostname.domain.ca hostname b0
>
> 2) restart torque server and wait a few seconds until qstat, etc.
>    responds.
>
> 3) change /etc/hosts back to
> 172.18.1.0 b0
>
> This does "solve" the problem for now.
> I am still looking for a more permanent solution.

I did miss a fifth (and actually 6th) way of setting the server name:

5) start the server with the -H b0 command-line option.

As it turns out this is the only way. Methods 1-4 have no effect.

At this point I am wondering why we need 5 ways of setting the server
name. As a first step can somebody tell me what each of the 5 settings
accomplishes?

This is my take:

1) in qmgr:

   set server server_name = b0

   As far as I can tell this has no effect. Can this be eliminated?

2) /var/spool/torque/server_name

   This is essential: used by the clients (qsub, qstat, etc.) and also by
   the mom (if no $pbsserver is specified in mom_priv/config). Not used
   by the torque server.

3) torque.cfg
   SERVERHOST b0

   Read by qsub only. The man page says:
   SERVERHOST specifies the value for the PBS_SERVER environment variable

   I find this confusing: why would you want to set that environment variable
   to something different than what is read from the server_name file?
   In other words: what is the use case for having SERVERHOST set to something
   different than what is in the server_name file?

   Is it safe to say that this is not needed when the server_name file is in
   place?

4) configure option --with-default-server=b0
   Does this have any effect?

5) pbs_server -H b0 command-line option
   Essential. Determines the IP address to be used for the server.
   If not used, gethostname is used to determine the IP address.

6) $pbsserver setting in mom_priv/config
   Used by the mom for connecting to server; not needed when
   server_name file is in place.

Is my assessment correct that only (2) and (5) are really needed?
Furthermore, (1) and (4) and possibly (3) do not serve any purpose?
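For illustration only: a minimal Python sketch of the comparison quoted from
svr_connect.c above, to show why the mismatch matters. This is not TORQUE
code. The function name is_local_connection is hypothetical, and the port
value 15001 is an assumption (the usual pbs_server default); only the two
addresses come from the message.

```python
import ipaddress

# Assumed value for pbs_server_port_dis; not taken from the thread.
PBS_SERVER_PORT_DIS = 15001


def is_local_connection(hostaddr, port, pbs_server_addr,
                        pbs_server_port=PBS_SERVER_PORT_DIS):
    """Mimic the quoted check: local only if address AND port both match."""
    same_addr = ipaddress.ip_address(hostaddr) == ipaddress.ip_address(pbs_server_addr)
    return same_addr and port == pbs_server_port


private_addr = "172.18.1.0"   # b0 on the internal cluster network
public_addr = "206.12.24.2"   # address the server derived from its hostname

# Server address taken from the public hostname: the check fails.
print(is_local_connection(private_addr, PBS_SERVER_PORT_DIS, public_addr))
# Server address forced to the private one (the effect of -H b0): it succeeds.
print(is_local_connection(private_addr, PBS_SERVER_PORT_DIS, private_addr))
```

With the values from the message, the first call reports a non-local
connection and the second a local one, which matches the behaviour
described for starting pbs_server with and without -H b0.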

Cheers,
Martin

From knielson at adaptivecomputing.com Mon Jun 4 09:09:37 2012
From: knielson at adaptivecomputing.com (Ken Nielson)
Date: Mon, 4 Jun 2012 09:09:37 -0600
Subject: [torquedev] torque server: setting the server name
In-Reply-To: <20120601232004.GA10915@stikine.sfu.ca>
References: <20120528192954.GC20824@stikine.sfu.ca>
 <7197651.kavNGebP1U@newton.cc.uit.no>
 <20120529175055.GB25034@stikine.sfu.ca>
 <20120529203615.GA31910@stikine.sfu.ca>
 <20120601232004.GA10915@stikine.sfu.ca>
Message-ID:

Martin,

I have added your problem description to our internal ticket system so we
don't lose this information. We definitely need a better way to handle
server names, especially on multi-homed systems.

Thanks

Ken

From knielson at adaptivecomputing.com Wed Jun 6 13:55:47 2012
From: knielson at adaptivecomputing.com (Ken Nielson)
Date: Wed, 6 Jun 2012 13:55:47 -0600
Subject: [torquedev] Subversion URL changing for TORQUE
Message-ID:

Hi everyone,

We are changing the URL to check out code for TORQUE to
svn://opensvn.adaptivecomputing.com/torque. The current URL
svn://clusterresources.com/torque will still be available but will be
removed soon. We will let you know a date as soon as we can.

In the meantime please download source from
svn://opensvn.adaptivecomputing.com/torque.

Thanks

Ken Nielson
Adaptive Computing

From knielson at adaptivecomputing.com Wed Jun 6 14:59:56 2012
From: knielson at adaptivecomputing.com (Ken Nielson)
Date: Wed, 6 Jun 2012 14:59:56 -0600
Subject: [torquedev] Update on source URL for TORQUE
Message-ID:

Hi All,

In a previous e-mail we announced that the new subversion URL to download
TORQUE source will be svn://opensvn.adaptivecomputing.com/torque.

The subversion URL svn://clusterresources.com/torque is still available
but will be decommissioned on June 20, 2012.

Please start using the svn://opensvn.adaptivecomputing.com/torque URL and
update any documentation you may have.

Regards

Ken Nielson
Adaptive Computing

From samuel at unimelb.edu.au Wed Jun 6 19:42:44 2012
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Thu, 07 Jun 2012 11:42:44 +1000
Subject: [torquedev] Update on source URL for TORQUE
In-Reply-To:
References:
Message-ID: <4FD00714.4090509@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/06/12 06:59, Ken Nielson wrote:

> Hi All,

Hi Ken,

> Please start using the svn://opensvn.adaptivecomputing.com/torque
> URL and update any documentation you may have.

Is that available yet ?

Connection timed out: Can't connect to host
'opensvn.adaptivecomputing.com': Connection timed out at
/usr/lib/git-core/git-svn line 2139

Actually, just did a tcptraceroute and it looks like it's a networking
issue:

samuel at eris:/tmp$ tcptraceroute opensvn.adaptivecomputing.com svn
Selected device eth0, address 128.250.103.222, port 54492 for outgoing packets
Tracing the path to opensvn.adaptivecomputing.com (205.15.87.226) on TCP port 3690 (svn), 30 hops max
 1  128.250.103.194  0.484 ms  0.469 ms  0.301 ms
 2  172.18.65.109  0.710 ms  0.418 ms  0.445 ms
 3  172.18.66.153  0.748 ms  1.173 ms  1.117 ms
 4  172.18.66.154  1.015 ms  1.117 ms  0.892 ms
 5  172.18.66.246  0.450 ms  0.419 ms  0.424 ms
 6  tengigabitethernet2-3.er2.unimelb.cpe.aarnet.net.au (202.158.206.161)  0.480 ms  0.667 ms  0.501 ms
 7  ge-4-0-0.bb1.b.mel.aarnet.net.au (202.158.200.97)  0.464 ms !N  0.441 ms !N  0.454 ms !N

So that's a network unreachable coming back from AARNET, the
Australian academic + research network.

Tracing from my VM in the US also doesn't get very far, I suspect
there might be something up with how that network is being advertised..

csamuel:~# tcptraceroute opensvn.adaptivecomputing.com svn
Selected device eth0, address 74.50.50.137, port 55487 for outgoing packets
Tracing the path to opensvn.adaptivecomputing.com (205.15.87.226) on TCP port 3690 (svn), 30 hops max
 1  74.50.50.1  0.554 ms  0.378 ms  0.449 ms
 2  ge-1-2.dal.rimuhosting.com (65.99.204.69)  0.427 ms  0.653 ms  0.423 ms
 3  * * *
 4  * * *
 5  * * *
 6  * * *

- -- 
 Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk/QBxQACgkQO2KABBYQAh/Z+ACfXqJXmo9U/ezNLzitNiStINCJ
1pkAniG6nAIJoqbnKFb23Qbpv2rJpxa5
=y4M1
-----END PGP SIGNATURE-----

From samuel at unimelb.edu.au Wed Jun 6 19:49:09 2012
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Thu, 07 Jun 2012 11:49:09 +1000
Subject: [torquedev] Update on source URL for TORQUE
In-Reply-To: <4FD00714.4090509@unimelb.edu.au>
References: <4FD00714.4090509@unimelb.edu.au>
Message-ID: <4FD00895.4090509@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/06/12 11:42, Christopher Samuel wrote:

> Actually, just did a tcptraceroute and it looks like it's a
> networking issue:

Hmm, I suspect finger trouble in DNS, the IP address for
opensvn.adaptivecomputing.com is in a netblock owned by the US DoD, I
suspect this is not the machine we are looking for...

samuel at eris:/tmp$ host opensvn.adaptivecomputing.com
opensvn.adaptivecomputing.com has address 205.15.87.226

samuel at eris:/tmp$ host 205.15.87.226
Host 226.87.15.205.in-addr.arpa. not found: 3(NXDOMAIN)

samuel at eris:/tmp$ whois 205.15.87.226
#
# The following results may also be obtained via:
# http://whois.arin.net/rest/nets;q=205.15.87.226?showDetails=true&showARIN=false&ext=netref2
#

NetRange:       205.0.0.0 - 205.55.255.255
CIDR:           205.0.0.0/11, 205.48.0.0/13, 205.32.0.0/12
OriginAS:
NetName:        DNIC-SNET-205-000
NetHandle:      NET-205-0-0-0-1
Parent:         NET-205-0-0-0-0
NetType:        Direct Assignment
RegDate:        1998-05-18
Updated:        2011-06-21
Ref:            http://whois.arin.net/rest/net/NET-205-0-0-0-1

OrgName:        DoD Network Information Center
OrgId:          DNIC
Address:        3990 E. Broad Street
City:           Columbus
StateProv:      OH
PostalCode:     43218
Country:        US
RegDate:
Updated:        2011-08-17
Ref:            http://whois.arin.net/rest/org/DNIC

OrgTechHandle: REGIS10-ARIN
OrgTechName:   Registration
OrgTechPhone:  +1-800-365-3642
OrgTechEmail:  registra at nic.mil
OrgTechRef:    http://whois.arin.net/rest/poc/REGIS10-ARIN

OrgAbuseHandle: REGIS10-ARIN
OrgAbuseName:   Registration
OrgAbusePhone:  +1-800-365-3642
OrgAbuseEmail:  registra at nic.mil
OrgAbuseRef:    http://whois.arin.net/rest/poc/REGIS10-ARIN

OrgTechHandle: MIL-HSTMST-ARIN
OrgTechName:   Network DoD
OrgTechPhone:  +1-614-692-6337
OrgTechEmail:  HOSTMASTER at nic.mil
OrgTechRef:    http://whois.arin.net/rest/poc/MIL-HSTMST-ARIN

#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#

- -- 
 Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk/QCJUACgkQO2KABBYQAh9gPQCeMPixnNnV4/pIxxHrIGzFSHXB
+mEAn1nmD9/GPlyGRp2XBkR5gPgsMROp
=6lAP
-----END PGP SIGNATURE-----

From knielson at adaptivecomputing.com Thu Jun 7 09:03:37 2012
From: knielson at adaptivecomputing.com (Ken Nielson)
Date: Thu, 7 Jun 2012 09:03:37 -0600
Subject:
[torquedev] Update on source URL for TORQUE
In-Reply-To: <4FD00714.4090509@unimelb.edu.au>
References: <4FD00714.4090509@unimelb.edu.au>
Message-ID:

Chris,

The full URL is svn://opensvn.adaptivecomputing.com/torque.

I am inside the firewall right now and can't test it. But it should be
there. Let me know if you are still having trouble.

Ken

From tbaer at utk.edu Thu Jun 7 09:09:50 2012
From: tbaer at utk.edu (Troy Baer)
Date: Thu, 7 Jun 2012 11:09:50 -0400
Subject: [torquedev] Update on source URL for TORQUE
In-Reply-To:
References: <4FD00714.4090509@unimelb.edu.au>
Message-ID: <1339081790.16489.450.camel@browncoat.jics.utk.edu>

On Thu, 2012-06-07 at 09:03 -0600, Ken Nielson wrote:
>
> The full URL is svn://opensvn.adaptivecomputing.com/torque.
>
> I am inside the firewall right now and can't test it. But it should be
> there. Let me know if you are still having trouble.

I can't get to it either:

$ svn co svn://opensvn.adaptivecomputing.com/torque
svn: Can't connect to host 'opensvn.adaptivecomputing.com': Connection timed out

$ host opensvn.adaptivecomputing.com
opensvn.adaptivecomputing.com has address 205.15.87.226

$ traceroute opensvn.adaptivecomputing.com
traceroute to opensvn.adaptivecomputing.com (205.15.87.226), 30 hops max, 60 byte packets
 1  160.36.230.1 (160.36.230.1)  0.533 ms  0.467 ms  0.397 ms
 2  bhm01v981.ns.utk.edu (160.36.2.77)  1.830 ms  1.770 ms  1.709 ms
 3  gi1-8.ccr01.atl04.atlas.cogentco.com (38.104.182.37)  101.393 ms  101.357 ms *
[no packets get further than this]

    --Troy
-- 
Troy Baer, Senior HPC System Administrator
National Institute for Computational Sciences, University of Tennessee
http://www.nics.tennessee.edu/
Phone: 865-241-4233

From knielson at adaptivecomputing.com Thu Jun 7 10:04:06 2012
From: knielson at adaptivecomputing.com (Ken Nielson)
Date: Thu, 7 Jun 2012 10:04:06 -0600
Subject: [torquedev] Update on source URL for TORQUE
In-Reply-To: <1339081790.16489.450.camel@browncoat.jics.utk.edu>
References: <4FD00714.4090509@unimelb.edu.au>
 <1339081790.16489.450.camel@browncoat.jics.utk.edu>
Message-ID:

Troy and Chris,

Thanks for letting us know. I have forwarded your comments to our IT folks.
I will post again when they think they have it fixed.

Ken

From knielson at adaptivecomputing.com Thu Jun 7 14:01:28 2012
From: knielson at adaptivecomputing.com (Ken Nielson)
Date: Thu, 7 Jun 2012 14:01:28 -0600
Subject: [torquedev] Update on source URL for TORQUE
In-Reply-To:
References:
Message-ID:

Hi all,

The svn://opensvn.adaptivecomputing.com/torque URL should now be working.
Please let us know if you are still having problems.

Ken Nielson
Adaptive Computing

From knielson at adaptivecomputing.com Thu Jun 7 18:54:35 2012
From: knielson at adaptivecomputing.com (Ken Nielson)
Date: Thu, 7 Jun 2012 18:54:35 -0600
Subject: [torquedev] TORQUE 2.4.17 available for download
Message-ID:

Hi all,

We are proud to announce the availability of TORQUE 2.4.17. 2.4.17 will be
the last official release of the TORQUE 2.4-fixes branch. For Moab users,
support from Adaptive Computing for 2.4.x will end on August 31, 2012.

Following are the release notes for 2.4.17.

New in 2.4.17
-------------

2.4.17 will be the last release of the 2.4-fixes branch of TORQUE.

The configuration option to build high availability changed from
--with-high-availability to --enable-high-availability.

A buffer overflow problem was fixed in tcp_puts which at times made it so
not enough memory would be allocated for outbound data. This may account
for some segfaults and memory corruption problems.

For a list of the remaining fixes please see CHANGELOG.

If you need this version of TORQUE, please download it and let us know if
you have any problems.

Regards

Ken Nielson
Adaptive Computing

From knielson at adaptivecomputing.com Tue Jun 12 13:26:47 2012
From: knielson at adaptivecomputing.com (Ken Nielson)
Date: Tue, 12 Jun 2012 13:26:47 -0600
Subject: [torquedev] TORQUE 2.5.12 available
Message-ID:

Hi all,

TORQUE 2.5.12 is available for general use.
The tarball can be downloaded from
http://www.adaptivecomputing.com/support/download-center/torque-download/

New in 2.5.12

Enabled MOMs to change a user's group on stderr/stdout files. This fixes a
problem where files owned by the user but not the user's group caused a job
to fail.

Fixed a large memory leak on the mom when configured with
--enable-nvidia-gpus.

A buffer overflow problem was fixed in tcp_puts which at times made it so
not enough memory would be allocated for outbound data. This may account
for some segfaults and memory corruption problems.

Added the mom config option exec_with_exec. When set to true, the pbs_mom
will exec the job script instead of piping it to stdin. This makes signal
trapping easier because the shell doesn't have to be configured to trap the
signal as well.

For all bugs fixed in 2.5.12 please see the CHANGELOG.

Regards

Ken Nielson
Adaptive Computing

From bugzilla-daemon at supercluster.org Sun Jun 24 22:35:49 2012
From: bugzilla-daemon at supercluster.org (bugzilla-daemon at supercluster.org)
Date: Sun, 24 Jun 2012 22:35:49 -0600 (MDT)
Subject: [torquedev] [Bug 197] New: qsub command line arguments overridden by script
Message-ID:

http://www.clusterresources.com/bugzilla/show_bug.cgi?id=197

           Summary: qsub command line arguments overridden by script
           Product: TORQUE
           Version: 4.0.*
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P5
         Component: clients
        AssignedTo: knielson at adaptivecomputing.com
        ReportedBy: cwest at vpac.org
                CC: torquedev at supercluster.org
   Estimated Hours: 0.0

Hi,

When I submit a job with Torque 4.0.2 that has the number of nodes and/or
the walltime specified in the script, these values are overriding the
values I am using on the command line.

I believe the command line has always overridden the script values, not
the other way around.

E.g. using the script below (1 node for 1 minute):

#!/bin/bash
#PBS -l nodes=1
#PBS -l walltime=00:01:00
pbsdsh -u ./ping.sh

but submitting it with:

qsub -l nodes=2,walltime=00:05:00 ping.batch

results in the following in the qstat -f output:

Resource_List.nodect = 1
Resource_List.nodes = 1
Resource_List.walltime = 00:01:00

If I comment the lines out, the command line values work correctly.

Cheers,
Craig.

-- 
Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

From bugzilla-daemon at supercluster.org Wed Jun 27 04:02:56 2012
From: bugzilla-daemon at supercluster.org (bugzilla-daemon at supercluster.org)
Date: Wed, 27 Jun 2012 04:02:56 -0600 (MDT)
Subject: [torquedev] [Bug 198] New: Accounting logs not showing usage information
Message-ID:

http://www.clusterresources.com/bugzilla/show_bug.cgi?id=198

           Summary: Accounting logs not showing usage information
           Product: TORQUE
           Version: 4.0.*
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: pbs_server
        AssignedTo: dbeer at adaptivecomputing.com
        ReportedBy: cwest at vpac.org
                CC: torquedev at supercluster.org
   Estimated Hours: 0.0

The accounting logs print the submitted (requested) values of the job when
the job completes, rather than the values that were actually used during
the job.

That is, when a job starts it prints variables like the following in the
accounting logs:

Resource_List.walltime=01:00:00

And when the job completes I get the same variables, e.g.

Resource_List.walltime=01:00:00

even when the job didn't run for an hour. I should be seeing things like:

resources_used.walltime=00:29:02

Craig.

-- 
Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
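
For illustration only: a minimal Python sketch of how the symptom in Bug 198
could be spotted in an accounting file. It assumes the usual PBS-style record
layout "timestamp;type;jobid;key=value key=value ...", with "E" marking a job
end record. The function names and the sample records are hypothetical; only
the Resource_List.walltime and resources_used.walltime values come from the
report.

```python
def parse_accounting_record(line):
    """Split one record into (timestamp, type, job id, {key: value})."""
    timestamp, record_type, job_id, message = line.rstrip("\n").split(";", 3)
    fields = {}
    # Naive whitespace split: values that themselves contain spaces
    # would need a more careful parser.
    for token in message.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return timestamp, record_type, job_id, fields


def missing_usage(line):
    """True for an end ('E') record that has no resources_used.* field."""
    _, record_type, _, fields = parse_accounting_record(line)
    has_usage = any(key.startswith("resources_used.") for key in fields)
    return record_type == "E" and not has_usage


# Hypothetical sample records (job id and user are made up).
reported = "06/27/2012 04:02:56;E;123.b0;user=craig Resource_List.walltime=01:00:00"
expected = reported + " resources_used.walltime=00:29:02"

print(missing_usage(reported))   # end record without usage data
print(missing_usage(expected))   # end record with usage data
```

Running missing_usage over every line of an accounting file would list the
end records affected by the behaviour described in the report.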