From samuel at unimelb.edu.au Mon Jan 2 22:15:29 2012 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Tue, 03 Jan 2012 16:15:29 +1100 Subject: [torquedev] Bad uid for job: user does not exist in password file In-Reply-To: References: Message-ID: <4F028EF1.2020005@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 20/12/11 23:27, hemant hariyale wrote: > i m able to submit job from server but not other hosts You need to set up your authentication so Torque is happy. See section "1.3.2.3 Configuring Job Submission Hosts" here: http://www.clusterresources.com/torquedocs/1.3advconfig.shtml > and other problem is that > is i submit a job from a host and if this job gets schedulled on some > other hosts than that hosts mom gets crashed and the when i request for > it del th jab is says > resource not available firs seconed time says not able to contact MOM Which version of Torque are you using? cheers, Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk8CjvEACgkQO2KABBYQAh8/jwCfZx5i8bqUgt76r7aQcnk74ogD P60AoJVmlM3jLS7h9Honsf/SqQStj8lL =xJNf -----END PGP SIGNATURE----- From dbeer at adaptivecomputing.com Wed Jan 4 09:11:57 2012 From: dbeer at adaptivecomputing.com (David Beer) Date: Wed, 04 Jan 2012 09:11:57 -0700 (MST) Subject: [torquedev] TORQUE 4.0 Is Officially Beta-Testing In-Reply-To: <201112231439.26327.samuel@unimelb.edu.au> Message-ID: ----- Original Message ----- > On Fri, 23 Dec 2011 02:27:52 PM Chris Samuel wrote: > > > In trunk there's a file README.new_in_4.0 which doesn't appear to > > be in that tarball for some reason > > Also missing are: > > README.trqauthd > README.NUMA > README.coding_notes > PBS_License.txt > > and quite a few
others, I've attached a diff for the files that are > only in the trunk or only in the tarball (excluding the obvious > autoconf/svn ones of course). > > cheers! > Chris > -- > Christopher Samuel - Senior Systems Administrator > VLSCI - Victorian Life Sciences Computation Initiative > Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 > http://www.vlsci.unimelb.edu.au/ > Sorry for the lateness of this reply. Those README files and the PBS_License.txt file need to be in the distribution. We will get this corrected. Looking at the diff: autogen.sh - this is only necessary for building from the source in svn, so we didn't include it. All those .h files that aren't included are mostly related to performing unit tests, which we didn't think people would be doing in production, so they aren't included. Along the same line of thinking, I should have mentioned in my original email that if you configure TORQUE with --with-check it won't work unless you checked things out from svn (because these and other files are missing). We're not opposed to people having these files; we just thought that nobody downloading TORQUE would want them, only those who were checking out the code from svn. The files that are in the distribution but not in trunk, such as torque.spec and pbs_config.h.in, are files produced during the build. Sometime before the 3.0 release we had decided as a group that these files didn't need to be stored in svn, so that's why you don't see them if you check out. Thanks for looking at this and checking our work. It is much appreciated.
-- David Beer Direct Line: 801-717-3386 | Fax: 801-717-3738 Adaptive Computing 1712 S East Bay Blvd, Suite 300 Provo, UT 84606 From domibel at cs.tu-berlin.de Thu Jan 5 14:35:45 2012 From: domibel at cs.tu-berlin.de (Dominique Belhachemi) Date: Thu, 5 Jan 2012 22:35:45 +0100 (CET) Subject: [torquedev] TORQUE 4.0 Is Officially Beta-Testing In-Reply-To: References: Message-ID: Could you also please clarify what part of the code is covered by a) the original PBS license (PBS_License.txt) and b) the new introduced license (PBS_License_2.5.txt) Thanks -Dominique On Wed, 4 Jan 2012, David Beer wrote: > > > ----- Original Message ----- >> On Fri, 23 Dec 2011 02:27:52 PM Chris Samuel wrote: >> >>> In trunk there's a file README.new_in_4.0 which doesn't appear to >>> be in that tarball for some reason >> >> Also missing are: >> >> README.trqauthd >> README.NUMA >> README.coding_notes >> PBS_License.txt >> >> and quite a few others, I've attached a diff for the files that are >> only in the trunk or only in the tarball (excluding the obvious >> autoconf/svn ones of course). >> >> cheers! >> Chris >> -- >> Christopher Samuel - Senior Systems Administrator >> VLSCI - Victorian Life Sciences Computation Initiative >> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 >> http://www.vlsci.unimelb.edu.au/ >> > > Sorry for the lateness of this reply. Those README files and the PBS_License.txt file need to be in the distribution. We will get this corrected. Looking at the diff: > > autogen.sh - this is only necessary for building from the source in svn, so we didn't include it. > All those .h files that aren't included are mostly related to performing unit tests, which we didn't think people would be doing in production so they aren't included. Along the same line of thinking, I should have mentioned in my original email that if you configure TORQUE with --with-check it won't work unless you checked things out from svn (because these are other files are missing). 
We're not opposed to people having these files, we just thought that nobody would want them who is downloading TORQUE, only those who were checking out the code from svn. > The files that are in the distribution but not in trunk, such as torque.spec and pbs_config.h.in, are files produced during the build. Sometime before the 3.0 release we had decided as a group that these files didn't need to be stored in svn, so that's why you don't see them if you check out. > > Thanks for looking at this and checking our work. It is very appreciated. > > -- > David Beer > Direct Line: 801-717-3386 | Fax: 801-717-3738 > Adaptive Computing > 1712 S East Bay Blvd, Suite 300 > Provo, UT 84606 > > _______________________________________________ > torquedev mailing list > torquedev at supercluster.org > http://www.supercluster.org/mailman/listinfo/torquedev > From knielson at adaptivecomputing.com Thu Jan 5 16:20:34 2012 From: knielson at adaptivecomputing.com (Ken Nielson) Date: Thu, 05 Jan 2012 16:20:34 -0700 (MST) Subject: [torquedev] TORQUE 4.0 Is Officially Beta-Testing In-Reply-To: Message-ID: <075ece1a-eb48-4a5a-91b7-5266be6ce987@mail> Dominique, We have a bug in our distribution. We need to eliminate one of the PBS_License files since they are the same. With the release of TORQUE 2.5.0 we updated the license file to reflect that Adaptive Computing Enterprises, Inc. was the caretaker of TORQUE and updated the contact information. The contact information in the previous license was out of date. We also removed provisions one and two from the original license since they expired on December 31, 2001. When these two provisions expired, TORQUE became completely free for use and redistribution to commercial and non-commercial interests as long as the proper credits and copyrights were noted as explained in the remaining provisions. Sorry for the confusion. We will address this for the next snapshot.
Ken Nielson Adaptive Computing ----- Original Message ----- > From: "Dominique Belhachemi" > To: "David Beer" , "Torque Developers mailing list" > Sent: Thursday, January 5, 2012 2:35:45 PM > Subject: Re: [torquedev] TORQUE 4.0 Is Officially Beta-Testing > > > Could you also please clarify what part of the code is covered by > a) the original PBS license (PBS_License.txt) and > b) the new introduced license (PBS_License_2.5.txt) > > Thanks > -Dominique > > > On Wed, 4 Jan 2012, David Beer wrote: > > > > > > > ----- Original Message ----- > >> On Fri, 23 Dec 2011 02:27:52 PM Chris Samuel wrote: > >> > >>> In trunk there's a file README.new_in_4.0 which doesn't appear to > >>> be in that tarball for some reason > >> > >> Also missing are: > >> > >> README.trqauthd > >> README.NUMA > >> README.coding_notes > >> PBS_License.txt > >> > >> and quite a few others, I've attached a diff for the files that > >> are > >> only in the trunk or only in the tarball (excluding the obvious > >> autoconf/svn ones of course). > >> > >> cheers! > >> Chris > >> -- > >> Christopher Samuel - Senior Systems Administrator > >> VLSCI - Victorian Life Sciences Computation Initiative > >> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 > >> http://www.vlsci.unimelb.edu.au/ > >> > > > > Sorry for the lateness of this reply. Those README files and the > > PBS_License.txt file need to be in the distribution. We will get > > this corrected. Looking at the diff: > > > > autogen.sh - this is only necessary for building from the source in > > svn, so we didn't include it. > > All those .h files that aren't included are mostly related to > > performing unit tests, which we didn't think people would be doing > > in production so they aren't included. Along the same line of > > thinking, I should have mentioned in my original email that if you > > configure TORQUE with --with-check it won't work unless you > > checked things out from svn (because these are other files are > > missing). 
We're not opposed to people having these files, we just > > thought that nobody would want them who is downloading TORQUE, > > only those who were checking out the code from svn. > > The files that are in the distribution but not in trunk, such as > > torque.spec and pbs_config.h.in, are files produced during the > > build. Sometime before the 3.0 release we had decided as a group > > that these files didn't need to be stored in svn, so that's why > > you don't see them if you check out. > > > > Thanks for looking at this and checking our work. It is very > > appreciated. > > > > -- > > David Beer > > Direct Line: 801-717-3386 | Fax: 801-717-3738 > > Adaptive Computing > > 1712 S East Bay Blvd, Suite 300 > > Provo, UT 84606 > > > > _______________________________________________ > > torquedev mailing list > > torquedev at supercluster.org > > http://www.supercluster.org/mailman/listinfo/torquedev > > > > _______________________________________________ > torquedev mailing list > torquedev at supercluster.org > http://www.supercluster.org/mailman/listinfo/torquedev > From domibel at cs.tu-berlin.de Thu Jan 5 18:18:50 2012 From: domibel at cs.tu-berlin.de (Dominique Belhachemi) Date: Fri, 6 Jan 2012 02:18:50 +0100 (CET) Subject: [torquedev] TORQUE 4.0 Is Officially Beta-Testing In-Reply-To: <075ece1a-eb48-4a5a-91b7-5266be6ce987@mail> References: <075ece1a-eb48-4a5a-91b7-5266be6ce987@mail> Message-ID: Hi Ken, Yes, please remove PBS_License_2.5.txt and keep the original license file. Adaptive's contact information are well known and could easily be moved to the README file. Thanks -Dominique On Thu, 5 Jan 2012, Ken Nielson wrote: > Dominique, > > We have a bug in our distribution. We need to eliminate one of the PBS_License files since they are the same. > > With the release of TORQUE 2.5.0 we updated the license file to reflect that Adaptive Computing Enterprises, Inc. was the care taker of TORQUE and updated the contact information. 
The contact information in the previous license was out of date. We also removed provisions one and two from the original license since they expired on December 31, 2001. When these two provisions expired TORQUE became completely free for use and redistribution to commercial and non-commercial interests as long as the proper credits and copyrights were noted as explained in the remaining provisions. > > Sorry for the confusion. We will address this for the next snapshot. > > Ken Nielson > Adaptive Computing > > > ----- Original Message ----- >> From: "Dominique Belhachemi" >> To: "David Beer" , "Torque Developers mailing list" >> Sent: Thursday, January 5, 2012 2:35:45 PM >> Subject: Re: [torquedev] TORQUE 4.0 Is Officially Beta-Testing >> >> >> Could you also please clarify what part of the code is covered by >> a) the original PBS license (PBS_License.txt) and >> b) the new introduced license (PBS_License_2.5.txt) >> >> Thanks >> -Dominique >> >> >> On Wed, 4 Jan 2012, David Beer wrote: >> >>> >>> >>> ----- Original Message ----- >>>> On Fri, 23 Dec 2011 02:27:52 PM Chris Samuel wrote: >>>> >>>>> In trunk there's a file README.new_in_4.0 which doesn't appear to >>>>> be in that tarball for some reason >>>> >>>> Also missing are: >>>> >>>> README.trqauthd >>>> README.NUMA >>>> README.coding_notes >>>> PBS_License.txt >>>> >>>> and quite a few others, I've attached a diff for the files that >>>> are >>>> only in the trunk or only in the tarball (excluding the obvious >>>> autoconf/svn ones of course). >>>> >>>> cheers! >>>> Chris >>>> -- >>>> Christopher Samuel - Senior Systems Administrator >>>> VLSCI - Victorian Life Sciences Computation Initiative >>>> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 >>>> http://www.vlsci.unimelb.edu.au/ >>>> >>> >>> Sorry for the lateness of this reply. Those README files and the >>> PBS_License.txt file need to be in the distribution. We will get >>> this corrected. 
Looking at the diff: >>> >>> autogen.sh - this is only necessary for building from the source in >>> svn, so we didn't include it. >>> All those .h files that aren't included are mostly related to >>> performing unit tests, which we didn't think people would be doing >>> in production so they aren't included. Along the same line of >>> thinking, I should have mentioned in my original email that if you >>> configure TORQUE with --with-check it won't work unless you >>> checked things out from svn (because these are other files are >>> missing). We're not opposed to people having these files, we just >>> thought that nobody would want them who is downloading TORQUE, >>> only those who were checking out the code from svn. >>> The files that are in the distribution but not in trunk, such as >>> torque.spec and pbs_config.h.in, are files produced during the >>> build. Sometime before the 3.0 release we had decided as a group >>> that these files didn't need to be stored in svn, so that's why >>> you don't see them if you check out. >>> >>> Thanks for looking at this and checking our work. It is very >>> appreciated. 
>>> >>> -- >>> David Beer >>> Direct Line: 801-717-3386 | Fax: 801-717-3738 >>> Adaptive Computing >>> 1712 S East Bay Blvd, Suite 300 >>> Provo, UT 84606 >>> >>> _______________________________________________ >>> torquedev mailing list >>> torquedev at supercluster.org >>> http://www.supercluster.org/mailman/listinfo/torquedev >>> >> >> _______________________________________________ >> torquedev mailing list >> torquedev at supercluster.org >> http://www.supercluster.org/mailman/listinfo/torquedev >> > _______________________________________________ > torquedev mailing list > torquedev at supercluster.org > http://www.supercluster.org/mailman/listinfo/torquedev > From j.blank at fz-juelich.de Fri Jan 6 06:40:19 2012 From: j.blank at fz-juelich.de (Joerg Blank) Date: Fri, 6 Jan 2012 14:40:19 +0100 Subject: [torquedev] Torque 4 SVN and NUMA Support Message-ID: <4F06F9C3.2030105@fz-juelich.de> Hello, I'm trying to compile a recent svn checkout with NUMA support. > gcc -DHAVE_CONFIG_H -I. -I../../src/include -I../../src/include -I/usr/include/libxml2 -DPBS_SERVER_HOME=\"/var/spool/torque\" -DPBS_ENVIRON=\"/var/spool/torque/pbs_environment\" -g -O2 -D_LARGEFILE64_SOURCE -DNUMA_SUPPORT -MT req_stat.o -MD -MP -MF .deps/req_stat.Tpo -c -o req_stat.o req_stat.c > In file included from req_stat.c:118: > svr_connect.h:15: error: conflicting types for ‘socket_to_handle’ > ../../src/include/svrfunc.h:22: note: previous declaration of ‘socket_to_handle’ was here > make[3]: *** [req_stat.o] Error 1 I looked into the preprocessed file and there seems to be an incorrect macro expansion of "errno" in svr_connect.h. Renaming the parameter solves this problem. The compilation now continues until: > gcc -DHAVE_CONFIG_H -I.
-I../../src/include -I../../src/include -I../../src/resmom/linux -DPBS_MOM -DDEMUX=\"/usr/local/sbin/pbs_demux\" -DRCP_PATH=\"/usr/bin/scp\" -DRCP_ARGS=\"-rpB\" -DPBS_SERVER_HOME=\"/var/spool/torque\" -DPBS_ENVIRON=\"/var/spool/torque/pbs_environment\" -I/usr/include/libxml2 -g -O2 -D_LARGEFILE64_SOURCE -DNUMA_SUPPORT -MT mom_server.o -MD -MP -MF .deps/mom_server.Tpo -c -o mom_server.o mom_server.c > mom_server.c: In function ‘mom_server_all_update_stat’: > mom_server.c:3329: error: ‘num_numa_nodes’ undeclared (first use in this function) > mom_server.c:3329: error: (Each undeclared identifier is reported only once > mom_server.c:3329: error: for each function it appears in.) > make[3]: *** [mom_server.o] Error 1 Unfortunately I have no solution for this. Regards, Jörg Blank From knielson at adaptivecomputing.com Fri Jan 6 09:41:53 2012 From: knielson at adaptivecomputing.com (Ken Nielson) Date: Fri, 06 Jan 2012 09:41:53 -0700 (MST) Subject: [torquedev] Torque 4 SVN and NUMA Support In-Reply-To: <4F06F9C3.2030105@fz-juelich.de> Message-ID: ----- Original Message ----- > From: "Joerg Blank" > To: "Torque Developers mailing list" > Sent: Friday, January 6, 2012 6:40:19 AM > Subject: [torquedev] Torque 4 SVN and NUMA Support > > Hello, > > I'm trying to compile a recent svn checkout with NUMA support. > > > gcc -DHAVE_CONFIG_H -I.
-I../../src/include -I../../src/include > > -I/usr/include/libxml2 -DPBS_SERVER_HOME=\"/var/spool/torque\" > > -DPBS_ENVIRON=\"/var/spool/torque/pbs_environment\" -g -O2 > > -D_LARGEFILE64_SOURCE -DNUMA_SUPPORT -MT req_stat.o -MD -MP -MF > > .deps/req_stat.Tpo -c -o req_stat.o req_stat.c > > In file included from req_stat.c:118: > > svr_connect.h:15: error: conflicting types for ‘socket_to_handle’ > > ../../src/include/svrfunc.h:22: note: previous declaration of > > ‘socket_to_handle’ was here > > make[3]: *** [req_stat.o] Error 1 > > I look into the preprocessed file and there seems to be in incorrect > macro expansion of "errno" in svr_connect.h. Renaming the parameter > solves this problem. > > The compilation now continues until: > > > gcc -DHAVE_CONFIG_H -I. -I../../src/include -I../../src/include > > -I../../src/resmom/linux -DPBS_MOM > > -DDEMUX=\"/usr/local/sbin/pbs_demux\" -DRCP_PATH=\"/usr/bin/scp\" > > -DRCP_ARGS=\"-rpB\" -DPBS_SERVER_HOME=\"/var/spool/torque\" > > -DPBS_ENVIRON=\"/var/spool/torque/pbs_environment\" > > -I/usr/include/libxml2 -g -O2 -D_LARGEFILE64_SOURCE > > -DNUMA_SUPPORT -MT mom_server.o -MD -MP -MF .deps/mom_server.Tpo > > -c -o mom_server.o mom_server.c > > mom_server.c: In function ‘mom_server_all_update_stat’: > > mom_server.c:3329: error: ‘num_numa_nodes’ undeclared (first use in > > this function) > > mom_server.c:3329: error: (Each undeclared identifier is reported > > only once > > mom_server.c:3329: error: for each function it appears in.) > > make[3]: *** [mom_server.o] Error 1 > > Unfortunately I have no solution for this. > > Regards, > Jörg Blank > > Jörg, You may need to rerun autogen.sh and then rerun configure.
Ken From dbeer at adaptivecomputing.com Fri Jan 6 10:06:40 2012 From: dbeer at adaptivecomputing.com (David Beer) Date: Fri, 06 Jan 2012 10:06:40 -0700 (MST) Subject: [torquedev] Torque 4 SVN and NUMA Support In-Reply-To: <4F06F9C3.2030105@fz-juelich.de> Message-ID: <64e7701c-7f88-446a-bd37-426d66beeff9@mail> ----- Original Message ----- > Hello, > > I'm trying to compile a recent svn checkout with NUMA support. > > > gcc -DHAVE_CONFIG_H -I. -I../../src/include -I../../src/include > > -I/usr/include/libxml2 -DPBS_SERVER_HOME=\"/var/spool/torque\" > > -DPBS_ENVIRON=\"/var/spool/torque/pbs_environment\" -g -O2 > > -D_LARGEFILE64_SOURCE -DNUMA_SUPPORT -MT req_stat.o -MD -MP -MF > > .deps/req_stat.Tpo -c -o req_stat.o req_stat.c > > In file included from req_stat.c:118: > > svr_connect.h:15: error: conflicting types for ‘socket_to_handle’ > > ../../src/include/svrfunc.h:22: note: previous declaration of > > ‘socket_to_handle’ was here > > make[3]: *** [req_stat.o] Error 1 > > I look into the preprocessed file and there seems to be in incorrect > macro expansion of "errno" in svr_connect.h. Renaming the parameter > solves this problem. > > The compilation now continues until: > > > gcc -DHAVE_CONFIG_H -I. -I../../src/include -I../../src/include > > -I../../src/resmom/linux -DPBS_MOM > > -DDEMUX=\"/usr/local/sbin/pbs_demux\" -DRCP_PATH=\"/usr/bin/scp\" > > -DRCP_ARGS=\"-rpB\" -DPBS_SERVER_HOME=\"/var/spool/torque\" > > -DPBS_ENVIRON=\"/var/spool/torque/pbs_environment\" > > -I/usr/include/libxml2 -g -O2 -D_LARGEFILE64_SOURCE > > -DNUMA_SUPPORT -MT mom_server.o -MD -MP -MF .deps/mom_server.Tpo > > -c -o mom_server.o mom_server.c > > mom_server.c: In function ‘mom_server_all_update_stat’: > > mom_server.c:3329: error: ‘num_numa_nodes’ undeclared (first use in > > this function) > > mom_server.c:3329: error: (Each undeclared identifier is reported > > only once > > mom_server.c:3329: error: for each function it appears in.)
> > make[3]: *** [mom_server.o] Error 1 > > Unfortunately I have no solution for this. > > Regards, > Jörg Blank > Jorg, It appears there were some changes made that broke the compilation of NUMA. I will get you a new build ASAP. -- David Beer Direct Line: 801-717-3386 | Fax: 801-717-3738 Adaptive Computing 1712 S East Bay Blvd, Suite 300 Provo, UT 84606 From dbeer at adaptivecomputing.com Fri Jan 6 11:19:45 2012 From: dbeer at adaptivecomputing.com (David Beer) Date: Fri, 06 Jan 2012 11:19:45 -0700 (MST) Subject: [torquedev] Beta Update In-Reply-To: <0c6c7461-2773-4862-bb64-002d3290864e@mail> Message-ID: <7a2cce2c-fe85-4387-8f4e-89a09036a562@mail> All, We have an updated beta snapshot for 4.0.0. Fixes from the original include: - compiling with numa enabled - having the README files that were missing - only having one license included - fixing some warnings with munge enabled - fixing a deadlock if you're using job logging - adding some more buffer protections (solving some of the high load crashes) - adding some error checking if there are disagreements between the mom_hierarchy and nodes files (nodes present in one but not the other) It can be downloaded here: http://www.adaptivecomputing.com/resources/downloads/torque/4.0-beta/torque-4.0.0-snap.201201061112.tar.gz -- David Beer Direct Line: 801-717-3386 | Fax: 801-717-3738 Adaptive Computing 1712 S East Bay Blvd, Suite 300 Provo, UT 84606 From domibel at cs.tu-berlin.de Fri Jan 6 11:48:11 2012 From: domibel at cs.tu-berlin.de (Dominique Belhachemi) Date: Fri, 6 Jan 2012 19:48:11 +0100 (CET) Subject: [torquedev] Beta Update In-Reply-To: <7a2cce2c-fe85-4387-8f4e-89a09036a562@mail> References: <7a2cce2c-fe85-4387-8f4e-89a09036a562@mail> Message-ID: There are still two license files included: torque-4.0.0/contrib/PBS_License_2.3.txt torque-4.0.0/PBS_License.txt Please revert all the license changes you made so far. Customers might lose the right to use torque if they violate the original license.
The same applies to 2.5.x and 3.x versions. Thanks -Dominique On Fri, 6 Jan 2012, David Beer wrote: > All, > > We have an updated beta snapshot for 4.0.0.
Fixes from the original include: > > - compiling with numa enabled > - having the README files that were missing > - only having one license included > - fixing some warnings with munge enabled > - fixing a deadlock if you're using job logging > - adding some more buffer protections (solving some of the high load crashes) > - add some error checking if there are disagreements between the mom_hierarchy and nodes files (nodes present in one but not the other) > > It can be downloaded here: http://www.adaptivecomputing.com/resources/downloads/torque/4.0-beta/torque-4.0.0-snap.201201061112.tar.gz > > -- > David Beer > Direct Line: 801-717-3386 | Fax: 801-717-3738 > Adaptive Computing > 1712 S East Bay Blvd, Suite 300 > Provo, UT 84606 > > _______________________________________________ > torquedev mailing list > torquedev at supercluster.org > http://www.supercluster.org/mailman/listinfo/torquedev > From knielson at adaptivecomputing.com Fri Jan 6 13:39:04 2012 From: knielson at adaptivecomputing.com (Ken Nielson) Date: Fri, 06 Jan 2012 13:39:04 -0700 (MST) Subject: [torquedev] Beta Update In-Reply-To: Message-ID: <95126526-64f1-4689-82b6-33120058cf9b@mail> Dominique, Our attorney is the one who edited this license. Again, the only things that have changed are the removal of two expired provisions which prevented the commercial distribution of TORQUE prior to December 31, 2001 and an update to the contact information. Users are not in jeopardy of violating the PBS License agreement because of these clarifications. In spite of the editing work done, the license has not changed.
Ken Nielson Adaptive Computing ----- Original Message ----- > From: "Dominique Belhachemi" > To: "David Beer" , "Torque Developers mailing list" > Cc: torquedev at adaptivecomputing.com, torqueusers at adaptivecomputing.com > Sent: Friday, January 6, 2012 11:48:11 AM > Subject: Re: [torquedev] Beta Update > > There are still two license files included: > > torque-4.0.0/contrib/PBS_License_2.3.txt > torque-4.0.0/PBS_License.txt > > Please revert all the license changes you made so far. Customers > might > loose the right to use torque if they violate the original license. > > The same applies to 2.5.x and 3.x versions. > > Thanks > -Dominique > > > On Fri, 6 Jan 2012, David Beer wrote: > > > All, > > > > We have an updated beta snapshot for 4.0.0.
Fixes from the original > > include: > > > > - compiling with numa enabled > > - having the README files that were missing > > - only having one license included > > - fixing some warnings with munge enabled > > - fixing a deadlock if you're using job logging > > - adding some more buffer protections (solving some of the high > > load crashes) > > - add some error checking if there are disagreements between the > > mom_hierarchy and nodes files (nodes present in one but not the > > other) > > > > It can be downloaded here: > > http://www.adaptivecomputing.com/resources/downloads/torque/4.0-beta/torque-4.0.0-snap.201201061112.tar.gz > > > > -- > > David Beer > > Direct Line: 801-717-3386 | Fax: 801-717-3738 > > Adaptive Computing > > 1712 S East Bay Blvd, Suite 300 > > Provo, UT 84606 > > > > _______________________________________________ > > torquedev mailing list > > torquedev at supercluster.org > > http://www.supercluster.org/mailman/listinfo/torquedev > > > > _______________________________________________ > torquedev mailing list > torquedev at supercluster.org > http://www.supercluster.org/mailman/listinfo/torquedev > From domibel at cs.tu-berlin.de Thu Jan 12 08:12:58 2012 From: domibel at cs.tu-berlin.de (Dominique Belhachemi) Date: Thu, 12 Jan 2012 16:12:58 +0100 (CET) Subject: [torquedev] Beta Update In-Reply-To: <95126526-64f1-4689-82b6-33120058cf9b@mail> References: <95126526-64f1-4689-82b6-33120058cf9b@mail> Message-ID: Ken, Who owns what copyright? Why is the changed PBS license mentioning MOAB? Thanks -Dominique On Fri, 6 Jan 2012, Ken Nielson wrote: > Dominique, > > Our attorney is the one who edited this license. Again the only things that have changed are the removal of two expired provisions which prevented the commercial distribution of TORQUE prior to December 31, 2001 and an update to the the contact information. > > Users are not in jeopardy of violating the PBS License agreement because of these clarifications. 
In spite of the editing work done the license has not changed. > > Ken Nielson > Adaptive Computing > > ----- Original Message ----- >> From: "Dominique Belhachemi" >> To: "David Beer" , "Torque Developers mailing list" >> Cc: torquedev at adaptivecomputing.com, torqueusers at adaptivecomputing.com >> Sent: Friday, January 6, 2012 11:48:11 AM >> Subject: Re: [torquedev] Beta Update >> >> There are still two license files included: >> >> torque-4.0.0/contrib/PBS_License_2.3.txt >> torque-4.0.0/PBS_License.txt >> >> Please revert all the license changes you made so far. Customers >> might >> loose the right to use torque if they violate the original license. >> >> The same applies to 2.5.x and 3.x versions. >> >> Thanks >> -Dominique >>
From knielson at adaptivecomputing.com Thu Jan 12 09:11:23 2012 From: knielson at adaptivecomputing.com (Ken Nielson) Date: Thu, 12 Jan 2012 09:11:23 -0700 (MST) Subject: [torquedev] Beta Update In-Reply-To: Message-ID: <1e924e01-b7c6-4a27-8d6c-243b3dff1b4a@mail> Dominique, The copyright at the top of the License is for the document itself. It does not apply to the code. This copyright of the document is owned by Adaptive Computing Enterprises, Inc. Moab is mentioned as a courtesy to users of TORQUE to let Moab users know they have an alternative resource for support other than the mailing list. This is similarly done in the previous license document provided by Veridian Information Solutions, Inc. As to who owns any other copyrights I will need to get clarification. I will let you know what I find out from our legal counsel. Ken ----- Original Message ----- > From: "Dominique Belhachemi" > To: "Torque Developers mailing list" > Cc: torquedev at adaptivecomputing.com, torqueusers at adaptivecomputing.com > Sent: Thursday, January 12, 2012 8:12:58 AM > Subject: Re: [torquedev] Beta Update > > > Ken, > > Who owns what copyright? > > Why is the changed PBS license mentioning MOAB?
> > Thanks > -Dominique > > > On Fri, 6 Jan 2012, Ken Nielson wrote: > > > Dominique, > > > > Our attorney is the one who edited this license. Again the only > > things that have changed are the removal of two expired provisions > > which prevented the commercial distribution of TORQUE prior to > > December 31, 2001 and an update to the the contact information. > > > > Users are not in jeopardy of violating the PBS License agreement > > because of these clarifications. In spite of the editing work done > > the license has not changed. > > > > Ken Nielson > > Adaptive Computing > > > > ----- Original Message ----- > >> From: "Dominique Belhachemi" > >> To: "David Beer" , "Torque Developers > >> mailing list" > >> Cc: torquedev at adaptivecomputing.com, > >> torqueusers at adaptivecomputing.com > >> Sent: Friday, January 6, 2012 11:48:11 AM > >> Subject: Re: [torquedev] Beta Update > >> > >> There are still two license files included: > >> > >> torque-4.0.0/contrib/PBS_License_2.3.txt > >> torque-4.0.0/PBS_License.txt > >> > >> Please revert all the license changes you made so far. Customers > >> might > >> loose the right to use torque if they violate the original > >> license. > >> > >> The same applies to 2.5.x and 3.x versions. > >> > >> Thanks > >> -Dominique > >> > _______________________________________________ > torquedev mailing list > torquedev at supercluster.org > http://www.supercluster.org/mailman/listinfo/torquedev > From samuel at unimelb.edu.au Thu Jan 12 15:37:56 2012 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Fri, 13 Jan 2012 09:37:56 +1100 Subject: [torquedev] Beta Update In-Reply-To: References: <95126526-64f1-4689-82b6-33120058cf9b@mail> Message-ID: <4F0F60C4.7030809@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 13/01/12 02:12, Dominique Belhachemi wrote: > Who owns what copyright? 
Who knows - the code has a long history of over two decades [1] now and I doubt if anyone ever tracked who contributed what lines after the various maintainers started accepting patches from the community, at least up until it started to be kept in a version control system in the mid 2000's. For instance, I've never assigned copyright over any of my patches I've submitted, so I (or possibly my employer of the time) would hold copyright over those parts. [1] - http://en.wikipedia.org/wiki/Portable_Batch_System cheers, Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk8PYMQACgkQO2KABBYQAh/UjQCfZlcI2spRgAQWMnlHPKr9jEpf c3cAn1Y5bWiwXlQfi8wZvWmasbHPBI0d =Ns1I -----END PGP SIGNATURE----- From Gareth.Williams at csiro.au Sun Jan 15 20:19:13 2012 From: Gareth.Williams at csiro.au (Gareth.Williams at csiro.au) Date: Mon, 16 Jan 2012 14:19:13 +1100 Subject: [torquedev] job logging showjobs vmem patch Message-ID: <007DECE986B47F4EABF823C1FBB19C620102C8732EDE@exvic-mbx04.nexus.csiro.au> Hi Ken and David, I've just started using the new job logging feature: http://www.adaptivecomputing.com/resources/docs/torque/2-5-9/10.1joblogging.php And found I needed to patch the showjobs script (from Adaptive) if I wanted it to report vmem usage. Can you incorporate the patch please? Feel free to choose a better term than 'Vmemory Used'. 
Regards, Gareth --- ./torque-3.0.4-snap.201201051014/contrib/showjobs 2010-12-07 09:44:20.000000000 +1100 +++ /usr/local/torque/bin/showjobs 2012-01-16 13:58:43.000000000 +1100 @@ -182,6 +182,10 @@ { $jobAttr{'Memory Used'} = $1; } + elsif ($line =~ /([^<]+)([^<]+) References: <007DECE986B47F4EABF823C1FBB19C620102C8732EDE@exvic-mbx04.nexus.csiro.au> Message-ID: <007DECE986B47F4EABF823C1FBB19C620102C8732EDF@exvic-mbx04.nexus.csiro.au> I've just noticed that array jobs are not particularly well handled by the showjobs script: wil240 at cheraxdev:~> showjobs -j 403[4] wil240 at cheraxdev:~> showjobs -j 403[] Unmatched [ in regex; marked by <-- HERE in m/^403[ <-- HERE ]\b/ at /usr/local/torque/bin/showjobs line 266, line 332 I'd be willing to work on a fix if no one else volunteers (and if anyone cares). The log content seems fine and the following works: showjobs 403 Gareth > -----Original Message----- > From: torquedev-bounces at supercluster.org [mailto:torquedev- > bounces at supercluster.org] On Behalf Of Gareth.Williams at csiro.au > Sent: Monday, 16 January 2012 2:19 PM > To: torquedev at supercluster.org > Subject: [ExternalEmail] [torquedev] job logging showjobs vmem patch > > Hi Ken and David, > > I've just started using the new job logging feature: > http://www.adaptivecomputing.com/resources/docs/torque/2-5- > 9/10.1joblogging.php > And found I needed to patch the showjobs script (from Adaptive) if I > wanted it to report vmem usage. Can you incorporate the patch please? > Feel free to choose a better term than 'Vmemory Used'. 
> > Regards, > > Gareth > > --- ./torque-3.0.4-snap.201201051014/contrib/showjobs 2010-12-07 > 09:44:20.000000000 +1100 > +++ /usr/local/torque/bin/showjobs 2012-01-16 13:58:43.000000000 > +1100 > @@ -182,6 +182,10 @@ > { > $jobAttr{'Memory Used'} = $1; > } > + elsif ($line =~ /([^<]+) 'resources_used') > + { > + $jobAttr{'Vmemory Used'} = $1; > + } > elsif ($line =~ /([^<]+) 'resource_List') > { > $jobAttr{'Operating System'} = $1; > @@ -317,6 +321,7 @@ > 'Wallclock Duration', > 'CPUTime', > 'Memory Used', > + 'Vmemory Used', > 'Submit Time', > 'Start Time', > 'End Time', > _______________________________________________ > torquedev mailing list > torquedev at supercluster.org > http://www.supercluster.org/mailman/listinfo/torquedev From knielson at adaptivecomputing.com Mon Jan 16 10:35:40 2012 From: knielson at adaptivecomputing.com (Ken Nielson) Date: Mon, 16 Jan 2012 10:35:40 -0700 (MST) Subject: [torquedev] job logging showjobs vmem patch In-Reply-To: <007DECE986B47F4EABF823C1FBB19C620102C8732EDE@exvic-mbx04.nexus.csiro.au> Message-ID: Gareth, Thanks for the patch. I am testing it right now. Ken ----- Original Message ----- > From: "Gareth Williams" > To: torquedev at supercluster.org > Sent: Sunday, January 15, 2012 8:19:13 PM > Subject: [torquedev] job logging showjobs vmem patch > > Hi Ken and David, > > I've just started using the new job logging feature: > http://www.adaptivecomputing.com/resources/docs/torque/2-5-9/10.1joblogging.php > And found I needed to patch the showjobs script (from Adaptive) if I > wanted it to report vmem usage. Can you incorporate the patch > please? Feel free to choose a better term than 'Vmemory Used'. 
> > Regards, > > Gareth > > --- ./torque-3.0.4-snap.201201051014/contrib/showjobs 2010-12-07 > 09:44:20.000000000 +1100 > +++ /usr/local/torque/bin/showjobs 2012-01-16 13:58:43.000000000 > +1100 > @@ -182,6 +182,10 @@ > { > $jobAttr{'Memory Used'} = $1; > } > + elsif ($line =~ /([^<]+) 'resources_used') > + { > + $jobAttr{'Vmemory Used'} = $1; > + } > elsif ($line =~ /([^<]+) 'resource_List') > { > $jobAttr{'Operating System'} = $1; > @@ -317,6 +321,7 @@ > 'Wallclock Duration', > 'CPUTime', > 'Memory Used', > + 'Vmemory Used', > 'Submit Time', > 'Start Time', > 'End Time', > _______________________________________________ > torquedev mailing list > torquedev at supercluster.org > http://www.supercluster.org/mailman/listinfo/torquedev > From knielson at adaptivecomputing.com Mon Jan 16 10:42:25 2012 From: knielson at adaptivecomputing.com (Ken Nielson) Date: Mon, 16 Jan 2012 10:42:25 -0700 (MST) Subject: [torquedev] job logging showjobs vmem patch In-Reply-To: <007DECE986B47F4EABF823C1FBB19C620102C8732EDF@exvic-mbx04.nexus.csiro.au> Message-ID: <89c4c94b-9fff-4be9-8fb8-351504733551@mail> Gareth, I see what you mean. If you have time and are willing to fix this for array jobs we would be very happy. Ken ----- Original Message ----- > From: "Gareth Williams" > To: torquedev at supercluster.org > Sent: Sunday, January 15, 2012 9:17:31 PM > Subject: Re: [torquedev] job logging showjobs vmem patch > > I've just noticed that array jobs are not particularly well handled > by the showjobs script: > wil240 at cheraxdev:~> showjobs -j 403[4] > wil240 at cheraxdev:~> showjobs -j 403[] > Unmatched [ in regex; marked by <-- HERE in m/^403[ <-- HERE ]\b/ at > /usr/local/torque/bin/showjobs line 266, line 332 > > I'd be willing to work on a fix if no one else volunteers (and if > anyone cares). 
> > The log content seems fine and the following works: > showjobs 403 > > Gareth > > > -----Original Message----- > > From: torquedev-bounces at supercluster.org [mailto:torquedev- > > bounces at supercluster.org] On Behalf Of Gareth.Williams at csiro.au > > Sent: Monday, 16 January 2012 2:19 PM > > To: torquedev at supercluster.org > > Subject: [ExternalEmail] [torquedev] job logging showjobs vmem > > patch > > > > Hi Ken and David, > > > > I've just started using the new job logging feature: > > http://www.adaptivecomputing.com/resources/docs/torque/2-5- > > 9/10.1joblogging.php > > And found I needed to patch the showjobs script (from Adaptive) if > > I > > wanted it to report vmem usage. Can you incorporate the patch > > please? > > Feel free to choose a better term than 'Vmemory Used'. > > > > Regards, > > > > Gareth > > > > --- ./torque-3.0.4-snap.201201051014/contrib/showjobs 2010-12-07 > > 09:44:20.000000000 +1100 > > +++ /usr/local/torque/bin/showjobs 2012-01-16 > > 13:58:43.000000000 > > +1100 > > @@ -182,6 +182,10 @@ > > { > > $jobAttr{'Memory Used'} = $1; > > } > > + elsif ($line =~ /([^<]+) > 'resources_used') > > + { > > + $jobAttr{'Vmemory Used'} = $1; > > + } > > elsif ($line =~ /([^<]+) > 'resource_List') > > { > > $jobAttr{'Operating System'} = $1; > > @@ -317,6 +321,7 @@ > > 'Wallclock Duration', > > 'CPUTime', > > 'Memory Used', > > + 'Vmemory Used', > > 'Submit Time', > > 'Start Time', > > 'End Time', > > _______________________________________________ > > torquedev mailing list > > torquedev at supercluster.org > > http://www.supercluster.org/mailman/listinfo/torquedev > _______________________________________________ > torquedev mailing list > torquedev at supercluster.org > http://www.supercluster.org/mailman/listinfo/torquedev > From knielson at adaptivecomputing.com Mon Jan 16 11:04:38 2012 From: knielson at adaptivecomputing.com (Ken Nielson) Date: Mon, 16 Jan 2012 11:04:38 -0700 (MST) Subject: [torquedev] job logging showjobs vmem patch 
In-Reply-To: <007DECE986B47F4EABF823C1FBB19C620102C8732EDE@exvic-mbx04.nexus.csiro.au> Message-ID: <8aab2023-f062-436f-843b-51306c6d92fd@mail> Gareth, Your patch has been checked in and will be available in 2.5.10 and 3.0.4. I changed Vmemory Used to vmem Used. Ken ----- Original Message ----- > From: "Gareth Williams" > To: torquedev at supercluster.org > Sent: Sunday, January 15, 2012 8:19:13 PM > Subject: [torquedev] job logging showjobs vmem patch > > Hi Ken and David, > > I've just started using the new job logging feature: > http://www.adaptivecomputing.com/resources/docs/torque/2-5-9/10.1joblogging.php > And found I needed to patch the showjobs script (from Adaptive) if I > wanted it to report vmem usage. Can you incorporate the patch > please? Feel free to choose a better term than 'Vmemory Used'. > > Regards, > > Gareth > > --- ./torque-3.0.4-snap.201201051014/contrib/showjobs 2010-12-07 > 09:44:20.000000000 +1100 > +++ /usr/local/torque/bin/showjobs 2012-01-16 13:58:43.000000000 > +1100 > @@ -182,6 +182,10 @@ > { > $jobAttr{'Memory Used'} = $1; > } > + elsif ($line =~ /([^<]+) 'resources_used') > + { > + $jobAttr{'Vmemory Used'} = $1; > + } > elsif ($line =~ /([^<]+) 'resource_List') > { > $jobAttr{'Operating System'} = $1; > @@ -317,6 +321,7 @@ > 'Wallclock Duration', > 'CPUTime', > 'Memory Used', > + 'Vmemory Used', > 'Submit Time', > 'Start Time', > 'End Time', > _______________________________________________ > torquedev mailing list > torquedev at supercluster.org > http://www.supercluster.org/mailman/listinfo/torquedev > From domibel at cs.tu-berlin.de Mon Jan 16 11:30:08 2012 From: domibel at cs.tu-berlin.de (Dominique Belhachemi) Date: Mon, 16 Jan 2012 19:30:08 +0100 (CET) Subject: [torquedev] Beta Update In-Reply-To: <1e924e01-b7c6-4a27-8d6c-243b3dff1b4a@mail> References: <1e924e01-b7c6-4a27-8d6c-243b3dff1b4a@mail> Message-ID: Ken, Did you receive any news from your legal counsel?
He should also double-check the copyright violation of the PBS license itself. Your lawyer is not allowed to remove/replace the copyright notice in the license file. http://www.copyright.gov/title17/92chap5.html -Dominique On Thu, 12 Jan 2012, Ken Nielson wrote: > Dominique, > > The copyright at the top of the License is for the document itself. It does not apply to the code. This copyright of the document is owned by Adaptive Computing Enterprises, Inc. Moab is mentioned as a courtesy to users of TORQUE to let Moab users know they have an alternative resource for support other than the mailing list. This is similarly done in the previous license document provided by Veridian Information Solutions, Inc. > > As to who owns any other copyrights I will need to get clarification. I will let you know what I find out from our legal counsel. > > Ken > > ----- Original Message ----- >> From: "Dominique Belhachemi" >> To: "Torque Developers mailing list" >> Cc: torquedev at adaptivecomputing.com, torqueusers at adaptivecomputing.com >> Sent: Thursday, January 12, 2012 8:12:58 AM >> Subject: Re: [torquedev] Beta Update >> >> >> Ken, >> >> Who owns what copyright? >> >> Why is the changed PBS license mentioning MOAB? >> >> Thanks >> -Dominique >> >> >> On Fri, 6 Jan 2012, Ken Nielson wrote: >> >>> Dominique, >>> >>> Our attorney is the one who edited this license. Again the only >>> things that have changed are the removal of two expired provisions >>> which prevented the commercial distribution of TORQUE prior to >>> December 31, 2001 and an update to the the contact information. >>> >>> Users are not in jeopardy of violating the PBS License agreement >>> because of these clarifications. In spite of the editing work done >>> the license has not changed.
>>> >>> Ken Nielson >>> Adaptive Computing >>> >>> ----- Original Message ----- >>>> From: "Dominique Belhachemi" >>>> To: "David Beer" , "Torque Developers >>>> mailing list" >>>> Cc: torquedev at adaptivecomputing.com, >>>> torqueusers at adaptivecomputing.com >>>> Sent: Friday, January 6, 2012 11:48:11 AM >>>> Subject: Re: [torquedev] Beta Update >>>> >>>> There are still two license files included: >>>> >>>> torque-4.0.0/contrib/PBS_License_2.3.txt >>>> torque-4.0.0/PBS_License.txt >>>> >>>> Please revert all the license changes you made so far. Customers >>>> might >>>> loose the right to use torque if they violate the original >>>> license. >>>> >>>> The same applies to 2.5.x and 3.x versions. >>>> >>>> Thanks >>>> -Dominique >>>> >> _______________________________________________ >> torquedev mailing list >> torquedev at supercluster.org >> http://www.supercluster.org/mailman/listinfo/torquedev >> > _______________________________________________ > torquedev mailing list > torquedev at supercluster.org > http://www.supercluster.org/mailman/listinfo/torquedev > From knielson at adaptivecomputing.com Mon Jan 16 11:38:54 2012 From: knielson at adaptivecomputing.com (Ken Nielson) Date: Mon, 16 Jan 2012 11:38:54 -0700 (MST) Subject: [torquedev] Beta Update In-Reply-To: Message-ID: <34031961-756d-49f6-8bfc-ea9f15f3b49d@mail> Dominique, We are still waiting to hear from counsel. In the mean time the intent of the license is to make torque open and free. Ken ----- Original Message ----- > From: "Dominique Belhachemi" > To: "Torque Developers mailing list" > Sent: Monday, January 16, 2012 11:30:08 AM > Subject: Re: [torquedev] Beta Update > > Ken, > > Did you receive any news from your legal counsel? > > He should also double-check the copyright violation of the PBS > license > itself. You lawyer is not allowed to remove/replace the copyright > notice > in the license file. 
> > http://www.copyright.gov/title17/92chap5.html > > -Dominique > > > On Thu, 12 Jan 2012, Ken Nielson wrote: > > > Dominique, > > > > The copyright at the top of the License is for the document itself. > > It does not apply to the code. This copyright of the document is > > owned by Adaptive Computing Enterprises, Inc. Moab is mentioned as > > a courtesy to users of TORQUE to let Moab users know they have an > > alternative resource for support other than the mailing list. This > > is similarly done in the previous license document provided by > > Veridian Information Solutions, Inc. > > > > As to who owns any other copyrights I will need to get > > clarification. I will let you know what I find out from our legal > > counsel. > > > > Ken > > > > ----- Original Message ----- > >> From: "Dominique Belhachemi" > >> To: "Torque Developers mailing list" > >> Cc: torquedev at adaptivecomputing.com, > >> torqueusers at adaptivecomputing.com > >> Sent: Thursday, January 12, 2012 8:12:58 AM > >> Subject: Re: [torquedev] Beta Update > >> > >> > >> Ken, > >> > >> Who owns what copyright? > >> > >> Why is the changed PBS license mentioning MOAB? > >> > >> Thanks > >> -Dominique > >> > >> > >> On Fri, 6 Jan 2012, Ken Nielson wrote: > >> > >>> Dominique, > >>> > >>> Our attorney is the one who edited this license. Again the only > >>> things that have changed are the removal of two expired > >>> provisions > >>> which prevented the commercial distribution of TORQUE prior to > >>> December 31, 2001 and an update to the the contact information. > >>> > >>> Users are not in jeopardy of violating the PBS License agreement > >>> because of these clarifications. In spite of the editing work > >>> done > >>> the license has not changed. 
> >>> > >>> Ken Nielson > >>> Adaptive Computing > >>> > >>> ----- Original Message ----- > >>>> From: "Dominique Belhachemi" > >>>> To: "David Beer" , "Torque > >>>> Developers > >>>> mailing list" > >>>> Cc: torquedev at adaptivecomputing.com, > >>>> torqueusers at adaptivecomputing.com > >>>> Sent: Friday, January 6, 2012 11:48:11 AM > >>>> Subject: Re: [torquedev] Beta Update > >>>> > >>>> There are still two license files included: > >>>> > >>>> torque-4.0.0/contrib/PBS_License_2.3.txt > >>>> torque-4.0.0/PBS_License.txt > >>>> > >>>> Please revert all the license changes you made so far. Customers > >>>> might > >>>> loose the right to use torque if they violate the original > >>>> license. > >>>> > >>>> The same applies to 2.5.x and 3.x versions. > >>>> > >>>> Thanks > >>>> -Dominique > >>>> > >> _______________________________________________ > >> torquedev mailing list > >> torquedev at supercluster.org > >> http://www.supercluster.org/mailman/listinfo/torquedev > >> > > _______________________________________________ > > torquedev mailing list > > torquedev at supercluster.org > > http://www.supercluster.org/mailman/listinfo/torquedev > > > > _______________________________________________ > torquedev mailing list > torquedev at supercluster.org > http://www.supercluster.org/mailman/listinfo/torquedev > From Gareth.Williams at csiro.au Wed Jan 18 22:01:54 2012 From: Gareth.Williams at csiro.au (Gareth.Williams at csiro.au) Date: Thu, 19 Jan 2012 16:01:54 +1100 Subject: [torquedev] job logging showjobs vmem patch In-Reply-To: <89c4c94b-9fff-4be9-8fb8-351504733551@mail> References: <007DECE986B47F4EABF823C1FBB19C620102C8732EDF@exvic-mbx04.nexus.csiro.au> <89c4c94b-9fff-4be9-8fb8-351504733551@mail> Message-ID: <007DECE986B47F4EABF823C1FBB19C620102C8732EEA@exvic-mbx04.nexus.csiro.au> > -----Original Message----- > From: torquedev-bounces at supercluster.org [mailto:torquedev- > bounces at supercluster.org] On Behalf Of Ken Nielson > Sent: Tuesday, 17 January 
2012 4:42 AM > To: Torque Developers mailing list > Subject: Re: [torquedev] job logging showjobs vmem patch > > Gareth, > > I see what you mean. If you have time and are willing to fix this for > array jobs we would be very happy. > > Ken > > ----- Original Message ----- > > From: "Gareth Williams" > > To: torquedev at supercluster.org > > Sent: Sunday, January 15, 2012 9:17:31 PM > > Subject: Re: [torquedev] job logging showjobs vmem patch > > > > I've just noticed that array jobs are not particularly well handled > > by the showjobs script: -snip- Here's a patch for showjobs to handle array jobs better. There is a second path to express memory in mebibytes or gibibytes if the values are big enough (someone local asked for that - and it is meant to be human readable). I noticed that there is a job record for the whole array which has some value but it contains Output and Error file fields which (currently) no such files are created. Gareth --- /apps/torque/bin/showjobs 2012-01-16 19:37:43.336090321 +1100 +++ showjobs 2012-01-19 14:42:48.032725413 +1100 @@ -73,6 +73,11 @@ } } +# treat brackets as literal in job id for array jobs +$specifiedJobId =~ s/\[/\\\[/g; +$specifiedJobId =~ s/\]/\\\]/g; + + # Build a sorted list of job files chdir("${torqueHomeDir}/job_logs") or die @@ -265,7 +270,7 @@ if ( defined $specifiedJobId && (! 
defined $jobAttr{'Job Id'} - || $jobAttr{'Job Id'} !~ /^${specifiedJobId}\b/) + || $jobAttr{'Job Id'} !~ /^${specifiedJobId}(?:.|\b)/) ); next if ( --- showjobs 2012-01-19 15:57:42.517901727 +1100 +++ showjobs.mgb 2012-01-19 15:46:36.319334573 +1100 @@ -185,11 +185,11 @@ } elsif ($line =~ /([^<]+)([^<]+)([^<]+) 1024) { + my $M = $k / 1024; + if ($M > 1024) { + my $G = $M / 1024; + $G = sprintf("%.1f", $G); + $value =~ s/${k}k/${G}G/; + } else { + $M = sprintf("%.1f", $M); + $value =~ s/${k}k/${M}M/; + } + } + } + return $value; +} ############################################################################## __END__ From samuel at unimelb.edu.au Thu Jan 19 19:27:20 2012 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Fri, 20 Jan 2012 13:27:20 +1100 Subject: [torquedev] Torque ncpus command In-Reply-To: References: Message-ID: <4F18D108.9000300@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Billy, On 11/10/11 22:34, Lenox, Billy AMRDEC/Sentient Corp. wrote: > Well I do not have the honor to use anything else other then 10 Mac OSX > Servers and they all have 10.5.8 Server on them. So any help on installing > Maui on it would be helpful at this time to see if it does fix the problem I > am having. Really sorry to have missed this previously! Unfortunately my best suggestion would be to try the mauiusers list in case there are people there who can help - I've not touched Maui on Apple for many years now and my memory of it has bitrotted and the people who were the sysadmins for that cluster at the time have moved on. Apologies I can't be more useful! 
Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk8Y0QgACgkQO2KABBYQAh+MjQCfTtsx2YXNwX6V6DNvNsV38IsW VR4Anj7eJzF9ARYy531CrQMF2FxH/p2M =KRVk -----END PGP SIGNATURE----- From bugzilla-daemon at supercluster.org Thu Jan 19 20:58:52 2012 From: bugzilla-daemon at supercluster.org (bugzilla-daemon at supercluster.org) Date: Thu, 19 Jan 2012 20:58:52 -0700 (MST) Subject: [torquedev] [Bug 172] New: Option to allow qsub to block until job completes Message-ID: http://www.clusterresources.com/bugzilla/show_bug.cgi?id=172 Summary: Option to allow qsub to block until job completes Product: TORQUE Version: 2.4.x Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P5 Component: clients AssignedTo: knielson at adaptivecomputing.com ReportedBy: chris at csamuel.org CC: torquedev at supercluster.org Estimated Hours: 0.0 Hi there, As per the mailing list thread from 2011, subject "Option to make qsub block until job completes ?" it would be useful to some of our folks who are developing pipelines if we they could specify an option to qsub to block until a job completes. It would be extra shiny if there was some way to convey the exit code of the job encoded in the exit status of qsub (perhaps the highest exit status of qsub + large offset?). -- Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug. From rea+maui at grid.kiae.ru Sat Jan 21 13:07:31 2012 From: rea+maui at grid.kiae.ru (Eygene Ryabinkin) Date: Sun, 22 Jan 2012 00:07:31 +0400 Subject: [torquedev] Patch: spread job polling more uniformly Message-ID: Good day. 
It was a long time since I had posted to this list, but now I have a patch that should help busy scheduling systems to be more responsive. During the optimization of our Torque/Maui server, I had found that the spread of the running job polls is governed by the queue_rank value: the remainder from its division by JobStatRate is used as the spreading factor. That's not a very good thing, because we can do better by just spreading the polls really uniformly: t_n = now + n * JobStatRate / N_running. The attached patch does this. It is currently being tested on our cluster and shows no regressions for some hours. I'll test it more thoroughly, but the code review will also be good. -- Eygene Ryabinkin, Russian Research Centre "Kurchatov Institute" -------------- next part -------------- A non-text attachment was scrubbed... Name: torque-2.5.10-spread-polls-uniformly.patch Type: text/x-diff Size: 3673 bytes Desc: not available Url : http://www.supercluster.org/pipermail/torquedev/attachments/20120122/9bb79821/attachment.bin From Gareth.Williams at csiro.au Mon Jan 23 23:02:50 2012 From: Gareth.Williams at csiro.au (Gareth.Williams at csiro.au) Date: Tue, 24 Jan 2012 17:02:50 +1100 Subject: [torquedev] showjobs reverse patch Message-ID: <007DECE986B47F4EABF823C1FBB19C620102C8732F04@exvic-mbx04.nexus.csiro.au> Hi Ken, I started getting annoyed about how long showjobs took to run with many jobs logged. A simple optimization for what I think is the most common case (getting info about a single recent job) is to parse the logs in reverse order (of date) and stop parsing at the first match. The patch below reverses the order if a single job is being queried (though it could be an array job) and fixes/simplifies the sub-setting if only a maximum number of logs is to be considered (-n option).
The -o|--oneonly option is needed to actually stop the search after the first match is found - otherwise the match will now usually be early in the search but the full search will be done anyway. It would be nice to eliminate or automatically set the oneonly flag. This would be a bit tricky as I want to allow for the case of showing all members of an array job. Perhaps it would be better to have an option for that case. Cheers, Gareth --- showjobs.mgb 2012-01-19 15:46:36.319334573 +1100 +++ showjobs.rev 2012-01-24 16:44:49.665131378 +1100 @@ -25,7 +25,7 @@ my ( $account, $endDate, $full, $group, $help, $man, $num, $queue, - $specifiedJobId, $startDate, $user + $specifiedJobId, $startDate, $user, $oneonly ); GetOptions( 'account=s' => \$account, @@ -39,6 +39,7 @@ 'num=i' => \$num, 'startDate=s' => \$startDate, 'user=s' => \$user, + 'oneonly' => \$oneonly, ) or pod2usage(2); # Display usage if necessary @@ -83,7 +84,14 @@ or die "Unable to change directory to job_logs directory (${torqueHomeDir}/job_logs): $!\n"; my @jobFiles = glob("20*"); - at jobFiles = sort @jobFiles; +if ($specifiedJobId) +{ + @jobFiles = reverse sort @jobFiles; # search from most recent log if a particular jobs is specified +} +else +{ + @jobFiles = sort @jobFiles; +} if ($startDate) { if ($startDate =~ /^(\d{4})[\/\-](\d{2})[\/\-](\d{2})$/) @@ -110,8 +118,17 @@ } if ($num) { - my $firstIndex = ($num <= @jobFiles) ? @jobFiles - $num : 0; - my $lastIndex = $#jobFiles; + my ($firstIndex, $lastIndex); + if ($specifiedJobId) + { + $firstIndex = ($num <= @jobFiles) ? @jobFiles - $num : 0; + $lastIndex = $#jobFiles; + } + else + { + $firstIndex = 0; + $lastIndex = ($num < $#jobFiles) ? $num : $#jobFiles; + } @jobFiles = @jobFiles[$firstIndex .. 
$lastIndex]; } @@ -121,7 +138,7 @@ my %jobAttr = (); my @jobs = (); my $context = ''; -foreach my $jobFile (@jobFiles) +OUTER: foreach my $jobFile (@jobFiles) { open JOBS, "< $jobFile" or die "Unable to open job file ($jobFile) for reading: $!\n"; @@ -299,6 +316,7 @@ my %jobAttrCopy = %jobAttr; push @jobs, \%jobAttrCopy; } + last OUTER if @jobs and $oneonly; } } @@ -491,6 +509,10 @@ Show only job records matching the specified user. +=item B<-o | --oneonly> + +Show only the first job record found. This will mostly be much faster and give the same result as if the flag is omitted if the search is for a specific non-array job or specific array job member. + =item B<-? | --help> brief help message From Gareth.Williams at csiro.au Tue Jan 24 02:15:16 2012 From: Gareth.Williams at csiro.au (Gareth.Williams at csiro.au) Date: Tue, 24 Jan 2012 20:15:16 +1100 Subject: [torquedev] job logging showjobs vmem patch In-Reply-To: <007DECE986B47F4EABF823C1FBB19C620102C8732EEA@exvic-mbx04.nexus.csiro.au> References: <007DECE986B47F4EABF823C1FBB19C620102C8732EDF@exvic-mbx04.nexus.csiro.au> <89c4c94b-9fff-4be9-8fb8-351504733551@mail> <007DECE986B47F4EABF823C1FBB19C620102C8732EEA@exvic-mbx04.nexus.csiro.au> Message-ID: <007DECE986B47F4EABF823C1FBB19C620102C8732F05@exvic-mbx04.nexus.csiro.au> > -----Original Message----- > From: torquedev-bounces at supercluster.org [mailto:torquedev- > bounces at supercluster.org] On Behalf Of Gareth.Williams at csiro.au > Sent: Thursday, 19 January 2012 4:02 PM > To: torquedev at supercluster.org > Subject: [ExternalEmail] Re: [torquedev] job logging showjobs vmem > patch > > > -----Original Message----- > > From: torquedev-bounces at supercluster.org [mailto:torquedev- > > bounces at supercluster.org] On Behalf Of Ken Nielson > > Sent: Tuesday, 17 January 2012 4:42 AM > > To: Torque Developers mailing list > > Subject: Re: [torquedev] job logging showjobs vmem patch > > > > Gareth, > > > > I see what you mean. 
If you have time and are willing to fix this for > > array jobs we would be very happy. > > > > Ken > > > > ----- Original Message ----- > > > From: "Gareth Williams" > > > To: torquedev at supercluster.org > > > Sent: Sunday, January 15, 2012 9:17:31 PM > > > Subject: Re: [torquedev] job logging showjobs vmem patch > > > > > > I've just noticed that array jobs are not particularly well handled > > > by the showjobs script: > -snip- > > Here's a patch for showjobs to handle array jobs better. There is a > second path to express memory in mebibytes or gibibytes if the values > are big enough (someone local asked for that - and it is meant to be > human readable). I noticed that there is a job record for the whole > array which has some value but it contains Output and Error file fields > which (currently) no such files are created. > > Gareth Hi Ken, I'm sorry to say there was a bug in the first patch - the first bit of new code needed to be in the block above. I'm also worried that my email is mangling text - maybe related to mixed line endings in the file. I'll send a zipped attachment of my current version directly to you for Adaptive to consider. Gareth > > --- /apps/torque/bin/showjobs 2012-01-16 19:37:43.336090321 +1100 > +++ showjobs 2012-01-19 14:42:48.032725413 +1100 > @@ -73,6 +73,11 @@ > } > } > > +# treat brackets as literal in job id for array jobs > +$specifiedJobId =~ s/\[/\\\[/g; > +$specifiedJobId =~ s/\]/\\\]/g; > + > + > # Build a sorted list of job files > chdir("${torqueHomeDir}/job_logs") > or die > @@ -265,7 +270,7 @@ > if ( > defined $specifiedJobId > && (! 
defined $jobAttr{'Job Id'} > - || $jobAttr{'Job Id'} !~ /^${specifiedJobId}\b/) > + || $jobAttr{'Job Id'} !~ > /^${specifiedJobId}(?:.|\b)/) > ); > next > if ( > > --- showjobs 2012-01-19 15:57:42.517901727 +1100 > +++ showjobs.mgb 2012-01-19 15:46:36.319334573 +1100 > @@ -185,11 +185,11 @@ > } > elsif ($line =~ /([^<]+) 'resources_used') > { > - $jobAttr{'Memory Used'} = $1; > + $jobAttr{'Memory Used'} = ktoMG($1); > } > elsif ($line =~ /([^<]+) 'resources_used') > { > - $jobAttr{'vmem Used'} = $1; > + $jobAttr{'vmem Used'} = ktoMG($1); > } > elsif ($line =~ /([^<]+) 'resource_List') > { > @@ -414,6 +414,31 @@ > else { return sprintf('%02d:%02d:%02d', $hours, $minutes, > $seconds); } > } > > + > +###################################################################### > ########## > +# $ktoMG = ktoMG($kilo) > +# Converts kilo units to mega or giga (mebi and gibi) > +# k, M and G are used rather than the more correct Ki, Mi and Gi > +###################################################################### > ########## > +sub ktoMG > +{ > + my ($value) = @_; > + if ($value =~ /^\s*(\d+)k/) { > + my $k = $1; > + if ($k > 1024) { > + my $M = $k / 1024; > + if ($M > 1024) { > + my $G = $M / 1024; > + $G = sprintf("%.1f", $G); > + $value =~ s/${k}k/${G}G/; > + } else { > + $M = sprintf("%.1f", $M); > + $value =~ s/${k}k/${M}M/; > + } > + } > + } > + return $value; > +} > > ####################################################################### > ####### > > __END__ > _______________________________________________ > torquedev mailing list > torquedev at supercluster.org > http://www.supercluster.org/mailman/listinfo/torquedev From knielson at adaptivecomputing.com Tue Jan 24 09:20:55 2012 From: knielson at adaptivecomputing.com (Ken Nielson) Date: Tue, 24 Jan 2012 09:20:55 -0700 (MST) Subject: [torquedev] job logging showjobs vmem patch In-Reply-To: <007DECE986B47F4EABF823C1FBB19C620102C8732F05@exvic-mbx04.nexus.csiro.au> Message-ID: 
<0e5d1485-a80c-4ff3-9b9c-e44a570d6f00@mail> ----- Original Message ----- > From: "Gareth Williams" > To: torquedev at supercluster.org > Sent: Tuesday, January 24, 2012 2:15:16 AM > Subject: Re: [torquedev] job logging showjobs vmem patch > > > -----Original Message----- > > From: torquedev-bounces at supercluster.org [mailto:torquedev- > > bounces at supercluster.org] On Behalf Of Gareth.Williams at csiro.au > > Sent: Thursday, 19 January 2012 4:02 PM > > To: torquedev at supercluster.org > > Subject: [ExternalEmail] Re: [torquedev] job logging showjobs vmem > > patch > > > > > -----Original Message----- > > > From: torquedev-bounces at supercluster.org [mailto:torquedev- > > > bounces at supercluster.org] On Behalf Of Ken Nielson > > > Sent: Tuesday, 17 January 2012 4:42 AM > > > To: Torque Developers mailing list > > > Subject: Re: [torquedev] job logging showjobs vmem patch > > > > > > Gareth, > > > > > > I see what you mean. If you have time and are willing to fix this > > > for > > > array jobs we would be very happy. > > > > > > Ken > > > > > > ----- Original Message ----- > > > > From: "Gareth Williams" > > > > To: torquedev at supercluster.org > > > > Sent: Sunday, January 15, 2012 9:17:31 PM > > > > Subject: Re: [torquedev] job logging showjobs vmem patch > > > > > > > > I've just noticed that array jobs are not particularly well > > > > handled > > > > by the showjobs script: > > -snip- > > > > Here's a patch for showjobs to handle array jobs better. There is > > a > > second path to express memory in mebibytes or gibibytes if the > > values > > are big enough (someone local asked for that - and it is meant to > > be > > human readable). I noticed that there is a job record for the whole > > array which has some value but it contains Output and Error file > > fields > > which (currently) no such files are created. 
> > > > Gareth > > Hi Ken, > > I'm sorry to say there was a bug in the first patch - the first bit > of new code needed to be in the block above. I'm also worried that > my email is mangling text - maybe related to mixed line endings in > the file. > > I'll send a zipped attachment of my current version directly to you > for Adaptive to consider. > > Gareth > > > > > --- /apps/torque/bin/showjobs 2012-01-16 19:37:43.336090321 +1100 > > +++ showjobs 2012-01-19 14:42:48.032725413 +1100 > > @@ -73,6 +73,11 @@ > > } > > } > > > > +# treat brackets as literal in job id for array jobs > > +$specifiedJobId =~ s/\[/\\\[/g; > > +$specifiedJobId =~ s/\]/\\\]/g; > > + > > + > > # Build a sorted list of job files > > chdir("${torqueHomeDir}/job_logs") > > or die > > @@ -265,7 +270,7 @@ > > if ( > > defined $specifiedJobId > > && (! defined $jobAttr{'Job Id'} > > - || $jobAttr{'Job Id'} !~ > > /^${specifiedJobId}\b/) > > + || $jobAttr{'Job Id'} !~ > > /^${specifiedJobId}(?:.|\b)/) > > ); > > next > > if ( > > > > --- showjobs 2012-01-19 15:57:42.517901727 +1100 > > +++ showjobs.mgb 2012-01-19 15:46:36.319334573 +1100 > > @@ -185,11 +185,11 @@ > > } > > elsif ($line =~ /([^<]+) > 'resources_used') > > { > > - $jobAttr{'Memory Used'} = $1; > > + $jobAttr{'Memory Used'} = ktoMG($1); > > } > > elsif ($line =~ /([^<]+) > 'resources_used') > > { > > - $jobAttr{'vmem Used'} = $1; > > + $jobAttr{'vmem Used'} = ktoMG($1); > > } > > elsif ($line =~ /([^<]+) > 'resource_List') > > { > > @@ -414,6 +414,31 @@ > > else { return sprintf('%02d:%02d:%02d', $hours, $minutes, > > $seconds); } > > } > > > > + > > +###################################################################### > > ########## > > +# $ktoMG = ktoMG($kilo) > > +# Converts kilo units to mega or giga (mebi and gibi) > > +# k, M and G are used rather than the more correct Ki, Mi and Gi > > +###################################################################### > > ########## > > +sub ktoMG > > +{ > > + my ($value) = @_; > > + if 
($value =~ /^\s*(\d+)k/) { > > + my $k = $1; > > + if ($k > 1024) { > > + my $M = $k / 1024; > > + if ($M > 1024) { > > + my $G = $M / 1024; > > + $G = sprintf("%.1f", $G); > > + $value =~ s/${k}k/${G}G/; > > + } else { > > + $M = sprintf("%.1f", $M); > > + $value =~ s/${k}k/${M}M/; > > + } > > + } > > + } > > + return $value; > > +} > > > > ####################################################################### > > ####### > > > > __END__ > > _______________________________________________ > > torquedev mailing list > > torquedev at supercluster.org > > http://www.supercluster.org/mailman/listinfo/torquedev > _______________________________________________ Gareth, I will wait for the zip file to merge your changes. Ken From Gareth.Williams at csiro.au Tue Jan 24 17:33:10 2012 From: Gareth.Williams at csiro.au (Gareth.Williams at csiro.au) Date: Wed, 25 Jan 2012 11:33:10 +1100 Subject: [torquedev] job logging and keep_completed Message-ID: <007DECE986B47F4EABF823C1FBB19C620102C8732F0D@exvic-mbx04.nexus.csiro.au> Hi Ken, and David, I've observed that job log entries do not show up until after jobs leave the 'C' state after the keep_completed period. Would it be sensible and easy to add the log entry as the job enters the 'C' state? Gareth -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.supercluster.org/pipermail/torquedev/attachments/20120125/4385d8c3/attachment.html

From knielson at adaptivecomputing.com Wed Jan 25 05:14:05 2012
From: knielson at adaptivecomputing.com (Ken Nielson)
Date: Wed, 25 Jan 2012 05:14:05 -0700 (MST)
Subject: [torquedev] job logging and keep_completed
In-Reply-To: <007DECE986B47F4EABF823C1FBB19C620102C8732F0D@exvic-mbx04.nexus.csiro.au>
Message-ID: <5598dbad-4fb5-4d9d-b41a-d640f8b1a7c0@mail>

----- Original Message -----
> From: "Gareth Williams"
> To: torquedev at supercluster.org
> Sent: Tuesday, January 24, 2012 5:33:10 PM
> Subject: [torquedev] job logging and keep_completed
>
> Hi Ken, and David,
>
> I've observed that job log entries do not show up until after jobs
> leave the 'C' state after the keep_completed period. Would it be
> sensible and easy to add the log entry as the job enters the 'C'
> state?
>
> Gareth

Gareth,

I don't remember why I waited until after the job was purged to write the job to file, but it seems reasonable that you should be able to write the job to file as it enters the completed state.

Ken

From rea+maui at grid.kiae.ru Wed Jan 25 23:40:04 2012
From: rea+maui at grid.kiae.ru (Eygene Ryabinkin)
Date: Thu, 26 Jan 2012 10:40:04 +0400
Subject: [torquedev] Patch: spread job polling more uniformly
In-Reply-To: <4F200A61.4070303@cyf-kr.edu.pl>
References: <4F200A61.4070303@cyf-kr.edu.pl>
Message-ID: <9GEaZnlj5QCyXmy1eHRJY6tQnsY@HbohoBmewgxm0atwUoKO7zhAAgw>

Lukasz, good day.

Wed, Jan 25, 2012 at 02:57:53PM +0100, Lukasz Flis wrote:
> How are the tests going on your cluster?

Well, they're good: our Torque-Maui tandem now responds even more quickly, and the share of time with slow responses from the Torque server dropped by a factor of about 7. But our worker nodes are heavily loaded with I/O-intensive tasks, which is why we see this improvement. With more responsive pbs_moms, the problem I am fighting shouldn't come up.
> I would like to give it a try on a 12k core cluster here in Cyfronet but I
> would like to be sure that you observed no regressions since then.

No regressions with something like 10k+ jobs per day.

> Have you measured the improvements over original code?

Yes, as I said, periods of unresponsiveness dropped by a factor of 7 under our workload.

I wonder if people from SuperCluster can review my patch.
--
Eygene Ryabinkin, Russian Research Centre "Kurchatov Institute"

From samuel at unimelb.edu.au Thu Jan 26 16:39:15 2012
From: samuel at unimelb.edu.au (Christopher Samuel)
Date: Fri, 27 Jan 2012 10:39:15 +1100
Subject: [torquedev] Patch: spread job polling more uniformly
In-Reply-To: <9GEaZnlj5QCyXmy1eHRJY6tQnsY@HbohoBmewgxm0atwUoKO7zhAAgw>
References: <4F200A61.4070303@cyf-kr.edu.pl> <9GEaZnlj5QCyXmy1eHRJY6tQnsY@HbohoBmewgxm0atwUoKO7zhAAgw>
Message-ID: <4F21E423.70507@unimelb.edu.au>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 26/01/12 17:40, Eygene Ryabinkin wrote:

> I wonder if people from SuperCluster can review my patch.
Might be worth reporting a bug against this issue (if you've not already done so):

http://www.clusterresources.com/bugzilla/

cheers,
Chris
- --
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8h5CMACgkQO2KABBYQAh9uuwCfW4FjmiAzHWS5EqGHxV+cDUIH
LKEAn1t0PS0tLmbv7xctRUQDL0mQwBs3
=he/K
-----END PGP SIGNATURE-----

From knielson at adaptivecomputing.com Thu Jan 26 17:10:57 2012
From: knielson at adaptivecomputing.com (Ken Nielson)
Date: Thu, 26 Jan 2012 17:10:57 -0700 (MST)
Subject: [torquedev] Patch: spread job polling more uniformly
In-Reply-To:
Message-ID:

----- Original Message -----
> From: "Eygene Ryabinkin"
> To: torquedev at supercluster.org
> Sent: Saturday, January 21, 2012 1:07:31 PM
> Subject: [torquedev] Patch: spread job polling more uniformly
>
> Good day.
>
> It has been a long time since I last posted to this list, but now I have
> a patch that should help busy scheduling systems be more
> responsive. During the optimization of our Torque/Maui server,
> I found that the spread of the running job polls is governed
> by the queue_rank value: the remainder from its division by
> JobStatRate is used as the spreading factor.
>
> That's not a very good thing, because we can do better by just
> spreading the polls really uniformly:
> t_n = now + n * JobStatRate / N_running.
>
> The attached patch does this. It is currently being tested on
> our cluster and has shown no regressions for some hours. I'll test
> it more thoroughly, but a code review would also be welcome.
> --
> Eygene Ryabinkin, Russian Research Centre "Kurchatov Institute"
>

Eygene,

Thanks for the patch. I will merge it and test it. What version of TORQUE was this made against?
Ken

From rea+maui at grid.kiae.ru Thu Jan 26 23:22:28 2012
From: rea+maui at grid.kiae.ru (Eygene Ryabinkin)
Date: Fri, 27 Jan 2012 10:22:28 +0400
Subject: [torquedev] Patch: spread job polling more uniformly
In-Reply-To:
References:
Message-ID:

Ken, good day.

Thu, Jan 26, 2012 at 05:10:57PM -0700, Ken Nielson wrote:
> Thanks for the patch. I will merge it and test it. What version of
> TORQUE was this made against?

It's for 2.5.10. I just checked: it applies cleanly to the unmodified 2.5.10 tree. (I have other patches for our production Torque that need some polishing before being sent to a wider audience, but this one commutes with all of them and doesn't depend on any functionality that those patches add.)

Thanks!
--
Eygene Ryabinkin, Russian Research Centre "Kurchatov Institute"

From bugzilla-daemon at supercluster.org Thu Jan 26 23:36:20 2012
From: bugzilla-daemon at supercluster.org (bugzilla-daemon at supercluster.org)
Date: Thu, 26 Jan 2012 23:36:20 -0700 (MST)
Subject: [torquedev] [Bug 139] Negative value in 'Que' when using qstat
In-Reply-To:
References:
Message-ID: <20120127063620.F1EEF77801C@http.supercluster.org>

http://www.clusterresources.com/bugzilla/show_bug.cgi?id=139

Nicolas Pinto changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nicolas.pinto at gmail.com

--- Comment #1 from Nicolas Pinto 2012-01-26 23:36:20 MST ---
I'm getting the same issue with torque version 3.0.3.

Paul, which version are you using?

--
Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

From bugzilla-daemon at supercluster.org Thu Jan 26 23:47:47 2012
From: bugzilla-daemon at supercluster.org (bugzilla-daemon at supercluster.org)
Date: Thu, 26 Jan 2012 23:47:47 -0700 (MST)
Subject: [torquedev] [Bug 139] Negative value in 'Que' when using qstat
In-Reply-To:
References:
Message-ID: <20120127064747.167AB157801B@http.supercluster.org>

http://www.clusterresources.com/bugzilla/show_bug.cgi?id=139

--- Comment #2 from Nicolas Pinto 2012-01-26 23:47:46 MST ---
Quick follow up. I upgraded to 3.0.4 and "qstat -q" now gives the correct output:

However, the problem persists with "qmgr -c 'list server' | grep state_count":

state_count = Transit:0 Queued:-48 Held:100 Waiting:0 Running:48 Exiting:0

--
Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

From bugzilla-daemon at supercluster.org Thu Jan 26 23:52:13 2012
From: bugzilla-daemon at supercluster.org (bugzilla-daemon at supercluster.org)
Date: Thu, 26 Jan 2012 23:52:13 -0700 (MST)
Subject: [torquedev] [Bug 139] Negative value in 'Que' when using qstat
In-Reply-To:
References:
Message-ID: <20120127065213.BC67C1578020@http.supercluster.org>

http://www.clusterresources.com/bugzilla/show_bug.cgi?id=139

--- Comment #3 from Nicolas Pinto 2012-01-26 23:52:13 MST ---
After a second look, even "qstat -q" returns the wrong output:

$ qstat -q

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
kraken.q           --      --       --      --   47  -1 --   E R
                                               ----- -----
                                                  47    -1

$ qmgr -c 'list server' | grep state_count
state_count = Transit:0 Queued:-101 Held:100 Waiting:0 Running:47 Exiting:0

--
Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

From bugzilla-daemon at supercluster.org Mon Jan 30 23:24:04 2012
From: bugzilla-daemon at supercluster.org (bugzilla-daemon at supercluster.org)
Date: Mon, 30 Jan 2012 23:24:04 -0700 (MST)
Subject: [torquedev] [Bug 168] 2.5(.9) qsub does not seem to accept comma seperated -W argument
In-Reply-To:
References:
Message-ID: <20120131062404.D74D34122D40@http.supercluster.org>

http://www.clusterresources.com/bugzilla/show_bug.cgi?id=168

--- Comment #2 from Eygene Ryabinkin 2012-01-30 23:24:04 MST ---
Created an attachment (id=101)
 --> (http://www.clusterresources.com/bugzilla/attachment.cgi?id=101)
Fixes handling of multiple comma-separated arguments for -W

I have the same issue with 2.5.10. The attached patch fixes the problem for me. Please try it (though I am not sure that this patch will work for the whole 2.5 line).

--
Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

From bugzilla-daemon at supercluster.org Tue Jan 31 04:35:07 2012
From: bugzilla-daemon at supercluster.org (bugzilla-daemon at supercluster.org)
Date: Tue, 31 Jan 2012 04:35:07 -0700 (MST)
Subject: [torquedev] [Bug 168] 2.5(.9) qsub does not seem to accept comma seperated -W argument
In-Reply-To:
References:
Message-ID: <20120131113507.6A7EC4122F2D@http.supercluster.org>

http://www.clusterresources.com/bugzilla/show_bug.cgi?id=168

--- Comment #3 from Eygene Ryabinkin 2012-01-31 04:35:07 MST ---
Created an attachment (id=102)
 --> (http://www.clusterresources.com/bugzilla/attachment.cgi?id=102)
Fixes old regression: qsub -W won't accept directive like stagein=a at h:b,c at h:d

And another patch, which fixes a long-standing regression in qsub behaviour: as the manual says, -W stagein has the form 'stagein=file_list' with file_list being "local_file at hostname:remote_file[,...]", but qsub from 2.5.10 won't allow such syntax.
Since Torque is widely used by the European Grid middleware (http://repository.egi.eu/), which used to rely on the old behaviour of -W, it would be good to restore that (though right now the middleware uses a workaround that puts '-W stagein=a at h:b,stagein=c at h:d' into the submission scripts).

And the previous patch that allows for '-W stagein=a at h:b,stagein=c at h:d' is an absolute must: without it nothing will be submitted to the batch system by the EGI CREAM middleware.

--
Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

From bugzilla-daemon at supercluster.org Tue Jan 31 04:36:20 2012
From: bugzilla-daemon at supercluster.org (bugzilla-daemon at supercluster.org)
Date: Tue, 31 Jan 2012 04:36:20 -0700 (MST)
Subject: [torquedev] [Bug 168] 2.5(.9) qsub does not seem to accept comma seperated -W argument
In-Reply-To:
References:
Message-ID: <20120131113620.2F0CD4122F2F@http.supercluster.org>

http://www.clusterresources.com/bugzilla/show_bug.cgi?id=168

Eygene Ryabinkin changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rea+maui at grid.kiae.ru

--- Comment #4 from Eygene Ryabinkin 2012-01-31 04:36:19 MST ---
Forgot to say: these patches are currently deployed on our production infrastructure and have been running fine for some hours ;))

--
Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

From bugzilla-daemon at supercluster.org Tue Jan 31 08:09:46 2012
From: bugzilla-daemon at supercluster.org (bugzilla-daemon at supercluster.org)
Date: Tue, 31 Jan 2012 08:09:46 -0700 (MST)
Subject: [torquedev] [Bug 168] 2.5(.9) qsub does not seem to accept comma seperated -W argument
In-Reply-To:
References:
Message-ID: <20120131150946.C393F678034@http.supercluster.org>

http://www.clusterresources.com/bugzilla/show_bug.cgi?id=168

--- Comment #5 from Eygene Ryabinkin 2012-01-31 08:09:46 MST ---
Found out that processing of double and single quotes and backslashes from smart_strtok was dropped in my patch. Will try to resurrect them and prepare the updated fix.

--
Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

From bugzilla-daemon at supercluster.org Tue Jan 31 20:24:35 2012
From: bugzilla-daemon at supercluster.org (bugzilla-daemon at supercluster.org)
Date: Tue, 31 Jan 2012 20:24:35 -0700 (MST)
Subject: [torquedev] [Bug 173] New: [torque-3.0.4] pbs_mom buffer overflow / segfaults when using --enable-nvidia-gpus [with BUG FIX]
Message-ID:

http://www.clusterresources.com/bugzilla/show_bug.cgi?id=173

           Summary: [torque-3.0.4] pbs_mom buffer overflow / segfaults when using --enable-nvidia-gpus [with BUG FIX]
           Product: TORQUE
           Version: 3.0.x
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: pbs_mom
        AssignedTo: knielson at adaptivecomputing.com
        ReportedBy: nicolas.pinto at gmail.com
                CC: torquedev at supercluster.org
   Estimated Hours: 0.0

There is a buffer overflow in pbs_mom when using --enable-nvidia-gpus and a large number of GPUs (e.g.
8):

Report:
-------

# with $loglevel 7
$ tail /var/log/messages
Jan 31 22:04:56 munctional6 pbs_mom: LOG_DEBUG::check_nvidia_version_file, Nvidia driver info: NVRM version: NVIDIA UNIX x86_64 Kernel Module 290.10 Wed Nov 16 17:39:29 PST 2011
Jan 31 22:04:56 munctional6 pbs_mom: LOG_DEBUG::gpus, gpus: GPU cmd issued: nvidia-smi -q -x 2>&1
Jan 31 22:04:59 munctional6 kernel: [ 4186.963718] pbs_mom[7497] general protection ip:41dd69 sp:7fff2ccf72b8 error:0 in pbs_mom[400000+56000]

Cause:
------

src/resmom/mom_server.c:2507 only allocates 16 KB whereas the output of "nvidia-smi -q -x 2>&1" is ~24KB

Bug fix:
--------

The following simple patch fixes the issue:

--- src/resmom/mom_server.c.orig 2012-01-12 17:18:49.000000000 -0500
+++ src/resmom/mom_server.c 2012-01-31 22:24:01.179534519 -0500
@@ -2504,7 +2504,7 @@
   static char id[] = "generate_server_gpustatus_smi";
 
   char *dataptr, *outptr, *tmpptr1, *tmpptr2, *savptr;
-  char gpu_string[16 * 1024];
+  char gpu_string[32 * 1024];
   int gpu_modes[32];
   int have_modes = FALSE;
   int gpuid = -1;

HTH

Nicolas

--
Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

From bugzilla-daemon at supercluster.org Tue Jan 31 20:32:33 2012
From: bugzilla-daemon at supercluster.org (bugzilla-daemon at supercluster.org)
Date: Tue, 31 Jan 2012 20:32:33 -0700 (MST)
Subject: [torquedev] [Bug 173] [torque-3.0.4] pbs_mom buffer overflow / segfaults when using --enable-nvidia-gpus [with BUG FIX]
In-Reply-To:
References:
Message-ID: <20120201033233.78B2B6780F8@http.supercluster.org>

http://www.clusterresources.com/bugzilla/show_bug.cgi?id=173

Nicolas Pinto changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|enhancement                 |critical

--
Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

From bugzilla-daemon at supercluster.org Tue Jan 31 20:59:30 2012
From: bugzilla-daemon at supercluster.org (bugzilla-daemon at supercluster.org)
Date: Tue, 31 Jan 2012 20:59:30 -0700 (MST)
Subject: [torquedev] [Bug 173] [torque-3.0.4] pbs_mom buffer overflow / segfaults when using --enable-nvidia-gpus [with BUG FIX]
In-Reply-To:
References:
Message-ID: <20120201035930.5E8B26780FC@http.supercluster.org>

http://www.clusterresources.com/bugzilla/show_bug.cgi?id=173

Eygene Ryabinkin changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rea+maui at grid.kiae.ru

--- Comment #1 from Eygene Ryabinkin 2012-01-31 20:59:30 MST ---
Not that easy: in reality, gpus() must be fixed to respect the passed buffer_size. Or, better, it should do memory allocation/reallocation by itself and return the dynamic buffer to the caller to avoid problems with incompletely captured output from NVidia SMI tools.

--
Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

From bugzilla-daemon at supercluster.org Tue Jan 31 21:35:50 2012
From: bugzilla-daemon at supercluster.org (bugzilla-daemon at supercluster.org)
Date: Tue, 31 Jan 2012 21:35:50 -0700 (MST)
Subject: [torquedev] [Bug 173] [torque-3.0.4] pbs_mom buffer overflow / segfaults when using --enable-nvidia-gpus [with BUG FIX]
In-Reply-To:
References:
Message-ID: <20120201043550.899FA6780FC@http.supercluster.org>

http://www.clusterresources.com/bugzilla/show_bug.cgi?id=173

--- Comment #2 from Nicolas Pinto 2012-01-31 21:35:50 MST ---
Agreed. What you suggest would be the correct way to handle this. I just hacked something in a way that is "compatible" with the current implementation.

--
Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.