[torquedev] TORQUE 4.0 Is Officially Beta-Testing

David Beer dbeer at adaptivecomputing.com
Thu Dec 22 20:01:13 MST 2011


We are happy to announce that TORQUE 4.0 is officially transitioning to a beta stage of development. We encourage everyone to download and install it on their test systems. Please remember that this software is in a beta stage and that enormous changes have been made to improve TORQUE. Even though this beta has gone through a greatly improved QA process, we expect to see some hiccups as people begin to roll it out on their test systems.

You can download it here: http://www.adaptivecomputing.com/resources/downloads/torque/4.0-beta/torque-4.0.0-snap.26656snapstamp.tar.gz
Documentation is located here for HTML: http://www.adaptivecomputing.com/resources/docs/torque/4-0/help.htm and here for PDF: http://www.adaptivecomputing.com/resources/docs/torque/4-0/torqueAdminGuide-4.0.pdf

We are hoping that the TORQUE community will be willing to assist us in making TORQUE 4.0 the best release of TORQUE to date. In order to effectively assist in this process, administrators would need to:

1. Configure with debugging symbols (--with-debug) 
2. Ensure that core dumping is on for all server daemons (execute ulimit -c unlimited as the user that will run pbs_server or pbs_mom).
3. Be proactive in gathering as much information about any problems as possible, and share that information with the developers.
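
As a rough illustration of steps 1 and 2, the following shell sketch may help. It is not an official procedure: the helper name enable_core_dumps is made up for this example, and the build and start commands are left commented out because the exact sequence (install prefix, privileges, first-time server setup) depends on your site.

```shell
#!/bin/sh
# Sketch only. Run as the user that will run pbs_server or pbs_mom.

# Hypothetical helper: try to raise the core-file size limit for this shell,
# then print the limit that is actually in effect so it can be checked.
enable_core_dumps() {
    ulimit -c unlimited 2>/dev/null ||
        echo "warning: could not raise the core file size limit" >&2
    ulimit -c
}

# Assumed workflow from the unpacked source tree (commented out so that
# sourcing this file has no side effects):
# ./configure --with-debug && make
# make install          # typically as root
# enable_core_dumps     # ideally prints "unlimited"
# pbs_server            # or pbs_mom on compute nodes
```

The limit applies to the current shell and its children, so the daemon has to be started from the same shell session in which the limit was raised.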

Access to the TORQUE 4.0 Beta is currently limited to the TORQUE community and support is expected to be largely community-driven until early January 2012.  If you are interested in participating in our official beta program kicking off in January 2012, please email David Gardner within Product Management (dgardner at adaptivecomputing.com).

Participants in our invite-only beta program will receive increased support from Adaptive Computing engineering through February 2012. Space is limited, so let us know soon if you want to be considered.  Your feedback is extremely important to us and we look forward to hearing back from you either through the users list or as participants in the official Adaptive Computing beta program.

There are some known issues with the beta:

1. qstat will occasionally crash when a system is under high load. To work around this, run qstat again. (It doesn't crash the server.)
2. Sometimes, when running parallel jobs rapidly, they get stuck in a running state. We have observed this very rarely.
3. autogen.sh is currently broken on older versions of aclocal and autoconf.
4. There is currently no error checking for inconsistencies between the nodes file and the mom_hierarchy file.
5. The server will segfault under very high load (servicing hundreds of requests per second for several minutes in a row while manually qrun'ing jobs).
6. qdel -p doesn't put running jobs in a completed state; it just deletes them.
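
For known issue 1, the "run qstat again" workaround can be wrapped in a small retry helper for use in scripts. This is only a sketch: the function name retry is invented here, and it assumes that a crashed qstat exits with a nonzero status, as a process killed by a signal normally does.

```shell
#!/bin/sh
# retry N CMD [ARGS...]: run CMD up to N times, stopping at the first success.
# Returns 0 if any attempt succeeds, 1 if all N attempts fail.
retry() {
    attempts=$1
    shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
    done
    return 1
}

# Example usage (assumes qstat is on PATH):
# retry 3 qstat
```

A retry like this only hides the client-side crash; a core file from the failed qstat run is still worth collecting and sharing.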

Feel free to report any additional issues to us as you find them.

David Beer 
Direct Line: 801-717-3386 | Fax: 801-717-3738
     Adaptive Computing
     1712 S East Bay Blvd, Suite 300
     Provo, UT 84606
