[torqueusers] TORQUE 4.0 Officially Announced
dbeer at adaptivecomputing.com
Tue Mar 13 13:42:36 MDT 2012
TORQUE 4.0 is officially here! Please check out Adaptive Computing's
official announcement here:
The tarball can be downloaded from here:
We have several sites currently using 4.0 and feedback has been positive.
These warnings are posted on the download site, but I am copying them here:
1. Make sure that you have openssl-devel (RedHat based) / libssl-dev
(Debian based) installed (the name may differ for different operating
systems) in order to be able to build TORQUE 4.0.
2. Make sure that you run the trqauthd daemon on all machines that will be
running client commands (this includes Moab). NOTE: there is an init.d
script for it in contrib/init.d/ but it needs customization. One problem
is that it sets the wrong path for PBS_DAEMON: it should be
/usr/local/sbin/trqauthd by default, not /usr/local/bin/trqauthd.
3. If you are using Moab, it needs to be started or restarted after
installing TORQUE 4.0.
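The PBS_DAEMON correction in item 2 is easy to make by hand in an editor. For anyone scripting the install, here is a minimal Python sketch. It assumes the init script assigns the variable on a line beginning with PBS_DAEMON= (the exact layout of the contrib script is not shown in this email, so check it first), and the helper name fix_pbs_daemon is made up for illustration.

```python
WRONG = "/usr/local/bin/trqauthd"
RIGHT = "/usr/local/sbin/trqauthd"


def fix_pbs_daemon(script_text):
    """Return script_text with the PBS_DAEMON path corrected.

    Assumption: the script sets the variable on a line that starts with
    PBS_DAEMON= (leading whitespace allowed). Other lines are untouched.
    """
    fixed = []
    for line in script_text.splitlines(keepends=True):
        if line.lstrip().startswith("PBS_DAEMON=") and WRONG in line:
            line = line.replace(WRONG, RIGHT)
        fixed.append(line)
    return "".join(fixed)


if __name__ == "__main__":
    # Made-up script contents, only to show the effect.
    sample = "#!/bin/sh\nPBS_DAEMON=/usr/local/bin/trqauthd\n"
    print(fix_pbs_daemon(sample), end="")
```

The function works on text rather than on a file so that you can inspect the result before writing anything under /etc/init.d. Running it twice is harmless, since the corrected path no longer matches.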
Please make sure to take all normal precautions for upgrading. Another
advisory (not on the website): TORQUE now uses hwloc to manage cpusets. If
you wish to use cpusets, you will need hwloc version 1.1 or higher
installed on your system.
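The hwloc version requirement can be checked before building. The sketch below only compares a version string against the 1.1 minimum. How you obtain the string (for example from your package manager) is left out, and the helper name hwloc_version_ok is hypothetical.

```python
MIN_HWLOC = (1, 1)  # minimum version stated in this announcement


def hwloc_version_ok(version, minimum=MIN_HWLOC):
    """Return True if a dotted version string is at least `minimum`.

    Only the leading digits of each dot-separated piece are used, so a
    string such as "1.4rc1" is read as (1, 4).
    """
    parts = []
    for piece in version.strip().split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Tuple comparison is numeric per component, so 1.10 ranks above 1.1.
    return tuple(parts) >= minimum


if __name__ == "__main__":
    for candidate in ("1.0.2", "1.1", "1.4.1"):
        print(candidate, hwloc_version_ok(candidate))
```

Comparing tuples of integers rather than raw strings avoids the usual trap where "1.10" sorts below "1.2".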
The major features of the release are briefly described in the release
announcement, and the CHANGELOG for 4.0 is copied at the end of this email.
This release has undergone more testing than any previous release of
TORQUE; to be fair, it also has more changes than any previous version of
TORQUE. Overall, we saw very good results in our beta program and most of
the sites using it have had good experiences. We are proud of the quality
of this release and hope that you'll try it out and let us know how it
works for you.
David Beer | Software Engineer
e - make a threadpool for TORQUE server. The number of threads is
customizable using min_threads and max_threads, and idle time before
exiting can be set using thread_idle_seconds.
e - make pbs_server multi-threaded in order to increase responsiveness
e - remove the forking from pbs_server running a job, the thread handling
the request just waits until the job is run.
e - change qdel to simply send qdel all - previously this was executed by
a qstat and a qdel of every individual job
e - no longer fork to send mail, just use a thread
e - use hwloc as the backbone for cpuset support in TORQUE (contributed
by Dr. Bernd Kallies)
e - add the boolean variable $use_smt to mom config. If set to false,
this skips logical cores and uses only physical cores for the job. It is
true by default. (contributed by Dr. Bernd Kallies)
n - with the multi-threading the pbs_server -t create and -t cold
commands could no longer ask for user input from the command line. The
call to ask if the user wants to continue was moved higher in the
initialization process and some of the wording changed to reflect what is
now happening.
e - if cpusets are configured but aren't found and cannot be mounted,
pbs_mom will now fail to start instead of failing silently.
e - Change node_spec from an N^2 (but average 5N) algorithm to an N
algorithm with respect to nodes. We only loop over each node once at a
maximum.
e - Abandon pbs_iff in favor of trqauthd. trqauthd is a daemon to be
started once that can perform pbs_iff's functionality, increasing speed
and enabling future
e - add mom_hierarchy functionality for reporting. The file is located in
<TORQUE_HOME>/server_priv/mom_hierarchy, and can be written to tell moms
to send updates to other moms who will pass them on to pbs_server. See docs
e - add a unit testing framework (check). It is compiled with
--with-check and tests are executed using make check. The framework is
complete but not many tests have been written as of yet.
e - Mom rejection messages are now passed back to qrun when possible
e - Added the option -c for startup. By default, the server attempts to
send the mom hierarchy file to all moms on startup, and all moms update
the server and request the hierarchy file. If both are trying to do this
at once, it can cause a lot of traffic. -c tells pbs_server to wait 10
minutes to attempt to contact moms that haven't contacted it, reducing
this traffic.
e - Added mom parameter -w to reduce start times. This parameter makes
pbs_mom wait to send its first update until the server sends it the mom
hierarchy file or a set number of minutes has passed. This should reduce
large cluster startup times.
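
As an illustration of the $use_smt entry in the changelog above: on a hyperthreaded machine each physical core exposes more than one logical core. The Python sketch below is not TORQUE's implementation (that is built on hwloc). It assumes a made-up topology, given as a mapping from physical core ID to that core's logical core IDs, and shows the selection the option describes.

```python
def cores_for_job(topology, use_smt=True):
    """Pick the logical core IDs a job may use.

    topology: dict mapping physical core ID -> list of logical core IDs
              (a hypothetical layout, not read from the real hardware).
    use_smt:  True  -> every logical core is eligible (the default);
              False -> only one logical core per physical core.
    """
    selected = []
    for core_id in sorted(topology):
        logical = sorted(topology[core_id])
        if use_smt:
            selected.extend(logical)
        else:
            # Keep only the first hardware thread of this physical core.
            selected.extend(logical[:1])
    return selected


if __name__ == "__main__":
    # Four physical cores with two hardware threads each (made-up numbering).
    topo = {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
    print(cores_for_job(topo, use_smt=True))
    print(cores_for_job(topo, use_smt=False))
```

With the sample topology, setting use_smt to false halves the number of cores offered to the job, which is the trade-off to keep in mind when changing the default.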