TORQUE Administrator's Manual - Change Log

TORQUE Resource Manager Change Log

TORQUE 2.3

2.3.4
  • b - fixed a bug with RPM spec files due to new pbs_track executable
  • b - fixed a bug with "max_report" where jobs not in the Q state were not always being reported to scheduler
  • b - fixed bug with new UNIX socket communication when more than one TORQUE instance is running on the same host
  • c - fixed a few memory errors due to a spurious comma and some uninitialized memory being allocated
  • b - fixed a bug preventing multiple TORQUE servers and TORQUE MOMs from operating properly on the same host
  • f - enabled 'qsub -T' to specify "job type." Currently this will allow a per job prolog/epilog
  • f - added a new '-E' option to qstat which allows command-line users to pass "extend" strings via the API
  • f - added new max_report queue attribute which will limit the number of Idle jobs, per queue, that TORQUE reports to the scheduler
  • e - enhanced logging when a hostname cannot be looked up in DNS
  • e - PBS_NET_MAX_CONNECTIONS can now be defined at compile time (via CFLAGS)
  • e - modified source code so that all .c and .h files now conform more closely to the new CRI format style
  • c - fixed segfault when loading job files of an older/incompatible version
  • b - fixed a bug where, if an attempt to send a job to a pbs_mom failed due to a timeout, the job would remain in the 'R' state indefinitely
  • b - fixed a bug where CPU time was not being added up properly in all cases (fix for Linux only)
  • e - pbs_track now allows passing of - and -- options to the a.out argument
  • b - qsub now properly interprets -W umask=0XXX as octal umask
  • e - allow $HOME to be specified for path
  • e - added --disable-qsub-keep-override to allow the qsub -k flag to not override -o -e.
  • e - updated with security patches for setuid, setgid, setgroups
  • b - fixed correct_ct() in svr_jobfunc.c so we don't crash if we hit a COMPLETED job
  • b - fixed problem where momctl -d 0 showed ConfigVersion twice
  • e - if a .JB file gets upgraded pbs_server will back up the original
  • b - removed qhold / qrls -h n option since there is no code to support it
  • b - set job state and substate correctly when job has a hold attribute and is being rerun
  • e - fixed several compiler errors and warnings for AIX 5.2 systems
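
The octal interpretation noted above for 'qsub -W umask=0XXX' can be illustrated with a minimal Python sketch. The helper names parse_umask and mode_after_umask are hypothetical and this is not TORQUE's code; it only shows why reading the same digits as decimal would give a different mask.

```python
def parse_umask(text):
    """Interpret a umask string such as "0022" as an octal number."""
    return int(text, 8)


def mode_after_umask(requested_mode, umask):
    """Clear the permission bits that the umask masks out."""
    return requested_mode & ~umask


umask = parse_umask("0022")
# Read as octal, "0022" is 18 in decimal; read as decimal it would be 22.
print(oct(umask))
# A file requested with mode 0666 ends up with the group/other write bits cleared.
print(oct(mode_after_umask(0o666, umask)))
```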

2.3.3
  • b - fixed bug where pbs_mom would sometimes not connect properly with pbs_server after network failures
  • b - changed so run_pelog opens correct stdout/stderr when join is used
  • b - corrected pbs_server man page for SIGUSR1 and SIGUSR2
  • f - added new pbs_track command which may be used to launch an external process and a pbs_mom will then track the resource usage of that process and attach it to a specified job (experimental) (special thanks to David Singleton and David Houlder from APAC)
  • e - added alternate method for sending cluster addresses to mom (ALT_CLSTR_ADDR)

2.3.2
  • e - added --disable-posixmemlock to force mom not to use POSIX MEMLOCK.
  • b - fix potential buffer overrun in qsub
  • b - keep pbs_mom, pbs_server, pbs_sched from closing sockets opened by nss_ldap (SGI)
  • e - added PBS_VERSION environment variable
  • e - added --enable-acct-x to allow adding of x attributes to accounting log
  • b - fix net_server.h build error
  • b - fixed code that was causing jobs to fail due to "neednodes" errors when Moab/Maui was the scheduler

2.3.1
  • b - fixed a bug where torque would fail to start if there was no LF in nodes file
  • b - fixed a bug where TORQUE would ignore the "pbs_asyrunjob" API extension string when starting jobs in asynchronous mode
  • b - fixed memory leak in free_br for PBS_BATCH_MvJobFile case
  • e - torque can now compile on Linux and OS X with NDEBUG defined
  • f - when using qsub it is now possible to specify both -k and -o/-e (before -o/-e did not behave as expected if -k was also used)
  • e - changed pbs_server to have "-l" option. Specifies a host/port that event messages will be sent to. Event messages are the same as what the scheduler currently receives.
  • e - added --enable-autorun to allow qsub jobs to automatically try to run if there are any nodes available.
  • e - added --enable-quickcommit to allow qsub to combine the ready to commit and commit phases into 1 network transmission.
  • e - added --enable-nochildsignal to allow pbs_server to use inline checking for SIGCHLD instead of using the signal handler.
  • e - change qsub so '-v var=' will look in environment for value. If value is not found set it to "".
  • b - fixed mom_server code's HELLO initiation retry control to reduce occurrence of pbs_server incorrectly marking node as unknown/down
  • b - fix qdel of entire job arrays for non operator/managers
  • b - fix so we continue to process exiting jobs for other servers
  • e - added source_login_batch and source_login_interactive to mom config. This allows us to bypass the sourcing of /etc/profile, etc. type files.
  • b - fixed pbs_server segmentation fault when job_array submissions are rejected before ji_arraystruct was initialized
  • e - add some casts to fix some compiler warnings with gcc-4.1 on i386 when -D_FILE_OFFSET_BITS=64 is set
  • e - added --enable-maxnotdefault to allow not using resources_max as defaults.
  • b - fixed file descriptor leak with Linux cpusets (VPAC)
  • b - added new values to TJobAttr so we don't have mismatch with job.h values. Added some comments also.
  • b - reset ji_momhandle so we cannot have more than one pjob for obit_reply to find.
  • e - change qdel to accept 'ALL' as well as 'all'
  • b - changed order of searching so we find most recent jobs first. Prevents finding old leftover job when pids rollover. Also some CACHEOBITFAILURES updates.
  • b - handle case where mom replies with an unknown job error to a stat request from the server
  • b - allow qalter to modify HELD jobs if BLCR is not enabled
  • b - change to update errpath/outpath attributes when -e -o are used with qsub
  • e - added string output for errnos, etc.
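
The 'qsub -v var=' change listed above (look in the environment for the value, fall back to "") can be sketched as follows. This is an illustration in Python under stated assumptions, not the qsub source: resolve_var_list is a hypothetical helper, entries are assumed to be comma-separated with no commas inside values, and bare names without '=' are treated the same as 'name=' for simplicity.

```python
import os


def resolve_var_list(spec, environ=None):
    """Resolve a comma-separated 'name=value' list.

    An entry with an empty value ('name=') is looked up in the
    environment; if it is not found there, it is set to "".
    """
    environ = os.environ if environ is None else environ
    resolved = {}
    for item in spec.split(","):
        name, _sep, value = item.partition("=")
        if value == "":
            # Empty value: fall back to the environment, then to "".
            value = environ.get(name, "")
        resolved[name] = value
    return resolved


# Example with an explicit environment so the result does not depend on the caller's shell.
print(resolve_var_list("A=1,B=,C=", {"B": "from-env"}))
```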

2.3.0
  • b - fixed a bug where TORQUE would ignore the "pbs_asyrunjob" API extension string when starting jobs in asynchronous mode
  • e - redesign how torque.spec is built
  • e - added -a to qrun to allow asynchronous job start
  • e - allow qrerun on completed jobs
  • e - allow qdel to delete all jobs
  • e - make qdel -m functionality match the documentation
  • b - prevent runaway hellos being sent to server when mom's node is removed from the server's node list
  • e - local client connections use a unix domain socket, bypassing inet and pbs_iff
  • f - Linux 2.6 cpuset support (in development)
  • e - new job array submission syntax
  • b - fixed SIGUSR1 / SIGUSR2 to correctly change the log level
  • f - health check script can now be run at job start and end
  • e - tm tasks are now stored in a single .TK file rather than eat lots of inodes
  • f - new "extra_resc" server attribute
  • b - "pbs_version" attr is now correctly read-only
  • e - increase max size of .JB and .SC file names
  • e - new "sched_version" server attribute
  • f - new printserverdb tool
  • e - pbs_server/pbs_mom hostname arg is now -H, -h is help
  • e - added $umask to pbs_mom config, used for generated output files.
  • e - minor pbsnodes overhaul
  • b - fixed memory leak in pbs_server

TORQUE 2.2

2.2.0
  • e - improve RPP logging for corruption issues
  • f - dynamic resources
  • b - correct run-time symbol in pam module on RHEL4
  • f - allow manager to set "next job number" via hidden qmgr attribute next_job_number
  • b - some minor hpux11 build fixes (PACCAR)
  • e - allow pam_pbssimpleauth to be built on OSX and Solaris
  • b - fix bug with log roll and automatic log filenames
  • e - use mlockall() in pbs_mom if _POSIX_MEMLOCK
  • f - consumable resource "tokens" support (Harte-Hanks)
  • b - networking fixes for HPUX, fixes pbs_iff (PACCAR)
  • e - fix "list_head" symbol clash on Solaris 10
  • f - Linux 2.6 cpuset support
  • b - compile error with size_fs() on digitalunix
  • e - build process sets default submit filter path to ${libexecdir}/qsub_filter
    • - we fall back to /usr/local/sbin/torque_submitfilter to maintain compatibility
  • e - allow long job names when not using -N
  • e - pbs_server will now print build details with --about

TORQUE 2.1

2.1.2
  • b - fix momctl queries with multiple hosts
  • b - don't fail make install if --without-sched
  • b - correct MOM compile error with atol()
  • f - qsub will now retry connecting to pbs_server (see manpage)
  • f - X11 forwarding for single-node, interactive jobs with qsub -X
  • f - new pam_pbssimpleauth PAM module, requires --with-pam=DIR
  • e - add logging for node state adjustment
  • f - correctly track node state and allocation for suspended jobs
  • e - entries can always be deleted from manager ACL, even if ACL contains host(s) that no longer exist
  • e - more informative error message when modifying manager ACL
  • f - all queue create, set, and unset operations now set a queue mtime
  • f - added support for log rolling to libtorque
  • f - pbs_server and pbs_mom have two new attributes log_file_max_size, log_file_roll_depth
  • e - support installing client libs and cmds on unsupported OSes (like cygwin)
  • b - fix subnode allocation with pbs_sched
  • b - fix node allocation with suspend-resume
  • b - fix stale job-exclusive state when restarting pbs_server
  • b - don't fall over when duplicate subnodes are assigned after suspend-resume
  • b - handle suspended jobs correctly when restarting pbs_server
  • b - allow long host lists in runjob request
  • b - fix truncated XML output in qstat and pbsnodes
  • b - typo broke compile on irix6array and unicos8
  • e - momctl now skips down nodes when selecting by property
  • f - added submit_args job attribute
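
The log rolling entries above name two attributes, log_file_max_size and log_file_roll_depth. A generic size-triggered roll can be sketched in Python as below. This is an assumption about the general technique, not a description of how libtorque implements it: the function names are hypothetical, the size is taken in bytes, the numbered-suffix naming is illustrative, and depth is assumed to be at least 1.

```python
import os


def roll_log(path, depth):
    """Shift path -> path.1 -> path.2 ..., keeping at most `depth` old files."""
    oldest = f"{path}.{depth}"
    if os.path.exists(oldest):
        os.remove(oldest)
    # Move the higher-numbered files first so nothing is overwritten.
    for n in range(depth - 1, 0, -1):
        src = f"{path}.{n}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{n + 1}")
    if os.path.exists(path):
        os.replace(path, f"{path}.1")


def maybe_roll(path, max_size_bytes, depth):
    """Roll the log only when it exists and has grown past max_size_bytes."""
    if os.path.exists(path) and os.path.getsize(path) > max_size_bytes:
        roll_log(path, depth)
        return True
    return False
```

A caller would invoke maybe_roll periodically or before each write; with a depth of 2, at most two rolled files are kept alongside the active log.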

2.1.1
  • c - fix mom_sync_job code that crashes pbs_server (USC)
  • b - checking disk space in $PBS_SERVER_HOME was mistakenly disabled (USC)
  • e - node's np now accessible in qmgr (USC)
  • f - add ":ALL" as a special node selection when stat'ing nodes (USC)
  • f - momctl can now use :property node selection (USC)
  • f - send cluster addrs to all nodes when a node is created in qmgr (USC)
    • - new nodes are marked offline
    • - all nodes get new cluster ipaddr list
    • - new nodes are cleared of offline bit
  • f - set a node's np from the status' ncpus (only if ncpus > np) (USC)
    • - controlled by new server attribute "auto_node_np"
  • c - fix possible pbs_server crash when nodes are deleted in qmgr (USC)
  • e - avoid dup streams with nodes for quicker pbs_server startup (USC)
  • b - configure program prefix/suffix will now work correctly (USC)
  • b - handle shared libs in tpackages (USC)
  • f - qstat's -1 option can now be used with -f for easier parsing (USC)
  • b - fix broken TM on OSX (USC)
  • f - add "version" and "configversion" RM requests (USC)
  • b - in pbs-config --libs, don't print rpath if libdir is in the sys dlsearch path (USC)
  • e - don't reject job submits if nodes are temporarily down (USC)
  • e - if MOM can't resolve $pbsserver at startup, try again later (USC)
    • - $pbsclient still suffers this problem
  • c - fix nd_addrs usage in bad_node_warning() after deleting nodes (MSIC)
  • b - enable build of xpbsmom on darwin systems (JAX)
  • e - run-time config of MOM's rcp cmd (see pbs_mom(8)) (USC)
  • e - momctl can now accept query strings with spaces, multiple -q opts (USC)
  • b - fix linking order for single-pass linkers like IRIX (ncifcrf)
  • b - fix mom compile on solaris with statfs (USC)
  • b - memory corruption on job exit causing cpu0 to be allocated more than once (USC)
  • e - add increased verbosity to tracejob and added '-q' commandline option
  • e - support larger values in qstat output (might break scripts!) (USC)
  • e - make qterm server shutdown faster (USC)
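
The auto_node_np rule listed above (set a node's np from the reported ncpus, only if ncpus > np) amounts to a one-way upward adjustment. A minimal Python sketch, with a hypothetical function name and arguments that are not taken from the TORQUE source:

```python
def updated_np(current_np, reported_ncpus, auto_node_np=True):
    """Return the node's np after a status report.

    np is only ever raised to the reported ncpus; it is never lowered,
    and nothing changes when auto_node_np is disabled.
    """
    if auto_node_np and reported_ncpus > current_np:
        return reported_ncpus
    return current_np


print(updated_np(2, 8))   # node reports more CPUs than configured
print(updated_np(8, 4))   # node reports fewer CPUs than configured
```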

2.1.0p0
  • fixed job tracking with SMP job suspend/resume (MSIC)
  • modify pbs_mom to enforce memory limits for serial jobs (GaTech)
    • - linux only
  • enable 'never' qmgr maildomain value to disable user mail
  • enable qsub reporting of job rejection reason
  • add suspend/resume diagnostics and logging
  • prevent stale job handler from destroying suspended jobs
  • prevent rapid hello from MOM from doing DOS on pbs_server
  • add diagnostics for why node not considered available
  • add caching of local serverhost addr lookup
  • enable job centric vs queue centric queue limit parameter
  • brand new autoconf+automake+libtool build system (USC)
  • automatic MOM restarts for easier upgrades (USC)
  • new server attributes: acl_group_sloppy, acl_logic_or, keep_completed, kill_delay
  • new server attributes: server_name, allow_node_submit, submit_hosts
  • torque.cfg no longer used by pbs_server
  • pbsdsh and TM enhancements (USC)
    • - tm_spawn() returns an error if execution fails
    • - capture TM stdout with -o
    • - run on unique nodes with -u
    • - run on a given hostname with -h
  • largefile support in staging code and when removing $TMPDIR (USC)
  • use bindresvport() instead of looping over calls to bind() (USC)
  • fix qsub "out of memory" for large resource requests (SANDIA)
  • pbsnodes default arg is now '-a' (USC)
  • new ":property" node selection when node stat and manager set (pbsnodes) (USC)
  • fix race with new jobs reporting wrong walltime (USC)
  • sister moms weren't setting job state to "running" (USC)
  • don't reject jobs if requested nodes is too large node_pack=T (USC)
  • add epilogue.parallel and epilogue.user.parallel (SARA)
  • add $PBS_NODENUM, $PBS_MSHOST, and $PBS_NODEFILE to pelogs (USC)
  • add more flexible --with-rcp='scp|rcp|mom_rcp' instead of --with-scp (USC)
  • build/install a single libtorque.so (USC)
  • nodes are no longer checked against server host acl list (USC)
  • Tcl's buildindex now supports a 3rd arg for "destdir" to aid fakeroot installs (USC)
  • fixed dynamic node destroy qmgr option
  • install rm.h (USC)
  • printjob now prints saved TM info (USC)
  • make MOM restarts with running jobs more reliable (USC)
  • fix return check in pbs_rescquery fixing segfault in pbs_sched (USC)
  • add README.pbstools to contrib directory
  • workaround buggy recvfrom() in Tru64 (USC)
  • attempt to handle socklen_t portably (USC)
  • fix infinite loop in is_stat_get() triggered by network congestion (USC)
  • job suspend/resume enhancements (see qsig manpage) (USC)
  • support higher file descriptors in TM by using poll() instead of select() (USC)
  • immediate job delete feedback to interactive queued jobs (USC)
  • move qmgr manpage from section 8 to section 1
  • add SuSE initscripts to contrib/init.d/
  • fix ctrl-c race while starting interactive jobs (USC)
  • fix memory corruption when tm_spawn() is interrupted (USC)

TORQUE 2.0

2.0.0p6
  • fix segfault in new "acl_group_sloppy" code if a group doesn't exist (USC)
  • configure defaults changed to enable syslog, enable docs, and disable filesync (USC)
  • pelog now correctly restores previous alarm handler (Sandia)
  • misc fixes with syscalls returns, sign-mismatches, and mem corruption (USC)
  • prevent MOM from killing herself on new job race condition - linux only (USC)
  • remove job delete nanny earlier to not interrupt long stageouts (USC)
  • display C state later when using keep_completed (USC)
  • add 'printtracking' command in src/tools (USC)
  • stop overriding the user with name resolution on qsub's -o/-e args (USC)

2.0.0p5
  • reorganize ji_newt structure to eliminate 64 bit data packing issues
  • enable '--disable-spool' configure directive
  • enable stdout/stderr stageout to search through $HOME and $HOME/.pbs_spool
  • fixes to qsub's env handling for newlines and commas (UMU)
  • fixes to at_arst encoding and decoding for newlines and commas (USC)
  • use -p with rcp/scp (USC)
  • several fixes around .pbs_spool usage (USC)
  • don't create "kept" stdout/err files ugo+rw (avoid insane umask) (USC)
  • qsub -V shouldn't clobber qsub's environ (USC)
  • don't prevent connects to "down" nodes that are still talking (USC)
  • allow file globs to work correctly under --enable-wordexp (USC)
  • enable secondary group checking when evaluating queue acl_group attribute
    • - enable the new queue parameter "acl_group_sloppy"
  • sol10 build system fixes (USC)
  • fixed node manager buffer overflow (UMU)
  • fix "pbs_version" server attribute (USC)
  • torque.spec updates (USC)
  • remove the leading space on the node session attribute on darwin (USC)
  • prevent SEGV if config file is missing/corrupt
  • "keep_completed" execution queue attribute
  • several misc code fixes (UMU)

2.0.0p4
  • fix up socklen_t issues
  • fixed epilog to report total job resource utilization
  • improved RPM spec (USC)
  • modified qterm to drop hung connections to bad nodes
  • enhance HPUX operation

2.0.0p3
  • fixed dynamic gres loading in pbs_mom (CRI)
  • added torque.spec (rpmbuild -tb should work) (USC)
  • new 'packages' make target (see INSTALL) (USC)
  • added '-1' qstat option to display node info (UMICH)
  • various fixes in file staging and copying (USC)
    • - reenable stageout of directories
    • - fix confusing email messages on failed stageout
    • - child processes can't use MOM's logging, must use syslog
  • fix overflow in RM netload (USC)
  • don't check walltime on sister nodes, only on MS (ANU)
  • kill_task wasn't being declared properly for all mach types (USC)
  • don't unnecessarily link with libelf and libdl (USC)
  • fix compile warnings with qsort/bsearch on bsd/darwin (USC)
  • fix --disable-filesync to actually work (USC)
  • added prolog diagnostics to 'momctl -d' output (CRI)
  • added logging for job file management (CRI)
  • added mom parameter $ignwalltime (CRI)
  • added $PBS_VNODENUM to job/TM env (USC)
  • fix self-referencing job deps (USC)
  • Use --enable-wordexp to enable variables in data staging (USC)
  • $PBS_HOME/server_name is now used by MOM _iff $pbsserver isn't used_ (USC)
  • Fix TRU64 compile issues (NCIFCRF)
  • Expand job limits up to ULONG_MAX (NCIFCRF)
  • user-supplied TMPDIR no longer treated specially (USC)
  • remtree() now deals with symlinks correctly (USC)
  • enable configurable mail domain (Sandia)
  • configure now handles darwin8 (USC)
  • configure now handles --with-scp=path and --without-scp correctly (USC)

2.0.0p2
  • fix check_pwd() memory leak (USC)

2.0.0p1
  • fix mpiexec stdout regression from 2.0.0p0 (USC)
  • add 'qdel -m' support to enable annotating job cancellation (CRI)
  • add mom diagnostics for prolog failures and timeouts (CRI)
  • interactive jobs cannot be rerunable (USC)
  • be sure nodefile is removed when job is purged (USC)
  • don't run epilogue multiple times when multiple jobs exit at once (USC)
  • fix clearjob MOM request (momctl -c) (USC)
  • fix detection of local output files with localhost or /dev/null (USC)
  • new qstat/qselect -e option to only select jobs in exec queues (USC)
  • $clienthost and $headnode removed, $pbsclient and $pbsserver added (USC)
  • $PBS_HOME/server_name is now added to MOM's server list (USC)
  • resmom transient TMPDIR (USC)
  • add joblist to MOM's status and add server "mom_job_sync" (USC)
  • export PBS_SCHED_HINT to pelogues if set in the job (USC)
  • don't build or install pbs_rcp if --enable-scp (USC)
  • set user hold on submitted jobs with invalid deps (USC)
  • add initial multi-server support for HA (CRI)
  • Altix cpuset enhancements (CSIRO)
  • enhanced momctl to diagnose and report on connectivity issues (CRI)
  • added hostname resolution diagnostics and logging (CRI)
  • fixed 'first node down' rpp failure (USC)
  • improved qsub response time

2.0.0p0
  • torque patches for RCP and resmom (UCHSC)
  • enhanced DIS logging
  • improved start-up to support quick startup with down nodes
  • fixed corrupt job/node/queue API reporting
  • fixed tracejob for large jobs (Sandia)
  • changed qdel to only send one SIGTERM at mom level
  • fixed doc build by adding AIX 5 resources docs
  • added prerun timeout change (RENTEC)
  • added code to handle select() EBADF - 9
  • disabled MOM quota feature by default, enabled with -DTENABLEQUOTA
  • cleanup MOM child error messages (USC)
  • fix makedepend-sh for gcc-3.4 and higher (DTU)
  • don't fallback to mom_rcp if configured to use scp (USC)

TORQUE 1.2

1.2.0p6
  • enabled arch mom config (CRI)
  • fixed qrun based default scheduling to ignore down nodes (USC)
  • disable unsetting of key/integer server parameters (USC)
  • allow FC4 support - quota struct fix (USC)
  • add fix for out of memory failure (USC)
  • add file recovery failure messages (USC)
  • add direct support for external scheduler extensions
  • add passwd file corruption check
  • add job cancel nanny patch (USC)
  • recursively remove job dependencies if children can never be satisfied (USC)
  • make poll_jobs the default behavior with a restat time of 45 seconds
  • added 'shell-use-arg' patch (OSC)
  • improved API timeout disconnect feature
  • added improved rapid start up
  • reworked mom-server state management (USC)
    • - removed 'unknown' state
    • - improved pbsnodes 'offline' management
    • - fixed 'momctl -C' which actually _prevented_ an update
    • - fixed incorrect math on 'tmpTime'
    • - added 'polltime' to the math on 'tmpTime'
    • - consolidated node state changes to new 'update_node_state()'
    • - tightened up the "node state machine"
    • - changed mom's state to follow the documented state guidelines
    • - correctly handle "down" from mom
    • - moved server stream handling out of 'is_update_stat()' to new 'init_server_stream()'
    • - refactored the top of the main loop to tighten up state changes
    • - fixed interval counting on the health check script
    • - forced health check script if update state is forced
    • - don't spam the server with updates on startup
    • - required new addr list after connections are dropped
    • - removed duplicate state updates because of broken multi-server support
    • - send "down" if internal_state is down (aix's query_adp() can do this)
    • - removed ferror() check on fread() because fread() randomly fails on initial mom startup.
    • - send "down" if health check returns "ERROR"
    • - send "down" if disk space check fails.

1.2.0p5
  • make '-t quick' default behavior for qterm
  • added '-p' flag to qdel to enable forced job purge (USC)
  • fixed server resources_available n-1 issue
  • added further Altix CPUSet support (NCSA)
  • added local checkpoint script support for linux
  • fixed 'premature end of message warning'
  • clarify job deleted mail message (SDSC)
  • fixed AIX 5.3 support in configure (WestGrid)
  • fixed crash when qrun issued on job with incomplete requeue
  • added support for >= 4GB memory usage (GMX)
  • log job execution limits failures
  • added more detailed error messages for missing user shell on mom
  • fixed qsub env overflow issue

1.2.0p4
  • extended job prolog to include jobname, resource, queue, and account info (MAINE)
  • added support for Darwin 8/OS X 10.4 (MAINE)
  • fixed suspend/resume for MPI jobs (NORWAY)
  • added support for epilog.precancel to enable local job cancellation handling
  • fixed build for case insensitive filesystems
  • fixed relative path based Makefiles for xpbsmom
  • added support for gcc 4.0
  • added PBSDEBUG support to client commands to allow more verbose diagnostics of client failures
  • added ALLOWCOMPUTEHOSTSUBMIT option to torque.cfg
  • fixed dynamic pbs_server loglevel support
  • added mom-server rpp socket diagnostics
  • added support for multi-homed hosts w/SERVERHOST parameter in torque.cfg
  • added support for static linking w/PBSBINDIR
  • added availmem/totmem support to Darwin systems (MAINE)
  • added netload support to Darwin systems (MAINE)

1.2.0p3
  • enable multiple server to mom communication
  • fixed node reject message overwrite issue
  • enable pre-start node health check (BOEING)
  • fixed pid scanning for RHEL3 (VPAC)
  • added improved vmem/mem limit enforcement and reporting (UMU)
  • added submit filter return code processing to qsub

1.2.0p2
  • enhance network failure messages
  • fixed tracejob tool to only match correct jobs (WESTGRID)
  • modified reporting of linux availmem and totmem to allow larger file sizes
  • fixed pbs_demux for OSF/TRU64 systems to stop orphaned demux processes
  • added dynamic pbs_server loglevel specification
  • added intelligent mom job stat sync'ing for improved scalability (USC/CRI)
  • added mom state sync patch for dup join (USC)
  • added spool dir space check (MAINE)

1.2.0p1
  • add default DEFAULTMAILDOMAIN configure option
  • improve configure options to use pbs environment (USC)
  • use openpty() based tty management by default
  • enable default resource manager extensions
  • make mom config parameters case insensitive
  • added jobstartblocktime mom parameter
  • added bulk read in pbs_disconnect() (USC)
  • added support for solaris 5
  • added support for program args in pbsdsh (USC)
  • added improved task recovery (USC)

1.2.0p0
  • fixed MOM state update behavior (USC/Poland)
  • fixed set_globid() crash
  • added support for > 2GB file size job requirements
  • updated config.guess to 2003 release
  • general patch to initialize all function variables (USC)
  • added patch for serial job TJE leakage (USC)
  • add "hw.memsize" based physmem MOM query for darwin (Maine)
  • add configure option (--disable-filesync) to speed up job submission
  • set PBS mail precedence to bulk to avoid vacation responses (VPAC)
  • added multiple changes to address gcc warnings (USC)
  • enabled auto-sizing of 'qstat -Q' columns
  • purge DOS EOL characters from submit scripts
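
Purging DOS EOL characters, as in the last entry above, means converting CR LF line endings to plain LF so the script's interpreter line and commands are read cleanly. A minimal Python sketch of that conversion (strip_dos_eol is a hypothetical name, and this is not the code TORQUE uses):

```python
def strip_dos_eol(script_bytes):
    """Convert DOS line endings (CR LF) to Unix line endings (LF)."""
    return script_bytes.replace(b"\r\n", b"\n")


dos_script = b"#!/bin/sh\r\necho hello\r\n"
print(strip_dos_eol(dos_script))
```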

TORQUE 1.1

1.1.0p6
  • added failure logging for various MOM job launch failures (USC)
  • allow relative path specification with qsub '-d'
  • enabled $restricted parameter w/in FIFO to allow use of non-privileged ports (SAIC)
  • checked job launch status code for retry decisions
  • added nodect resource_available checking to FIFO
  • disabled client port binding by default for darwin systems (use --enable-darwinbind to re-enable)
    • - workaround for darwin bind and pclose OS bugs
  • fixed interactive job terminal control for MAC (NCIFCRF)
  • added support for MAC MOM-level cpu usage tracking (Maine)
  • fixed __P warning (USC)
  • added support for server level resources_avail override of job nodect limits (VPAC)
  • modify MOM copy files and delete file requests to handle NFS root issues (USC/CRI)
  • enhance port retry code to support mac socket behavior
  • clean up file/socket descriptors before execing prolog/epilog
  • enable dynamic cpu set management (ORNL)
  • enable array services support for memory management (ORNL)
  • add server command logging to diagnostics
  • fix linux setrlimit persistence on failures

1.1.0p5
  • added loglevel as MOM config parameter
  • distributed job start sequence into multiple routines
  • force node state/subnode state offline stat synchronization (NCSA)
  • fixed N-1 cpu allocation issue (no sanity checking in set_nodes)
  • enhance job start failure logging
  • added continued port checking if connect fails (rentec)
  • added case insensitive host authentication checks
  • added support for submitfilter command line args
  • added support for relocatable submitfilter via torque.cfg
  • fixed offline status cleared when server restarted (USC)
  • updated PBSTop to 4.05 (USC)
  • fixed PServiceType array to correctly report service messages
  • fixed pbs_server crash from job dependencies
  • prevent mom from truncating lock file when mom is already running
  • tcp timeout added as config option

1.1.0p4
  • added 15004 error logging
  • added use of openpty() call for locating pseudo terminals (SNL)
  • add diagnostic reporting of config and executable version info
  • add support for config push
  • add support for MOM config version parameters
  • log node offline/online and up/down state changes in pbs_server logs
  • add mom fork logging and home directory check
  • add timeout checking in rpp socket handling
  • added buffer overflow prevention routines
  • added lockfile logging
  • supported protected env variables with qstat

1.1.0p3
  • added support for node specification w/pbsnodes -a
  • added hstfile support to momctl
  • added chroot (-D) support (SRCE)
  • added mom chdir pjob check (SRCE)
  • fixed MOM HELLO initialization procedure
  • added momctl diagnostic/admin command (shutdown, reconfig, query, diagnose)
  • added mom job abort bailout to prevent infinite loops
  • added network reinitialization when socket failure detected
  • added mom-to-scheduler reporting when existing job detected
  • added mom state machine failure logging

1.1.0p2
  • add support for disk size reporting via pbs_mom
  • fixed netload initialization
  • fixed orphans on mom fork failure
  • updated to pbstop v 3.9 (USC)
  • fixed buffer overflow issue in net_server.c
  • added pestat package to contrib (ANU)
  • added parameter checking to cpy_stage() (NCSA)
  • added -x (xml output) support for 'qstat -f' and 'pbsnodes -a'
  • added SSS xml library (SSS)
  • updated user-project mapping enforcement (ANL)
  • fix bogus 'cannot find submitfilter' message for interactive jobs
  • fix incorrect job allocation issue for interactive jobs (NCSA)
  • prevent failure with invalid 'servername' specification (NCSA)
  • provide more meaningful 'post processing error' messages (NCSA)
  • check for corrupt jobs in server database and remove them immediately
  • enable SIGUSR1/SIGUSR2 pbs_mom dynamic loglevel adjustment
  • profiling enhancements
  • use local directory variable in scan_non_child_tasks() to prevent race condition (VPAC)
  • added AIX 5 odm support for realmem reporting (VPAC)

1.1.0p1
  • added pbstop to contrib (USC)
  • added OSC mpiexec patch (OSC)
  • confirmed OSC mom-restart patch (OSC)
  • fix pbsd_init purge job tracking
  • allow tracking of completed jobs (w/TORQUEKEEPCOMPLETED env)
  • added support for MAC OS 10
  • added qsub wrapper support
  • added '-d' qsub command line flag for specifying working directory
  • fixed numerous spelling issues in pbs docs
  • enable logical or'ing of user and group ACLs
  • allow large memory sizes for physmem under solaris (USC)
  • fixed qsub SEGV on bad '-o' specification
  • add null checking on ap->value
  • fixed physmem() routine for tru64 systems to load compute node physical memory
  • added netload tracking

1.1.0p0
  • fixed linux swap space checking
  • fixed AIX5 resmom ODM memory leak
  • handle split var/etc directories for default server check (CHPC)
  • add pbs_check utility
  • added TERAGRID nospool log bounds checking
  • add code to force host domains to lower case
  • verified integration of OSC prologue-environment.patch (export Resource_List.nodes in an environment variable for prologue)
  • verified integration of OSC no-munge-server-name.patch (do not install over existing server_name)
  • verified integration of OSC docfix.patch (fix minor manpage typo)

TORQUE 1.0

1.0.1p6
  • add messaging to report remote data staging failures to pbs_server
  • added tcp_timeout server parameter
  • add routine to mark hung nodes as down
  • add torque.setup initialization script
  • track okclient status
  • fixed INDIANA ji_grpcache MOM crash
  • fixed pbs_mom PBSLOGLEVEL/PBSDEBUG support
  • fixed pbs_mom usage
  • added rentec patch to mom 'sessions' output
  • fixed pbs_server --help option
  • added OSC patch to allow jobs to survive mom shutdown
  • added patch to support server level node comments
  • added support for reporting of node static resources via sss interface
  • added support for tracking available physical memory for IRIX/Linux systems
  • added support for per node probes to dynamically report local state of arbitrary value
  • fixed qsub -c (checkpoint) usage

1.0.1p5
  • add SUSE 9.0 support
  • add Linux 2.4 meminfo support
  • add support for inline comments in mom_priv/conf
  • allow support for up to 100 million unique jobs
  • add pbs_resources_all documentation
  • fix kill_task references
  • add contrib/pam_authuser

1.0.1p4
  • fixed multi-line readline buffer overflow
  • extended TORQUE documentation
  • fixed node health check management

1.0.1p3
  • added support for pbs_server health check and routing to scheduler
  • added support for specification of more than one clienthost parameter
  • added PW unused-tcp-interrupt patch
  • added PW mom-file-descriptor-leak patch
  • added PW prologue-bounce patch
  • added PW mlockall patch (release mlock for mom children)
  • added support for job names up to 256 chars in length
  • added PW errno-fix patch

1.0.1p2
  • added support for macintosh (darwin)
  • fixed qsub 'usage' message to correctly represent '-j', '-k', '-m', and '-q' support
  • add support for 'PBSAPITIMEOUT' env variable
  • fixed mom dec/hp/linux physmem probes to support 64 bit
  • fixed mom dec/hp/linux availmem probes to support 64 bit
  • fixed mom dec/hp/linux totmem probes to support 64 bit
  • fixed mom dec/hp/linux disk_fs probes to support 64 bit
  • removed pbs server request to bogus probe
  • added support for node 'message' attribute to report internal failures to server/scheduler
  • corrected potential buffer overflow situations
  • improved logging replacing 'unknown' error with real error message
  • enlarged internal tcp message buffer to support 2000 proc systems
  • fixed enc_attr return code checking

1.0.1p1
  • NOTE: See TORQUE distribution CHANGELOG file

1.0.1p0
  • NOTE: See TORQUE distribution CHANGELOG file
