[torqueusers] ANNOUNCE: Public release of Ganglia Job Monarch v0.1.0
bastiaans at sara.nl
Fri Mar 10 10:33:10 MST 2006
This is the first public open source release of
"Ganglia Job Monarch", the Job Monitoring and Archiving tool, an
add-on to Ganglia.
This release is: ganglia_jobmonarch-0.1.0
It is available here:
See the INSTALL file for setup instructions.
Job Monarch is a set of tools to monitor and optionally archive batch jobs.
It is an add-on for the Ganglia monitoring system and plugs into an
existing Ganglia setup.
To view an operational setup with Job Monarch, have a look here:
Job Monarch (short for 'Job Monitoring and Archiving' tool)
consists of three components:
* jobmond
The Job Monitoring Daemon.
Gathers PBS/Torque batch statistics on jobs/nodes and submits them into
Ganglia's XML stream.
Through this daemon, users are able to view the PBS/Torque batch
system and the jobs/nodes in it (whether running or queued).
* jobarchived (optional)
The Job Archiving Daemon.
Listens to Ganglia's XML stream and archives the job and node statistics.
It stores the job statistics in a PostgreSQL database and the node statistics
in RRD files.
Through this daemon, users are able to look up an old/finished job
and view all its statistics.
This daemon is optional: run it only if your users have a use for it,
as it can be a heavy application to run and not everyone will
need it.
- Multithreaded: Will not miss any data regardless of (slow)
- Staged writing: spreads the write load over longer time periods
- High precision RRDs: Allow for zooming on old periods with
- Time-period RRDs: allow for a smaller number of files while
still keeping the advantage of small disk space
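To illustrate what "listening to Ganglia's XML stream" involves, here is a minimal Python sketch. It is not jobarchived's actual code. It assumes a gmond daemon that writes its full XML dump to any client connecting on TCP port 8649 (the usual default) and the standard GANGLIA_XML / CLUSTER / HOST / METRIC element layout with NAME and VAL attributes. The function names fetch_gmond_xml and metrics_by_host are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=10.0):
    """Read the complete XML dump a gmond daemon sends to a TCP client.

    Assumption: gmond listens on `port` (8649 is the common default),
    writes its XML dump and then closes the connection.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # connection closed by gmond: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def metrics_by_host(xml_bytes, prefix=""):
    """Map each host name to {metric name: value} for metrics matching `prefix`."""
    root = ET.fromstring(xml_bytes)
    result = {}
    # iter() searches at any depth, so this works whether HOST elements sit
    # directly under CLUSTER (gmond) or under GRID/CLUSTER (gmetad).
    for host in root.iter("HOST"):
        metrics = {}
        for metric in host.iter("METRIC"):
            name = metric.get("NAME", "")
            if name.startswith(prefix):
                metrics[name] = metric.get("VAL")
        result[host.get("NAME")] = metrics
    return result


if __name__ == "__main__":
    # Hypothetical sample document in the assumed gmond layout.
    sample = (b'<GANGLIA_XML VERSION="3.0.2" SOURCE="gmond">'
              b'<CLUSTER NAME="example">'
              b'<HOST NAME="node1"><METRIC NAME="load_one" VAL="0.50"/></HOST>'
              b'</CLUSTER></GANGLIA_XML>')
    print(metrics_by_host(sample))
```

An archiving daemon would typically poll such a dump periodically and hand the values to a storage layer (database, RRD files). Keeping the parser as a separate function from the socket code makes it easy to test without a live gmond.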
* web interface
The Job Monarch web interface.
This reads the jobmond data and (optionally) the jobarchived data
and presents the data and graphs.
It uses a layout similar to Ganglia's own, so navigation and
usage are intuitive.
- Graphical usage: displays a graphical cluster overview so you
can see the cluster (job) state
in one view/image, plus an additional pie chart with
relevant information on your
- Filters: ability to filter the output to limit the information
displayed (useful for
clusters with 500+ jobs). This also filters the
graphical overview images
and the pie chart, so you only see the data relevant to the filter
- Archive: when jobarchived is enabled, users can go back
as far as the database or archived RRDs reach
to find out what happened to a
crashed or old job
- Zoom ability: users can zoom into a time period as small
as the smallest grain of the RRDs
(typically down to 10 seconds) when jobarchived is enabled
You can view an operational Ganglia Job Monarch setup here:
Send any information/suggestions/hate mail/bug reports/whatever to:
<bastiaans ( a t ) sara ( d o t ) nl>