[torqueusers] ANNOUNCE: Public release of Ganglia Job Monarch v0.1.0

Ramon Bastiaans bastiaans at sara.nl
Fri Mar 10 10:33:10 MST 2006

This is the first public open source release of:

    "Ganglia Job Monarch", the Job Monitoring and Archiving tool, which is 
an add-on to Ganglia.


This release is:  ganglia_jobmonarch-0.1.0

It is available here:   

See the INSTALL file on how to set it up.


    Job Monarch is a set of tools to monitor and optionally archive 
(batch)job information.

    It is an add-on for the Ganglia monitoring system and plugs in to an 
existing Ganglia setup.
    To view an operational setup with Job Monarch, have a look here: 

    Job Monarch stands for 'Job Monitoring and Archiving' tool and 
consists of three (3) components:

    * jobmond

        The Job Monitoring Daemon.
        Gathers PBS/Torque batch statistics on jobs/nodes and submits 
them into
        Ganglia's XML stream.

        Through this daemon, users are able to view the PBS/Torque batch 
system and the
        jobs/nodes that are in it (whether running or queued).
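
        As a rough illustration of submitting a job statistic into Ganglia,
        the sketch below builds a command line for Ganglia's gmetric tool.
        This announcement does not say how jobmond actually submits its
        data, so treat this as one possible approach. The "JOB_" metric
        name prefix, the key=value encoding and the sample job record are
        all hypothetical.

```python
import shlex


def build_gmetric_command(name, value, metric_type="string", units="",
                          tmax=15, dmax=30):
    """Return the argv list for one gmetric call (nothing is executed here)."""
    argv = [
        "gmetric",
        "--name", name,
        "--value", str(value),
        "--type", metric_type,
        "--tmax", str(tmax),   # expected seconds between updates
        "--dmax", str(dmax),   # seconds after which a stale metric is dropped
    ]
    if units:
        argv += ["--units", units]
    return argv


# Hypothetical job record, as a batch system query might return it.
job = {"id": "1234.server", "owner": "alice", "queue": "batch", "status": "R"}

# Hypothetical encoding: every field except the id as space-separated key=value.
value = " ".join(f"{k}={v}" for k, v in sorted(job.items()) if k != "id")

argv = build_gmetric_command("JOB_" + job["id"], value)
print(shlex.join(argv))
```

        Passing the resulting list to subprocess.run() on a host with a
        running gmond would publish the metric; that step is left out here
        so the sketch runs anywhere.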

    * jobarchived (optional)

        The Job Archiving Daemon.

        Listens to Ganglia's XML stream and archives the job and node 
statistics.
        It stores the job statistics in a PostgreSQL database and the 
node statistics
        in RRD files.
        Through this daemon, users are able to look up an old/finished job
        and view all its statistics.

        Optional: run this daemon only if your users have a use for it.
        It can be a heavy application to run and not everyone may 
have a need for it.

        - Multithreaded:    Will not miss any data regardless of (slow) 
        - Staged writing:    Spread load over bigger time periods
        - High precision RRDs:    Allow for zooming in on old periods with 
high precision
        - Timeperiod RRDs:    Allow for a smaller number of files while 
still keeping the advantage of small disk space
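
        The "listen to the XML stream, then split job and node data" step
        described for jobarchived can be sketched as below. It assumes
        Ganglia's usual XML layout (HOST elements containing METRIC
        elements with NAME and VAL attributes). The "JOB_" name prefix and
        the key=value encoding of job metrics are hypothetical, and the
        sample document stands in for what would normally be read from a
        Ganglia daemon's TCP socket.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of Ganglia's XML output.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.0.2" SOURCE="gmond">
  <CLUSTER NAME="example" LOCALTIME="1141990000">
    <HOST NAME="node01" IP="10.0.0.1" REPORTED="1141989990">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="JOB_1234.server" VAL="owner=alice queue=batch status=R"
              TYPE="string" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""

JOB_PREFIX = "JOB_"  # hypothetical marker for job metrics


def split_metrics(xml_text):
    """Split one XML snapshot into job records and per-node metrics."""
    jobs, nodes = {}, {}
    root = ET.fromstring(xml_text)
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            name, val = metric.get("NAME"), metric.get("VAL")
            if name.startswith(JOB_PREFIX):
                # Job metric: decode the hypothetical key=value encoding.
                job_id = name[len(JOB_PREFIX):]
                jobs[job_id] = dict(pair.split("=", 1) for pair in val.split())
            else:
                # Anything else is treated as an ordinary node metric.
                nodes.setdefault(host.get("NAME"), {})[name] = val
    return jobs, nodes


jobs, nodes = split_metrics(SAMPLE_XML)
print(jobs)   # candidates for the job database
print(nodes)  # candidates for the RRD files
```

        In this sketch the job records would go on to the database and
        the node metrics to the RRD files; both storage steps are omitted.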

    * web

        The Job Monarch web interface.

        This reads the jobmond data and (optionally) the jobarchived 
data, and presents the
        data and graphs.

        It does this in a layout/setup similar to Ganglia itself, so the 
navigation and usage are intuitive.

        - Graphical usage:    Displays a graphical cluster overview so you 
can see the cluster (job) state
                    in one view/image, plus an additional pie chart with 
relevant information on your
                    current view
        - Filters:        Ability to filter output to limit the information 
displayed (useful for those
                    clusters with 500+ jobs). This also filters the 
graphical overview image output
                    and pie chart so you only see the data relevant to the filter
        - Archive:        When enabling jobarchived, users can go back 
as far as recorded in the database
                    or archived RRDs to find out what happened to a 
crashed or old job
        - Zoom ability:        Users can zoom into a time period as small 
as the smallest grain of the RRDs
                    (typically up to 10 seconds) when a jobarchived is 


You can view an operational Ganglia Job Monarch setup here: 


Any information/suggestions/hatemail/bugreports/whatever to:

    Ramon Bastiaans
    <bastiaans ( a t ) sara ( d o t ) nl>
