Replacing MRTG continued: data summarization with InfluxDB_

January 12, 2020 @12:00

A while back I began working on replacing MRTG and RRDtool. I have written about the major parts of this previously, but the one feature of RRDtool that I needed to support was the summarization and retention policies. The RRDtool database will automatically consolidate and roll off values stored based on the definitions setup when the database is created. This is used by MRTG to generate the 'Daily' graph with a 5 minute average, the 'Weekly' graph with a 30 minute average, the 'Monthly' graph with a 2 hour average and the 'Yearly' graph with a daily average.

Switchport Dashboard

Based on my needs I chose to leave the 'daily' graph at the telegraph interval (2 minutes) but create a Weekly and Monthly dataset with an hourly and daily summarization. InfluxDB has a pair of features that can be used for this, Continuous Queries, and Retention Policies. They even have a guide that describes what I am trying to accomplish: downsampling and retention. Essentially we can apply an aggregation function on a sliding window of the dataset to create a summarized copy of the dataset. Then the retention policy comes through and drops old measurements out of each dataset based on the configured policy. This allows us to keep the database size manageable while still having long term historic visibility.

To start with, I modified the default autogen retention policy to only retain a month of data. Then I created a three month and one year retention policy to contain the summarized data.

> show retention policies
name        duration  shardGroupDuration replicaN default
----        --------  ------------------ -------- -------
autogen      744h0m0s 168h0m0s           1        true
one_year    8760h0m0s 168h0m0s           1        false
three_month 2160h0m0s 24h0m0s            1        false

With the retention policies in place you can create the continuous queries. Since I am looking at interface data from Telegraf I made sure to preserve the tags used by Grafana while summarizing the values using the mean() aggregation function.

CREATE CONTINUOUS QUERY hourly_summary ON telegraf
     RESAMPLE FOR 2h
     BEGIN SELECT
        last(agent_host) AS agent_host,
        last(hostname) AS hostname,
        last(ifAlias) AS ifAlias,
        last(ifName) AS ifName,
        mean(ifHCInOctets) AS ifHCInOctets,
        mean(ifHCOutOctets) AS ifHCOutOctets
    INTO telegraf.three_month.ifXTable
    FROM telegraf.autogen.ifXTable
    GROUP BY time(1h), * fill(none)
END
CREATE CONTINUOUS QUERY daily_summary ON telegraf
    BEGIN SELECT
        last(agent_host) AS agent_host,
        last(hostname) AS hostname,
        last(ifAlias) AS ifAlias,
        last(ifName) AS ifName,
        mean(ifHCInOctets) AS ifHCInOctets,
        mean(ifHCOutOctets) AS ifHCOutOctets
    INTO telegraf.one_year.ifXTable
    FROM telegraf.autogen.ifXTable
    GROUP BY time(1d), * fill(none)
END

Default Retention Policy Graph

Once the CQs were created and running, the only changes I needed to make to the dashboard was to change the hourly and weekly views from the 'default' retention policy to the three_month or one_year as appropriate.

Three Month Retention Policy

All that is left is to see if this is enough to keep the database down to a manageable size.

Subscribe via RSS. Send me a comment.