A while back I began working on replacing MRTG and RRDtool. I have written about the major parts of this previously, but the one feature of RRDtool that I needed to support was the summarization and retention policies. The RRDtool database will automatically consolidate and roll off values stored based on the definitions setup when the database is created. This is used by MRTG to generate the 'Daily' graph with a 5 minute average, the 'Weekly' graph with a 30 minute average, the 'Monthly' graph with a 2 hour average and the 'Yearly' graph with a daily average.
Based on my needs I chose to leave the 'daily' graph at the telegraph interval (2 minutes) but create a Weekly and Monthly dataset with an hourly and daily summarization. InfluxDB has a pair of features that can be used for this, Continuous Queries, and Retention Policies. They even have a guide that describes what I am trying to accomplish: downsampling and retention. Essentially we can apply an aggregation function on a sliding window of the dataset to create a summarized copy of the dataset. Then the retention policy comes through and drops old measurements out of each dataset based on the configured policy. This allows us to keep the database size manageable while still having long term historic visibility.
To start with, I modified the default
autogen retention policy to only
retain a month of data. Then I created a three month and one year retention
policy to contain the summarized data.
> show retention policies name duration shardGroupDuration replicaN default ---- -------- ------------------ -------- ------- autogen 744h0m0s 168h0m0s 1 true one_year 8760h0m0s 168h0m0s 1 false three_month 2160h0m0s 24h0m0s 1 false
With the retention policies in place you can create the continuous queries. Since I am looking at interface data from Telegraf I made sure to preserve the tags used by Grafana while summarizing the values using the mean() aggregation function.
CREATE CONTINUOUS QUERY hourly_summary ON telegraf RESAMPLE FOR 2h BEGIN SELECT last(agent_host) AS agent_host, last(hostname) AS hostname, last(ifAlias) AS ifAlias, last(ifName) AS ifName, mean(ifHCInOctets) AS ifHCInOctets, mean(ifHCOutOctets) AS ifHCOutOctets INTO telegraf.three_month.ifXTable FROM telegraf.autogen.ifXTable GROUP BY time(1h), * fill(none) END
CREATE CONTINUOUS QUERY daily_summary ON telegraf BEGIN SELECT last(agent_host) AS agent_host, last(hostname) AS hostname, last(ifAlias) AS ifAlias, last(ifName) AS ifName, mean(ifHCInOctets) AS ifHCInOctets, mean(ifHCOutOctets) AS ifHCOutOctets INTO telegraf.one_year.ifXTable FROM telegraf.autogen.ifXTable GROUP BY time(1d), * fill(none) END
Once the CQs were created and running, the only changes I needed to make to
the dashboard was to change the hourly and weekly views from the 'default'
retention policy to the
one_year as appropriate.
All that is left is to see if this is enough to keep the database down to a manageable size.