If you have read my previous post about monitoring my ADS-B
receiver, it probably won't come as a surprise that the impetus for this
whole project has been to deprecate MRTG from my environment. MRTG was
a fine enough tool when it was basically all we had (though I have rolled
a few iterations of a replacement for personal projects over the years),
but these days it is woefully dated. The biggest issues lie in the
data gathering engine. Polling even a moderately sized environment is
asking for trouble: dropped polls and stuck Perl processes. MRTG also fails to
provide any information beyond the aggregated traffic statistics.
Years ago I wrote a small script that renders some web pages to display the switchports on the network linked to their MRTG graphs. Each port is enumerated by operational status and description to make it easy to find what you are looking for. It turns out it also makes it pretty easy to throw MRTG out and switch to something else.
I had already settled on Grafana and InfluxDB for a large part of the new monitoring infrastructure, with most of the data being collected via collectd running on all my physical and virtual hosts. I am monitoring containers with cAdvisor, which also feeds into InfluxDB, so I very much wanted to keep data going into InfluxDB. What I still needed was something to bridge the gap to the SNMP monitoring that the switches and UPSes in my network require. Enter Telegraf.
My only complaint is that the configuration for the SNMP input module in Telegraf is garbage. It took a bunch of trial and error to figure out the most efficient way to get everything in and working. I do very much like the results though...
Setting up Telegraf as an SNMP poller
There are a number of blog posts kicking around with fragments of information and copy/paste chunks of configuration files, but not much in the way of well-written documentation. I guess I'll just pile more of the former on.
I deployed Telegraf as a Docker container, though the configuration is largely the same if you deploy directly on a host. I did install all the SNMP MIBs I needed on my Docker host so I could mount them into the container. In Debian, the snmp-mibs-downloader package covered most of them; I added the APC PowerNet MIB for my UPSes and the Synology MIBs for my work NAS. I then pulled the official container and extracted the default configuration file.
docker run --rm telegraf telegraf config > telegraf.conf
With that in hand, I set about killing the vast majority of it, leaving only the [agent] section. Since I am only doing SNMP collection, the only change I made there was to back the interval off to 120s instead of 10s.
I then configured Telegraf to send metrics to InfluxDB:
# Configuration for sending metrics to InfluxDB
[[outputs.influxdb]]
urls = [ "http://influxdb:8086" ]
database = "telegraf"
skip_database_creation = true
username = "[REDACTED]"
password = "[REDACTED]"
This just left the SNMP input configuration, which I'll break up and describe a bit inline.
[[inputs.snmp]]
agents = [ "sw01.internal.ub3rgeek.net" ]
community = "[REDACTED]"
version = 2
This is pretty self-explanatory: it is the basic information needed to poll the agent. You can pass a list into agents and Telegraf will use the same configuration for all of the targets. You can also have multiple inputs.snmp stanzas.
[[inputs.snmp.field]]
name = "hostname"
oid = "SNMPv2-MIB::sysName.0"
is_tag = true
This collects the value of the SNMPv2-MIB::sysName.0 OID and makes it available as a tag.
[[inputs.snmp.table]]
inherit_tags = [ "hostname" ]
oid = "IF-MIB::ifXTable"
This is the meat: it walks the IF-MIB::ifXTable and collects all the leaf OIDs as metrics. It inherits the hostname tag from above.
[[inputs.snmp.table.field]]
name = "ifName"
oid = "IF-MIB::ifName"
is_tag = true
[[inputs.snmp.table.field]]
name = "ifDescr"
oid = "IF-MIB::ifDescr"
is_tag = true
[[inputs.snmp.table.field]]
name = "ifAlias"
oid = "IF-MIB::ifAlias"
is_tag = true
These specify additional OIDs to use as tags on the metrics. Unlike the hostname tag, these are scoped to the row index in the walk of the IF-MIB::ifXTable, so if Telegraf is looking at index 1 in IF-MIB::ifXTable, it will fetch IF-MIB::ifName.1 and use that value as the tag.
I put the configuration and a docker-compose file in Puppet, let the agent crank the wheel, and was rewarded with a happy stack of containerized monitoring goodness.
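To make the index scoping of the table tags a little more concrete, here is a rough Python sketch of the join, not Telegraf's actual implementation. The interface names, aliases, and counter values are made up for illustration, and build_rows is a hypothetical helper of my own naming.

```python
# Hypothetical walk results keyed by ifIndex. Real values would come from the switch.
if_name = {1: "Gi1/0/1", 2: "Gi1/0/2"}
if_alias = {1: "uplink", 2: "printer"}
if_hc_in_octets = {1: 123456, 2: 7890}


def build_rows(hostname, tag_columns, field_columns):
    """Join per-index columns into one row per table index."""
    # Every index seen in any metric column becomes one row.
    indexes = sorted(set().union(*(col.keys() for col in field_columns.values())))
    rows = []
    for idx in indexes:
        # The hostname tag is inherited, so it is identical on every row.
        tags = {"hostname": hostname}
        # Table-scoped tags are looked up with the same index as the row.
        for name, column in tag_columns.items():
            if idx in column:
                tags[name] = column[idx]
        fields = {
            name: column[idx]
            for name, column in field_columns.items()
            if idx in column
        }
        rows.append({"index": idx, "tags": tags, "fields": fields})
    return rows


rows = build_rows(
    "sw01",
    {"ifName": if_name, "ifAlias": if_alias},
    {"ifHCInOctets": if_hc_in_octets},
)
for row in rows:
    print(row)
```

Each row ends up with the shared hostname tag plus the ifName and ifAlias values that belong to its own index, which is what lets you filter by port later.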
The compose file is below, but I'll leave the configuration management bits up to you, dear reader.
version: '2'
services:
  telegraf:
    image: registry.hub.docker.com/library/telegraf:latest
    environment:
      # MIB search path inside the container
      - MIBDIRS=/usr/share/snmp/mibs:/usr/share/snmp/mibs/iana:/usr/share/snmp/mibs/ietf:/usr/share/snmp/mibs/syno
    networks:
      - grafana_backend
    volumes:
      # Trimmed-down configuration file from the host
      - /var/local/docker/data/telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro
      # MIBs installed on the Docker host
      - /usr/share/snmp/mibs:/usr/share/snmp/mibs:ro
      - /var/lib/snmp/mibs/iana:/usr/share/snmp/mibs/iana
      - /var/lib/snmp/mibs/ietf:/usr/share/snmp/mibs/ietf
networks:
  # Pre-existing network, created outside this compose file
  grafana_backend:
    external:
      name: grafana_backend
Gluing it to Grafana
The last piece was updating the links to the new graphs. Happily, if you set up a variable in a dashboard, you can pass it in the URL to the dashboard, so I was able to simply change the URL in the template file and regenerate the page.
In my case the new URL was:
https://[REDACTED]/grafana/d/[REDACTED]/switch-statistics?var-host=[SWITCH NAME]&var-port=[PORT NAME]
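If your page generator builds these links in code, something along these lines should work. This is a minimal Python sketch rather than my actual template; the base URL and dashboard UID are placeholders, and graph_url is a hypothetical helper. Only the var-host and var-port parameters and the switch-statistics slug come from the URL above.

```python
from urllib.parse import quote, urlencode


def graph_url(base_url, dashboard_uid, host, port):
    """Build a dashboard link that presets the host and port variables."""
    # Interface names often contain slashes or spaces, so percent-encode them.
    query = urlencode({"var-host": host, "var-port": port}, quote_via=quote)
    return f"{base_url}/d/{dashboard_uid}/switch-statistics?{query}"


# Placeholder base URL and UID; substitute your own Grafana instance.
url = graph_url("https://grafana.example.net/grafana", "abc123", "sw01", "Gi1/0/1")
print(url)
```

Encoding the values up front means a port name like Gi1/0/1 cannot be mistaken for extra path segments or break the query string.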
Hopefully this makes it a little clearer if you are trying to achieve a complex SNMP configuration in Telegraf.
🍻