I like data. I've been spending some time cleaning up my monitoring and
visualization infrastructure, making sure everything is in Grafana and
available at a glance, and I noticed that the one thing I'm not collecting
anything from is my gaming PC. I don't spend as much time playing video
games as I used to, but I still want information on how the system is
performing. Most of the tools that measure the performance of a
Windows-based gaming system cater to traditional video gamers, providing
on-screen overlays or alerts. That information is interesting to
me, so I went looking for a way to send it into my
monitoring platform.
There are a few options for running something like collectd (which I use on all my Linux hosts) on Windows, but none of them have been touched in years, and one of them is commercial software, which seems like overkill for one host. Since I also use Telegraf for network device monitoring, I looked around to see if it had plugins to talk to a Windows host. It turns out that it can run on the Windows host itself and collect data directly from the operating system interfaces.
Setting up Telegraf
The first thing you need to do is set up the Telegraf agent. This is
reasonably easy: download the zip file and extract it somewhere. I chose
C:\Program Files\telegraf\ and created the config file in
C:\Program Files\telegraf\conf\telegraf.conf. I set up the agent and
outputs sections the same way my existing Telegraf agents are set up (they
connect to a container running InfluxDB) and used the default Windows
Performance Counters Input Plugin configuration (see the plugin's
documentation for more information). I was then able to add telegraf as a
service and start it.
C:\Program Files\telegraf>telegraf --service install --config "C:\Program Files\telegraf\conf\telegraf.conf"
and
net start telegraf
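A minimal sketch of what those sections of telegraf.conf can look like follows. It is not a copy of the file used here: the InfluxDB URL, the database name, and the intervals are placeholders, and the single performance counter object is just one entry in the style of the plugin's sample configuration.

```toml
# Sketch only: the URL, database name, and intervals are placeholders.
[agent]
  interval = "10s"
  flush_interval = "10s"

[[outputs.influxdb]]
  # Assumed address of the InfluxDB container; substitute your own.
  urls = ["http://influxdb.example.net:8086"]
  database = "telegraf"

[[inputs.win_perf_counters]]
  # One counter object as an illustration; the default configuration
  # shipped with Telegraf defines several more.
  [[inputs.win_perf_counters.object]]
    ObjectName = "Processor"
    Instances = ["*"]
    Counters = ["% Idle Time", "% User Time", "% Processor Time"]
    Measurement = "win_cpu"
```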
Setting up nvidia-smi
Now that Telegraf is running, it would be useful to monitor the statistics from my GPU as well. It turns out that the NVIDIA driver comes with a management interface called nvidia-smi(1) which lets you do just that, and the Windows version of Telegraf ships with the plugin required to make that work. All I had to do was add the following to my telegraf.conf and restart the service.
[[inputs.nvidia_smi]]
bin_path = "C:\\Windows\\System32\\nvidia-smi.exe"
I was able to create the following dashboard in about 10 minutes and set about testing it by playing some Cyberpunk 2077. I think it did pretty well for itself.
Setting up Open Hardware Monitor
The next step was a little more difficult. I'm not entirely sure why, but it seems that Windows only has very basic support for exposing the system sensor values. There seems to be no good way to gather this data without another application, so I went searching and found Open Hardware Monitor. While it is running, Open Hardware Monitor exports all the system sensor data to WMI, which means I can query it from an exec script in Telegraf and have the values sent along to InfluxDB. I updated the Telegraf configuration with the following.
[[inputs.exec]]
commands = ['powershell -executionpolicy bypass -File "C:\\Program Files\\telegraf\\openhardware.ps1"']
data_format = "influx"
I then wrote a PowerShell script to gather and format the data for the
exec plugin. It sends the data from every sensor that Open Hardware
Monitor can see. To install Open Hardware Monitor I simply extracted the
zip file into C:\Program Files\OpenHardwareMonitor\ and told the program
to run at startup and start minimized. The PowerShell script below went
into the Telegraf directory (C:\Program Files\telegraf\).
# openhardware.ps1 (c) 2021 Matthew J. Ernisse <matt@going-flying.com>
# All Rights Reserved.
#
# Convert sensor value exposed by the OpenHardwareMonitor WMI interface
# to the InfluxDB Line format for use with the telegraf input.exec
# reader.
#
# Redistribution and use in source and binary forms,
# with or without modification, are permitted provided
# that the following conditions are met:
#
# * Redistributions of source code must retain the
# above copyright notice, this list of conditions
# and the following disclaimer.
# * Redistributions in binary form must reproduce
# the above copyright notice, this list of conditions
# and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
# FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
# COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
# INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
# BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
# OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
# TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
# USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
$sensors = Get-WmiObject -Namespace root/OpenHardwareMonitor -Query 'SELECT value,name,Parent,SensorType FROM Sensor'
foreach ($sensor in $sensors) {
    # By default the Parent property is something like /bus/chip/instance,
    # eg: /lpc/nct6793d/ or /intelcpu/0
    # convert that into chip_instance for use as a tag on the measurement.
    $instance = $sensor.Parent -replace "^/lpc/", ""
    $instance = $instance -replace "^/", ""
    $instance = $instance -replace "/", "_"
    $name = $sensor.name -replace "\s+", "_"
    Write-Host -NoNewline "win_hw,host=$env:COMPUTERNAME,instance=$instance,type=$($sensor.SensorType),name=$name value=$($sensor.Value)`n"
}
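Each collection starts a fresh PowerShell process, which can take a noticeable moment to launch. If that ever becomes a problem, the exec stanza shown earlier can be given its own collection interval and timeout. The sketch below uses the per-plugin interval option and the exec plugin's timeout option; the values are illustrative assumptions, not settings taken from this setup.

```toml
[[inputs.exec]]
  commands = ['powershell -executionpolicy bypass -File "C:\\Program Files\\telegraf\\openhardware.ps1"']
  data_format = "influx"
  # Assumed values: poll the sensors on their own schedule and allow
  # extra time for PowerShell to start before the command is abandoned.
  interval = "30s"
  timeout = "15s"
```

As with any other change to telegraf.conf, the service needs a restart to pick it up.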
For a change I plugged in my HOTAS and loaded up some Elite Dangerous to give the system a bit of exercise and see how it all worked.
The Result
So far I'm quite happy with this. The dashboard below has all of the details I want in order to see the health and performance of my gaming PC. I am not sure I would have bothered doing this if I had not been able to integrate it into my existing monitoring platform, so I'm glad that Telegraf ships a Windows build. I continue to be surprised at the things that Windows doesn't already expose, but at least there are workarounds.
Next time someone asks me why I don't upgrade to the latest and greatest thing I can point to actual data. 🕹