JulianMcConnell.com

Automatic Telegraf Configuration Refreshes on Windows

cybersecurity monitoring IT Telegraf InfluxDB 2 Batch Windows Services Task Scheduler

This is a brief look at one way to set things up on Windows so that Telegraf regularly checks an InfluxDB 2 server for configuration updates.

Background

Telegraf is a fantastic choice for collecting data and pushing it up to a central InfluxDB server. I've used this platform extensively as the basis for building other tools, mostly in systems/network/cybersecurity monitoring. One drawback is that if you deploy static configurations alongside the Telegraf binary on client machines, every configuration change must also be applied manually on those systems. Telegraf can pull a specific configuration from the InfluxDB 2 API endpoint, but I noticed a lack of documentation on doing this in a strategic and repeatable fashion for Windows hosts, so I decided to develop my own method for making it work with relative ease.

Prerequisites

Process Overview

The way I've developed this process over the past few years is to deploy the latest Telegraf binary with an installation batch/command file which goes through a number of operations to achieve a desired result.

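The process in a nutshell is as follows:

1. Confirm the script is running as administrator and not from a UNC path/network drive.
2. Check that telegraf.exe is present in the folder the script is running from.
3. Create the C:\Program Files\Telegraf\ directory and copy the files into it.
4. Store the InfluxDB token in a Windows environment variable.
5. Install the Telegraf service with its config pointed at the InfluxDB 2 API endpoint, then start it.
6. Create an hourly scheduled task that restarts the service so it pulls a fresh config.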

More Detail

There is an initial section which makes sure that the script is running as administrator (or elevates it) and checks that the script is not running from a UNC path/network drive (to prevent any potential problems). I'll skip going over these bits for now since they were written by others. I encourage you to take a look at everything and follow the attribution links for those portions to study them in more detail.
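
As a rough illustration of what such checks can look like, here is a minimal sketch. It is not the attributed code from the repository: it only aborts rather than elevating, it relies on the common assumption that `net session` fails in a non-elevated prompt, and it only detects UNC paths (a mapped network drive letter would pass). The variable name SCRIPT_DIR is made up for this example.

```bat
@echo off
rem Sketch only: the real script uses third-party snippets for these checks.

rem "net session" normally succeeds only in an elevated prompt, so a
rem non-zero errorlevel is treated here as "not running as administrator".
net session >nul 2>&1
if errorlevel 1 (
    echo This script must be run as administrator.
    pause
    exit /b 1
)

rem %~dp0 expands to the drive and path of this script.
rem A UNC path starts with two backslashes.
set "SCRIPT_DIR=%~dp0"
if "%SCRIPT_DIR:~0,2%"=="\\" (
    echo Please copy this script to a local drive before running it.
    pause
    exit /b 1
)
```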

The script then checks whether a Telegraf binary (telegraf.exe) is present in the folder it is running from, so that it can be copied to the target folder. It's important to note that it only looks at the filename. If it finds the file, it continues:

rem Here we make sure Telegraf binary is present (note that this is just checking the file name, not hash or anything more definitive)
if exist "%~dp0telegraf.exe" (
    rem Telegraf binary is actually present, so we are continuing
    GOTO CONTINUE
) else (
    rem The file is not present, so let the user know that
    echo Telegraf binary is not present! Please add the latest telegraf.exe binary in the source folder.
    rem Here we pause for the user to take notice of the message and take action
    pause
    rem Now we quit the script
    GOTO QUITME
)

A directory is created:

rem Now we create the directory C:\Program Files\Telegraf\
echo Creating Telegraf Program Files Directory...
md "C:\Program Files\Telegraf\"

Then, the files are copied to the target directory:

rem Next we copy files to directory
echo Copying Files To Telegraf Program Files Directory...
copy "%~dp0telegraf.exe" "C:\Program Files\Telegraf\telegraf.exe"
copy "%~dp0restart_telegraf.cmd" "C:\Program Files\Telegraf\restart_telegraf.cmd"
copy "%~dp0uninstall_telegraf.cmd" "C:\Program Files\Telegraf\uninstall_telegraf.cmd"

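Here's where we set up the InfluxDB token to be stored as an environment variable in Windows. Telegraf reads INFLUX_TOKEN to authenticate when it fetches its configuration from an InfluxDB URL: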

rem Now comes the part where we setup the environment variable for token storage
echo Setting InfluxDB Token Environment Variable...
setx /M INFLUX_TOKEN "TOKEN_VALUE_HERE_BETWEEN_DOUBLE_QUOTES"

For the service installation, we pass some parameters to the binary we just copied so that the service is registered with its config pointing at the API endpoint URL:

rem Installs service with config set to InfluxDB config URL
echo Installing the Telegraf Windows service...
"C:\Program Files\Telegraf\telegraf.exe" --service install --config "http://endpoint-fqdn-or-ip-address-here:8086/api/v2/telegrafs/012345678abcdef"

Next, we'll go ahead and start the service:

rem Start the telegraf service
echo Starting the Telegraf Windows service...
"C:\Program Files\Telegraf\telegraf.exe" --service start

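Before moving on, it can be worth confirming that the service actually came up. The following is a small sketch, assuming the service was installed under Telegraf's default service name, telegraf, and that `sc query` reports the state as RUNNING on your system:

```bat
rem Sketch: check whether the Telegraf service reached the RUNNING state
sc query telegraf | find "RUNNING" >nul
if errorlevel 1 (
    echo Telegraf service is not running. Check the token and the config URL.
) else (
    echo Telegraf service is running.
)
```

If the service fails to start, a wrong token or an unreachable config URL are the first things I would check.
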
Here's where the clever bit comes in. Let's set up a scheduled task that runs every hour and kicks off the restart command file we copied into the installation directory earlier:

rem setup hourly scheduled task for restarting the Telegraf service
echo Setting up hourly scheduled task for regular Telegraf Windows service restarts (configuration retrieval refreshes)
rem The escaped inner quotes keep the path (which contains a space) together as a single command
SCHTASKS /CREATE /SC HOURLY /TN "Misc\Restart_Telegraf" /TR "\"C:\Program Files\Telegraf\restart_telegraf.cmd\"" /RL HIGHEST /RU SYSTEM /ST 12:00

All this file (restart_telegraf.cmd) does is restart the Telegraf service. When the scheduled task calls it, the service restarts and a fresh config is pulled from the API endpoint. This means that when you modify your config on the InfluxDB side, your Windows clients will check in hourly and pull the latest version of the config. You could change this frequency if desired, or you could even automate the service restart through another mechanism. The key here is understanding that the service restart triggers the configuration pull, so regular restarts result in regular pulls.
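
For reference, a restart file along these lines could be as short as the sketch below. This is a hypothetical version, not necessarily what is in the repository, and it assumes the service is registered under Telegraf's default service name, telegraf:

```bat
@echo off
rem Hypothetical restart_telegraf.cmd: stop the service, then start it again.
rem "net stop" waits for the service to stop before returning.
net stop telegraf
net start telegraf
```

To change the interval, the schedule type on the SCHTASKS line is the place to look; for example, /SC MINUTE /MO 30 would run the task every 30 minutes instead of hourly.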

I also include an uninstallation batch/command file which can be used later either in a manual or automated fashion to remove the service and accompanying files.
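
As a sketch of what an uninstall file could do, the example below reverses each installation step shown above. It is a hypothetical outline rather than the repository's actual script, and it assumes the default service name (telegraf) plus the task name, paths, and variable name used earlier in this post:

```bat
@echo off
rem Hypothetical uninstall_telegraf.cmd (a sketch). Run from an elevated prompt.

rem Remove the hourly restart task; /F skips the confirmation prompt
SCHTASKS /DELETE /TN "Misc\Restart_Telegraf" /F

rem Stop the service (net stop waits for it to stop), then unregister it
net stop telegraf
"C:\Program Files\Telegraf\telegraf.exe" --service uninstall

rem Delete the machine-wide variable that setx /M created
REG DELETE "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment" /V INFLUX_TOKEN /F

rem Step out of the install folder, then remove it. This is the last command
rem because the script may be deleting itself here.
cd /d "%TEMP%"
rd /s /q "C:\Program Files\Telegraf"
```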

Security

One thing to note is that in this configuration, the InfluxDB token is stored in Windows as an environment variable. This has obvious security implications, since the token is present in plain text and accessible globally on the system. Unfortunately, there isn't a great solution here if this is a concern on the systems you're deploying to. The only other option I am aware of is to embed the token in the configuration file. In that regard, choose your own adventure and monitor accordingly from a security standpoint.

More Information

For more information, check out the GitHub repository here.