Prometheus
Overview
Prometheus is a time series database, meaning it's optimized to store and work with data organized in time order. It includes in its single binary:
- Database engine
- Collector
- Simple web-based user interface
This allows you to collect and manage data with fewer tools and less complexity than other solutions.
Data Collection
Endpoints normally expose metrics to Prometheus by making a web page available that it can poll. This is done by including an instrumentation library (provided by Prometheus) or simply adding a listener on a high-numbered port that spits out some text when asked.
For systems that don't support Prometheus natively, there are a few add-on services to translate. These are called 'exporters' and they translate things such as SNMP into a web format Prometheus can ingest.
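To make the exporter idea concrete, here is a toy sketch: a tiny HTTP listener that answers scrapes with metrics in the text exposition format. The metric name and port here are made up, and real exporters use an official client library, but the principle is just this.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metric; a real exporter would read this from the system
# (or translate it from a protocol like SNMP).
def metrics_text():
    return (
        "# HELP demo_temperature_celsius A made-up temperature reading.\n"
        "# TYPE demo_temperature_celsius gauge\n"
        "demo_temperature_celsius 21.5\n"
    )

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = metrics_text().encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

# To run it, Prometheus would then scrape http://this.host:8000/metrics
#   HTTPServer(("", 8000), MetricsHandler).serve_forever()
```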
Alerting
You can also alert on the data collected. This is through the Alert Manager, a second package that works closely with Prometheus.
Visualization
You still need a dashboard tool like Grafana to handle visualizations, but you can get started quite quickly with just Prometheus.
1 - Installation
Install from the Debian Testing repo, as stable can be up to a year behind.
# Testing
echo 'deb http://deb.debian.org/debian testing main' | sudo tee -a /etc/apt/sources.list.d/testing.list
# Pin testing down to a low level so the rest of your packages don't get upgraded
sudo tee -a /etc/apt/preferences.d/not-testing << EOF
Package: *
Pin: release a=testing
Pin-Priority: 50
EOF
# Living Dangerously with test
sudo apt update
sudo apt install -t testing prometheus
Configuration
Use this for your starting config.
cat /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
This says: every 15 seconds, run down the job list. And there is one job - to check out the system at 'localhost:9090', which happens to be itself.
For every target listed, the scraper makes a web request for /metrics and stores the results. It ingests all the data presented and stores it for 15 days. You can choose to ignore certain elements or retain them differently by adding config, but by default it takes everything it's given.
You can see this yourself by asking just like Prometheus would - hit a target directly in your browser. Prometheus itself, for example, makes its own metrics available at /metrics:
http://some.server:9090/metrics
Operation
User Interface
You can access the Web UI at:
http://some.server:9090
At the top, select Graph (you should be there already) and in the Console tab click the dropdown labeled "insert metric at cursor". There you will see all the data being exposed. This is mostly about the Go language it's written in, and not super interesting. A simple Graph tab is available as well.
Data Composition
Data can be simple, like:
go_gc_duration_seconds_sum 3
Or it can be dimensional which is accomplished by adding labels. In the example below, the value of go_gc_duration_seconds has 5 labeled sub-sets.
go_gc_duration_seconds{quantile="0"} 4.5697e-05
go_gc_duration_seconds{quantile="0.25"} 7.814e-05
go_gc_duration_seconds{quantile="0.5"} 0.000103396
go_gc_duration_seconds{quantile="0.75"} 0.000143687
go_gc_duration_seconds{quantile="1"} 0.001030941
In this example, the value of net_conntrack_dialer_conn_failed_total has several.
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="refused"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="unknown"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="refused"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="unknown"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="refused"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="unknown"} 0
How is this useful? It allows you to do aggregations - such as looking at all the net_conntrack failures together, and also looking at just the failures that were refused. All from the same data.
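In PromQL those two views are roughly `sum(net_conntrack_dialer_conn_failed_total)` and `net_conntrack_dialer_conn_failed_total{reason="refused"}`. As a rough, hand-rolled illustration of what the labels buy you, here is a sketch that parses a few sample exposition-format lines (with made-up non-zero values) and aggregates them both ways:

```python
import re

# Sample exposition lines with hypothetical values, for illustration only.
SAMPLE = """\
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="refused"} 2
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="timeout"} 1
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="refused"} 3
"""

def parse(text):
    """Yield (labels_dict, value) for each simple sample line."""
    for line in text.splitlines():
        name, raw, value = re.match(r'(\w+)\{(.*)\} (\S+)', line).groups()
        labels = dict(re.findall(r'(\w+)="([^"]*)"', raw))
        yield labels, float(value)

samples = list(parse(SAMPLE))

# sum(...) - every failure, regardless of labels
total = sum(v for _, v in samples)

# ...{reason="refused"} - only the refused ones
refused = sum(v for l, v in samples if l["reason"] == "refused")

print(total, refused)  # 6.0 5.0
```

Same data, two different questions - that is the point of dimensional labels.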
Removing Data
You may have a target you want to remove - such as a typo'd hostname that is now causing a large red bar on a dashboard. You can remove that mistake by enabling the admin API and issuing a delete.
sudo sed -i 's/^ARGS.*/ARGS="--web.enable-admin-api"/' /etc/default/prometheus
sudo systemctl restart prometheus
curl -s -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={instance="badhost.some.org:9100"}'
The default retention is 15 days. If you want less than that, you can add --storage.tsdb.retention.time=1d to ARGS similar to above. ALL data has the same retention, however. If you want longer-term historical data you must run a separate instance or use something like VictoriaMetrics.
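If you script the delete_series call instead of typing the curl command, note that the match[] selector has to be URL-encoded (curl's -g flag just passes the brackets through literally). A quick sketch of assembling the same request with Python's standard library, using the same hypothetical bad host:

```python
from urllib.parse import urlencode

# The series selector from the curl example above (hypothetical bad host).
selector = '{instance="badhost.some.org:9100"}'

query = urlencode({"match[]": selector})
url = "http://localhost:9090/api/v1/admin/tsdb/delete_series?" + query
print(url)

# Sending it would be a POST, e.g.:
#   import urllib.request
#   urllib.request.urlopen(urllib.request.Request(url, method="POST"))
```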
Next Steps
Let’s get something interesting to see by adding some OS metrics
Troubleshooting
If you can't start the Prometheus server, it may be an issue with the init file. The testing and stable repos use different defaults. Add the key values explicitly to get new versions running.
sudo vi /etc/default/prometheus
ARGS="--config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus/metrics2/"
2 - Node Exporter
This is a service you install on your endpoints that makes CPU, memory, and other system metrics available to Prometheus.
Installation
On each device you want to monitor, install the node exporter with this command.
sudo apt install prometheus-node-exporter
Do a quick test to make sure it’s responding to scrapes.
curl localhost:9100/metrics
Configuration
Back on your Prometheus server, add these new nodes as a job in the prometheus.yml file. Feel free to drop the initial job where Prometheus was scraping itself.
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'servers'
    static_configs:
      - targets:
          - some.server:9100
          - some.other.server:9100
          - and.so.on:9100
sudo systemctl reload prometheus.service
Operation
You can check the status of your new targets at:
http://some.server:9090/classic/targets
A lot of data is collected by default. On some low-power systems you may want less. For just the basics, customize the config to disable the defaults and only enable specific collectors.
In the case below we reduce collection to just CPU, memory, and hardware metrics. When scraping a Pi 3B, this reduces the scrape duration from 500ms to 50ms.
sudo sed -i 's/^ARGS.*/ARGS="--collector.disable-defaults --collector.hwmon --collector.cpu --collector.meminfo"/' /etc/default/prometheus-node-exporter
sudo systemctl restart prometheus-node-exporter
The available collectors are listed on the page:
https://github.com/prometheus/node_exporter
3 - SNMP Exporter
SNMP is one of the most prevalent (and clunky) protocols still widely used on network-attached devices. But it’s a good general-purpose way to get data from lots of different makes of products in a similar way.
But Prometheus doesn't understand SNMP. The solution is a translation service that acts as a middleman and 'exports' data from those devices in a way Prometheus can ingest.
Installation
Assuming you’ve already installed Prometheus, install some SNMP tools and the exporter. If you have an error installing the mibs-downloader, check troubleshooting at the bottom.
sudo apt install snmp snmp-mibs-downloader
sudo apt install -t testing prometheus-snmp-exporter
Change the SNMP tools config file to allow use of installed MIBs. It’s disabled by default.
# The entry 'mibs:' in the file overrides the default path. Comment it out so the defaults kick back in.
sudo sed -i 's/^mibs/# &/' /etc/snmp/snmp.conf
Preparation
We need a target, so assuming you have a switch somewhere and can enable SNMP on it, let's query the switch for its name, AKA sysName. Here we're using version "2c" of the protocol with the read-only community string "public". Pretty standard.
Industry Standard Query
There are some well-known SNMP objects you can query, like System Name.
# Get the first value (starting at 0) of the sysName object
snmpget -Oqv -v 2c -c public some.switch.address sysName.0
Some-Switch
# Sometimes you have to use 'getnext' if 0 isn't populated
snmpgetnext -v 2c -c public some.switch.address sysName
Vendor Specific Query
Some vendors will release their own custom MIBs. These provide additional data for things that don’t have well-known objects. Add the MIBs to the system and ‘walk’ the tree to see what’s interesting.
# Unifi, for example
sudo cp UBNT-MIB.txt UBNT-UniFi-MIB.txt /usr/share/snmp/mibs
# snmpwalk doesn't look for enterprise sections by default, so you have to
# look at the MIB and add the specific company's OID number.
grep enterprises UBNT-*
...
UBNT-MIB.txt: ubnt OBJECT IDENTIFIER ::= { enterprises 41112 }
...
snmpwalk -v2c -c public 10.10.202.246 enterprises.41112
Note: If you get back an error or just the ‘iso’ prefixed value, double check the default MIB path.
Configuration
To add this switch to the Prometheus scraper, add a new job to the prometheus.yml
file. This job will include the targets as normal, but also the path (since it's different than the default) and an optional parameter called module that's specific to the SNMP exporter. It also does something confusing - a relabel_config.
This is because Prometheus isn't actually talking to the switch, it's talking to the local SNMP exporter service. So we list all the targets normally, and then at the bottom say 'oh, by the way, do a switcheroo'. This allows Prometheus to display all the data normally with no one the wiser.
...
...
scrape_configs:
  - job_name: 'snmp'
    static_configs:
      - targets:
          - some.switch.address
    metrics_path: /snmp
    params:
      module: [if_mib]
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9116  # The SNMP exporter's real hostname:port.
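To see what the switcheroo actually does, here is a small simulation (plain Python, not how Prometheus implements it) of those three relabel steps applied to one scrape target:

```python
# Labels Prometheus starts with for the target 'some.switch.address'.
labels = {"__address__": "some.switch.address"}

# Step 1: copy __address__ into the 'target' URL parameter,
# so the exporter knows which device to query.
labels["__param_target"] = labels["__address__"]

# Step 2: copy that into 'instance', the label users actually see.
labels["instance"] = labels["__param_target"]

# Step 3: point the scrape itself at the local SNMP exporter.
labels["__address__"] = "127.0.0.1:9116"

print(labels)
# Prometheus now scrapes 127.0.0.1:9116/snmp?target=some.switch.address
# but stores the data under instance="some.switch.address".
```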
Operation
No configuration on the exporter side is needed. Reload the config and check the target list. Then examine data in the graph section. Add additional targets as needed and the exporter will pull in the data.
http://some.server:9090/classic/targets
These metrics are considered well known and so will appear in the database under names like sysUpTime and upsBasicBatteryStatus, not prefixed with snmp_ like you might expect.
Next Steps
If you have something non-standard, or you simply don’t want that huge amount of data in your system, look at the link below to customize the SNMP collection with the Generator.
SNMP Exporter Generator Customization
Troubleshooting
The snmp-mibs-downloader is just a handy way to download a bunch of default MIBs so when you use the tools, all the cryptic numbers, like “1.3.6.1.2.1.17.4.3.1” are translated into pleasant names.
If you can't find the mibs-downloader, it's probably because it's in the non-free repo, which isn't enabled by default. Change your apt sources file like so:
sudo vi /etc/apt/sources.list
deb http://deb.debian.org/debian/ bullseye main contrib non-free
deb-src http://deb.debian.org/debian/ bullseye main contrib non-free
deb http://security.debian.org/debian-security bullseye-security main contrib non-free
deb-src http://security.debian.org/debian-security bullseye-security main contrib non-free
deb http://deb.debian.org/debian/ bullseye-updates main contrib non-free
deb-src http://deb.debian.org/debian/ bullseye-updates main contrib non-free
It may be that you only need to change one line.
4 - SNMP Generator
Installation
There is no need to install the Generator as it comes with the SNMP exporter. But if you have a device that supplies its own MIB (and many do), you should add that to the default location.
# Mibs are often named SOMETHING-MIB.txt
sudo cp -n *MIB.txt /usr/share/snmp/mibs/
Preparation
You must identify the values you want to capture. Using snmpwalk
is a good way to see what’s available, but it helps to have a little context.
The data is arranged like a folder structure that you drill down through. The folder names are all numeric, with '.' instead of slashes. So if you wanted to get a device's sysName, you'd click down through 1.3.6.1.2.1.1.5 and look in the file 0.
When you use snmpwalk
it starts wherever you tell it and then starts drilling-down, printing out everything it finds.
How do you know that’s where sysName is at? A bunch of folks got together (the ISO folks) and decided everything in advance. Then they made some handy files (MIBs) and passed them out so you didn’t have to remember all the numbers.
They allow vendors to create their own sections as well, for things that might not fit anywhere else.
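To make the folder analogy concrete, here is a toy resolver with a few well-known prefixes hard-coded. In reality the name table comes from the MIB files, and these are just a handful of entries picked for illustration:

```python
# A few well-known OID prefixes (normally supplied by MIB files).
NAMES = {
    "1.3.6.1.2.1.1.5": "sysName",
    "1.3.6.1.2.1.1.3": "sysUpTime",
    "1.3.6.1.4.1": "enterprises",
}

def resolve(oid):
    """Translate the longest known prefix of an OID into its name."""
    parts = oid.split(".")
    for i in range(len(parts), 0, -1):
        prefix = ".".join(parts[:i])
        if prefix in NAMES:
            return ".".join([NAMES[prefix]] + parts[i:])
    return oid  # nothing known; leave it numeric

print(resolve("1.3.6.1.2.1.1.5.0"))  # sysName.0
print(resolve("1.3.6.1.4.1.41112"))  # enterprises.41112
```

This is what the MIB tools are doing for you when snmpwalk prints names instead of dotted numbers.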
A good place to start is looking at what the vendor made available. You see this by walking their section and including their MIB so you get descriptive names - only the ISO System MIB is included by default.
# The sysObjectID identifies the vendor section
# Note use of the MIB name without the .txt
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c public some.address sysObjectID
SNMPv2-MIB::sysObjectID.0 = OID: SOMEVENDOR-MIB::somevendoramerica
# Then walk the vendor section using the name from above
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c public some.address somevendoramerica
SOMEVENDOR-MIB::model.0 = STRING: SOME-MODEL
SOMEVENDOR-MIB::power.0 = INTEGER: 0
...
...
# Also check out the general System section
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c public some.address system
# You can also walk the whole ISO tree. In some cases,
# there are thousands of entries and it's indecipherable
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c public some.address iso
This can be a lot of information and you’ll need to do some homework to see what data you want to collect.
Configuration
The exporter’s default configuration file is snmp.yml
and contains about 57,000 lines of config. It's designed to pull data from whatever you point it at. Basically, it doesn't know what device it's talking to, so it tries to cover all the bases.
This isn't a file you should edit by hand. Instead, you create instructions for the generator and it looks through the MIBs and creates one for you. Here's an example for a Samlex inverter.
modules:
  samlex:
    walk:
      - sysLocation
      - inverterMode
      - power
      - vin
      - tempDD
      - tempDA
prometheus-snmp-generator generate
sudo cp /etc/prometheus/snmp.yml /etc/prometheus/snmp.yml.orig
sudo cp ~/snmp.yml /etc/prometheus
sudo systemctl reload prometheus-snmp-exporter.service
Configuration in Prometheus remains the same - but since we picked a new module name we need to adjust that.
...
...
    params:
      module: [samlex]
...
...
sudo systemctl reload prometheus.service
Adding Data Prefixes
By default, the names are all over the place. The SNMP Exporter devs leave it this way because there are a lot of pre-built dashboards on downstream systems that expect the existing names.
If you are building your own downstream systems, you can add a prefix (as is best practice) with a post-generation step. This example causes them all to be prefixed with samlex_.
prometheus-snmp-generator generate
sed -i 's/name: /name: samlex_/' snmp.yml
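That sed is doing a blunt text substitution on the generated file. If you would rather do the same post-processing in Python (say, as part of a build script), the equivalent is a one-line rewrite. This sketch only shows the idea on an abbreviated, hypothetical in-memory fragment, not the real 57,000-line file:

```python
import re

# An abbreviated, hypothetical fragment of generator output.
snmp_yml = """\
modules:
  samlex:
    metrics:
    - name: inverterMode
      oid: 1.2.3.4
    - name: power
      oid: 1.2.3.5
"""

# Same substitution the sed command performs.
prefixed = re.sub(r'name: ', 'name: samlex_', snmp_yml)
print(prefixed)
```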
Combining MIBs
You can combine multiple systems in the generator file to create one snmp.yml file, and refer to them by the module name in the Prometheus file.
modules:
  samlex:
    walk:
      - sysLocation
      - inverterMode
      - power
      - vin
      - tempDD
      - tempDA
  ubiquiti:
    walk:
      - something
      - somethingElse
Operation
As before, you can get a preview directly from the exporter (using a link like below). This data should show up in the Web UI too.
http://some.server:9116/snmp?module=samlex&target=some.device
Sources
https://github.com/prometheus/snmp_exporter/tree/main/generator
5 - Sensors DHT
DHT stands for Digital Humidity and Temperature. At less than $5, these sensors are cheap and can be hooked to a Raspberry Pi easily. Add a Prometheus exporter if you want to do this at scale.
- Connect the Sensor
- Provision and Install the Python Libraries
- Test the Libraries and the Sensor
- Install the Prometheus Exporter as a Service
- Create a Service Account
- Add to Prometheus
Connect The Sensor
These usually come as a breakout board with three leads you can connect to the Raspberry Pi GPIO pins as follows:
- Positive lead to pin 1 (power)
- Negative lead to pin 6 (ground)
- Middle or ‘out’ lead to pin 7 (that’s GPIO 4)

(From https://github.com/rnieva/Playing-with-Sensors---Raspberry-Pi)
Provision and Install
Use the Raspberry Pi Imager to provision the Pi with Raspberry Pi OS Lite (64-bit). Next, install the "adafruit_blinka" library, adapted from their instructions, and test.
# General updates
sudo apt update
sudo apt -y upgrade
sudo apt -y autoremove
sudo reboot
# These python components may already be installed, but making sure
sudo apt -y install python3-pip
sudo apt -y install python3-setuptools
sudo apt -y install python3-venv
# Make a virtual environment for the python process
sudo mkdir /usr/local/bin/sensor-dht
sudo python3 -m venv /usr/local/bin/sensor-dht --system-site-packages
cd /usr/local/bin/sensor-dht
sudo chown -R ${USER}:${USER} .
source bin/activate
# Build and install the library
pip3 install --upgrade adafruit-python-shell
wget https://raw.githubusercontent.com/adafruit/Raspberry-Pi-Installer-Scripts/master/raspi-blinka.py
sudo -E env PATH=$PATH python3 raspi-blinka.py
Test the Libraries and the Sensor
After logging back in, test the blinka lib.
cd /usr/local/bin/sensor-dht
source bin/activate
wget https://learn.adafruit.com/elements/2993427/download -O blinkatest.py
python3 blinkatest.py
Then install the DHT library from CircuitPython and create a script to test the sensor.
cd /usr/local/bin/sensor-dht
source bin/activate
pip3 install adafruit-circuitpython-dht
vi sensortest.py
import board
import adafruit_dht

dhtDevice = adafruit_dht.DHT11(board.D4)

temp = dhtDevice.temperature
humidity = dhtDevice.humidity
print(
    "Temp: {:.1f} C    Humidity: {}%".format(temp, humidity)
)
dhtDevice.exit()
You may get occasional errors like RuntimeError: Checksum did not validate. Try again. These are safe to ignore - DHTs are not 100% solid.
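Given how often these reads fail, you may want a small retry wrapper around the sensor calls. This is a hardware-free sketch: read_fn stands in for something like lambda: dhtDevice.temperature, and the RuntimeError it retries on is the checksum error mentioned above.

```python
import time

def read_with_retry(read_fn, attempts=5, delay=2.0):
    """Call read_fn until it returns a value, retrying on RuntimeError.

    DHT sensors raise RuntimeError on checksum/timing glitches; those
    are transient, so we wait briefly and try again.
    """
    for attempt in range(attempts):
        try:
            return read_fn()
        except RuntimeError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(delay)

# Usage on the Pi would look like:
#   temp = read_with_retry(lambda: dhtDevice.temperature)
```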
Install the Prometheus Exporter as a Service
Add the Prometheus pips.
cd /usr/local/bin/sensor-dht
source bin/activate
pip3 install prometheus_client
And create a script like this.
import time

import board
import adafruit_dht
from prometheus_client import start_http_server, Gauge

dhtDevice = adafruit_dht.DHT11(board.D4)

temperature_gauge = Gauge('dht_temperature', 'Local temperature')
humidity_gauge = Gauge('dht_humidity', 'Local humidity')

start_http_server(8000)

while True:
    try:
        temperature = dhtDevice.temperature
        temperature_gauge.set(temperature)
        humidity = dhtDevice.humidity
        humidity_gauge.set(humidity)
    except RuntimeError:
        # Errors happen fairly often as DHTs are hard to read. Just continue on.
        continue
    finally:
        time.sleep(60)
Create a service
sudo nano /lib/systemd/system/sensor.service
[Unit]
Description=Temperature and Humidity Sensing Service
After=network.target
[Service]
Type=idle
Restart=on-failure
User=root
ExecStart=/bin/bash -c 'cd /usr/local/bin/sensor-dht && source bin/activate && python sensor.py'
[Install]
WantedBy=multi-user.target
Enable and start it
sudo systemctl enable --now sensor.service
curl http://localhost:8000/metrics
Create a Service Account
This service is running as root. You should consider creating a sensor account.
sudo useradd --home-dir /usr/local/bin/sensor-dht --system --shell /usr/sbin/nologin --comment "Sensor Service" sensor
sudo usermod -aG gpio sensor
sudo systemctl stop sensor.service
sudo chown -R sensor:sensor /usr/local/bin/sensor-dht
sudo sed -i 's/User=root/User=sensor/' /lib/systemd/system/sensor.service
sudo systemctl daemon-reload
sudo systemctl start sensor.service
Add to Prometheus
Adding it requires logging into your Prometheus server and adding a job like below.
sudo vi /etc/prometheus/prometheus.yml
...
...
  - job_name: 'dht'
    static_configs:
      - targets:
          - 192.168.1.45:8000
You will be able to find the node in your server at http://YOUR-SERVER:9090/targets?search=#pool-dht and the data will show up with a leading dht_.
Sources
https://randomnerdtutorials.com/raspberry-pi-dht11-dht22-python/
You may want to raise errors to the log as in the above source.