Installation

Install from the Debian testing repo. The stable repo can be up to a year behind.

# Add testing repo
echo 'deb http://deb.debian.org/debian testing main' | sudo tee -a /etc/apt/sources.list.d/testing.list

# Pin testing down to a low level so the rest of your packages don't get upgraded
sudo tee -a /etc/apt/preferences.d/not-testing << EOF
Package: *
Pin: release a=testing
Pin-Priority: 50
EOF

# Live Dangerously - this will pull in a lot of packages
sudo apt update
sudo apt install -t testing prometheus

# Observe the two new services
systemctl list-units --type service -q 'prometheus*'

# If this is a VM, the pulled-in openipmi service isn't supported and can be removed
sudo apt remove openipmi

Configuration

Let’s replace the default config with something slightly simpler, then remove any previously collected data.

sudo mv /etc/prometheus/prometheus.yml /etc/prometheus/prometheus.yml.orig
sudo tee /etc/prometheus/prometheus.yml << EOF
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["localhost:9100"]
EOF
sudo rm -rf /var/lib/prometheus/metrics2/*
sudo systemctl restart prometheus

The ‘scrapes’ are its jobs, and it has only one: check out the system at ‘localhost:9100’, which happens to be the node exporter service running on the same host.

For every target listed, the scraper makes a web request for /metrics and stores the results. It ingests all the data presented and keeps it for 15 days. You can choose to ignore certain elements or change the retention by adding config, but by default it takes everything given.
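To make that concrete, here is a rough sketch of what a single scrape amounts to. The helper names scrape_url and count_samples are made up for illustration; the only things taken from Prometheus are the default http scheme and the default /metrics path.

```shell
# Build the URL that would be requested for one target (host:port).
scrape_url() {
  printf 'http://%s/metrics\n' "$1"
}

# Count the sample lines in a metrics response read from stdin,
# ignoring "# HELP" / "# TYPE" comments and blank lines.
count_samples() {
  grep -c '^[^#]'
}

# Usage (assumes node exporter is listening on localhost:9100):
#   curl -s "$(scrape_url localhost:9100)" | count_samples
```

The number printed is roughly how many samples Prometheus would ingest from that target on every scrape.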

Operation

User Interface

You can access the Web UI to see the status of your targets at:

http://some.server:9090/classic/targets

The data can be viewed by selecting Graph at the top, and then choosing from the first dropdown labeled “insert metric at cursor”.

The data is often prefixed with a category. For instance, the node-exporter service is written in the Go language and it provides some stats about that, prefixed with ‘go_’. Information about the system will be prefixed with ‘node_’.

A simple Graph tab is available as well.

You can also see the node-exporter service’s output by asking for it the way Prometheus would. Hit it up directly in your browser:

http://some.server:9100/metrics
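If you would rather get a quick feel for the categories from the command line, something like the following can tally them. This is only a sketch: count_by_prefix is a hypothetical helper, and it treats the text before the first underscore as the category, which is a naming convention rather than a rule.

```shell
# Count sample lines per metric-name prefix (the text before the first "_").
# Reads the text served at /metrics on stdin and prints "prefix count" pairs.
count_by_prefix() {
  awk '
    /^#/ || NF == 0 { next }          # skip HELP/TYPE comments and blank lines
    {
      name = $1
      sub(/[{].*/, "", name)          # strip any {label="..."} part
      split(name, parts, "_")         # parts[1] is the prefix
      count[parts[1]]++
    }
    END { for (p in count) print p, count[p] }
  ' | sort
}

# Usage (assumes node exporter is reachable on localhost:9100):
#   curl -s http://localhost:9100/metrics | count_by_prefix
```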

Data Composition

Data can be simple, like:

go_gc_duration_seconds_sum 3

Or it can be dimensional, which is accomplished by adding labels. In the example below, go_gc_duration_seconds has 5 labeled subsets.

go_gc_duration_seconds{quantile="0"} 4.5697e-05
go_gc_duration_seconds{quantile="0.25"} 7.814e-05
go_gc_duration_seconds{quantile="0.5"} 0.000103396
go_gc_duration_seconds{quantile="0.75"} 0.000143687
go_gc_duration_seconds{quantile="1"} 0.001030941

In this example, the value of net_conntrack_dialer_conn_failed_total has several.

net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="refused"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="unknown"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="refused"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="unknown"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="refused"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="unknown"} 0

How is this useful? It allows you to do aggregations, such as looking at all the net_conntrack failures, and also to look at only the failures that were refused. All with the same data.
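In Prometheus itself you would express this in PromQL, for example sum(net_conntrack_dialer_conn_failed_total) for everything and sum(net_conntrack_dialer_conn_failed_total{reason="refused"}) for just the refusals. To show the idea without a running server, here is a small sketch that does the same arithmetic on raw /metrics text. The helper name sum_series is made up, and the label match is a plain substring check, which is much cruder than a real PromQL matcher.

```shell
# Sum the values of every series of metric $1 read from stdin.
# If $2 is given (e.g. 'reason="refused"'), only lines containing that
# label pair are counted. Assumes the value is the last field on the line,
# i.e. no trailing timestamp.
sum_series() {
  awk -v metric="$1" -v pair="$2" '
    /^#/ || NF == 0 { next }                      # skip comments and blanks
    {
      name = $1
      sub(/[{].*/, "", name)                      # metric name without labels
      if (name != metric) next
      if (pair != "" && index($0, pair) == 0) next
      total += $NF
    }
    END { print total + 0 }
  '
}

# Usage (assumes the metric is exposed by Prometheus on localhost:9090):
#   curl -s http://localhost:9090/metrics | sum_series net_conntrack_dialer_conn_failed_total
#   curl -s http://localhost:9090/metrics | sum_series net_conntrack_dialer_conn_failed_total 'reason="refused"'
```

The first call adds up every labeled series; the second narrows the same data down to one reason.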

Removing Data

You may have a target you want to remove, such as a mistyped hostname that is now causing a large red bar on a dashboard. You can remove that mistake by enabling the admin API and issuing a delete:

sudo sed -i 's/^ARGS.*/ARGS="--web.enable-admin-api"/' /etc/default/prometheus

# A restart (not a reload) is needed for new command-line flags to take effect
sudo systemctl restart prometheus

curl -s -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={instance="badhost.some.org:9100"}'
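Since a delete can't be undone, it may be worth listing what a selector matches before removing it. The sketch below builds both URLs from the same selector so they can't drift apart. The helper names and the PROM_URL variable are made up for illustration; the series, delete_series and clean_tombstones endpoints are part of the Prometheus HTTP API.

```shell
PROM_URL=http://localhost:9090   # assumed address of the Prometheus server

# Selector matching every series scraped from one instance.
instance_selector() {
  printf '{instance="%s"}' "$1"
}

# URL that lists the matching series (read-only, safe to run first).
series_url() {
  printf '%s/api/v1/series?match[]=%s\n' "$PROM_URL" "$(instance_selector "$1")"
}

# URL that deletes the matching series (needs --web.enable-admin-api).
delete_url() {
  printf '%s/api/v1/admin/tsdb/delete_series?match[]=%s\n' "$PROM_URL" "$(instance_selector "$1")"
}

# Usage:
#   curl -s -g "$(series_url badhost.some.org:9100)"           # preview
#   curl -s -g -X POST "$(delete_url badhost.some.org:9100)"   # delete
#   # Deleted series are only marked at first; this frees the disk space now:
#   curl -s -X POST "$PROM_URL/api/v1/admin/tsdb/clean_tombstones"
```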

The default retention is 15 days. If you want less than that, you can set --storage.tsdb.retention.time=1d in ARGS the same way as above. All data has the same retention, however. If you want to keep historical data for longer, you must run a separate instance or use VictoriaMetrics.
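One thing to watch: the sed command used earlier replaces the whole ARGS line, so running it again for retention would drop the admin API flag. Here is a sketch of a filter that appends a flag and keeps whatever is already there. The name add_arg is made up, and it assumes the file has a single line of the form ARGS="..." and that the flag contains no & or backslash characters.

```shell
# Append flag $1 to the ARGS="..." line of a defaults file read from stdin,
# writing the result to stdout. Lines that already contain the flag, and all
# other lines, pass through unchanged.
add_arg() {
  awk -v flag="$1" '
    /^ARGS="/ && index($0, flag) == 0 {
      sub(/"[ \t]*$/, " " flag "\"")   # insert before the closing quote
    }
    { print }
  '
}

# Usage:
#   add_arg --storage.tsdb.retention.time=1d < /etc/default/prometheus > /tmp/prometheus.default
#   sudo cp /tmp/prometheus.default /etc/default/prometheus
#   sudo systemctl restart prometheus
```

Writing to a temporary file first also gives you a chance to look at the result before it goes live.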

Next Steps

Let’s get something interesting to see by adding some OS metrics.

Troubleshooting

If you can’t start the Prometheus server, it may be an issue with the init file. The testing and stable repos use different defaults. Add some values explicitly to get new versions running:

sudo vi /etc/default/prometheus

ARGS="--config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus/metrics2/"

Sources

OpenIPMI

https://medium.com/@MahdiAlimohammadi/why-openipmi-service-failed-on-my-debian-12-server-and-how-i-fixed-it-9c3fef441ad0

Pinning Repos

https://unix.stackexchange.com/questions/647204/aptdefault-release-stable-isnt-sufficient-to-stop-packages-being-automatica


Last modified April 1, 2026: changed name (b6cb643)