Grafana
Grafana is an easy-to-use metrics dashboard. You create and arrange panels that display charts, graphs, and tables. The panels query data from a range of sources, most commonly InfluxDB and Prometheus.
Installation
# Install the prerequisite packages
sudo apt-get install -y apt-transport-https software-properties-common wget
# Import the GPG key
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
# Update the list of available packages
sudo apt-get update
# Install the latest Enterprise release
sudo apt-get install grafana-enterprise
Note: Grafana Enterprise is the recommended and default edition. It’s free to use, and the open-source edition (the grafana package) is available if you prefer it.
Configuration
The service isn’t enabled or running by default.
sudo systemctl enable --now grafana-server.service
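To confirm the service actually came up, you can poll Grafana's /api/health endpoint. This is a minimal sketch, assuming curl is installed and Grafana listens on the default port 3000; the function name wait_for_grafana and its arguments are my own, not part of Grafana.

```shell
# Poll Grafana's health endpoint until it answers or the attempts run out.
# Usage: wait_for_grafana [base_url] [attempts] [delay_seconds]
# (hypothetical helper; the defaults assume a local install on port 3000)
wait_for_grafana() {
    url="${1:-http://localhost:3000}"
    attempts="${2:-10}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f makes curl fail on HTTP errors, -sS keeps it quiet otherwise
        if curl -fsS "$url/api/health" > /dev/null 2>&1; then
            echo "Grafana is answering at $url"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "Grafana did not answer at $url after $attempts attempts" >&2
    return 1
}

# Example: wait_for_grafana http://localhost:3000 10 2
```

If it times out, sudo systemctl status grafana-server.service is the next place to look.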
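If you want to allow anonymous view access, enable it in the configuration file, then restart the service (sudo systemctl restart grafana-server.service) so the change takes effect.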
sudo vi /etc/grafana/grafana.ini
[auth.anonymous]
# enable anonymous access
enabled = true
# specify role for unauthenticated users
org_role = Viewer
Operation
The Web UI is available on port 3000. The default credentials are “admin/admin”. Grafana will prompt you to change the admin password the first time you log in.
Adding a Data Source
On the left menu:
Connections -> Data Sources -> Add data source
Choose ‘Prometheus’ and use the defaults; the URL is http://localhost:9090. Click the “Save & test” button at the bottom, and you should see a success message.
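If you'd rather not click through the UI, the same data source can be described in a provisioning file that Grafana reads at startup. This is a rough sketch, assuming a package install where the provisioning directory is /etc/grafana/provisioning/datasources; the function name write_prometheus_datasource and the file name prometheus.yaml are my own choices.

```shell
# Write a minimal Prometheus data source definition into the given directory.
# Usage: write_prometheus_datasource <dir> [prometheus_url]
# (hypothetical helper; the URL defaults to the one used on this page)
write_prometheus_datasource() {
    dir="$1"
    url="${2:-http://localhost:9090}"
    mkdir -p "$dir"
    cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $url
    isDefault: true
EOF
}

# Example:
#   write_prometheus_datasource /tmp/provisioning
#   sudo cp /tmp/provisioning/prometheus.yaml /etc/grafana/provisioning/datasources/
#   sudo systemctl restart grafana-server.service
```

Writing to a scratch directory first lets you check the file before copying it into place with sudo.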
Creating a Dashboard
Dashboard design and use is a large topic that I won’t cover on this page, but here are some handy queries to refer to.
CPU
Surprisingly, this important stat is one of the hardest to produce.
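The reason is that node_exporter doesn't expose a usage percentage. It exposes node_cpu_seconds_total, a counter of the seconds each CPU has spent in each mode, so usage has to be derived as 100 minus the idle rate. The arithmetic can be sketched with two samples of the idle counter; the function name cpu_usage_pct and the sample values are made up for illustration.

```shell
# Derive a CPU usage percentage from two samples of an idle-seconds counter.
# Usage: cpu_usage_pct <idle_start> <idle_end> <elapsed_seconds>
# (hypothetical helper; the inputs stand in for one CPU's mode="idle" counter)
cpu_usage_pct() {
    awk -v a="$1" -v b="$2" -v t="$3" \
        'BEGIN { printf "%.1f\n", 100 - ((b - a) / t) * 100 }'
}

# The idle counter rose by 12.75 s over a 15 s window: 85% idle, 15% busy.
cpu_usage_pct 1000 1012.75 15   # prints 15.0
```

irate() does the per-second part of this using the last two samples in the range, and avg by(instance) then averages across CPUs.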
# Display CPU usage
(100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle", job="someJob"}[5m])) * 100))
# Use the label_replace function to shorten the chart labels.
# Turns 'host.subdomain.your.org' to just 'host'
label_replace((100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle", job="screens"}[5m])) * 100)), "instance", "$1", "instance", "(.*).subdomain.*")
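To see what that regex does to a label before putting it in a panel, you can mimic it in the shell. This is only an approximation using sed, assuming instance labels of the form host.subdomain.your.org:9100; the function name short_instance is my own. PromQL regexes are fully anchored, so the sed version adds ^ and $ explicitly, and it escapes the dot so that it only matches a literal dot.

```shell
# Mimic label_replace(..., "instance", "$1", "instance", "(.*).subdomain.*")
# Usage: short_instance <instance_label>
# (hypothetical helper; "subdomain" is the placeholder used in the query above)
short_instance() {
    printf '%s\n' "$1" | sed -E 's/^(.*)\.subdomain.*$/\1/'
}

short_instance 'host.subdomain.your.org:9100'   # prints host
short_instance 'other.example.org:9100'         # no match, printed unchanged
```

As with label_replace, a label that doesn't match the pattern comes back unchanged.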
Memory
All instances
label_replace(node_memory_MemAvailable_bytes / 1000000, "instance", "$1", "instance", "(.*).sign.*")
Detail on a single instance
Total
node_memory_MemTotal_bytes{instance="some.your.org:9100",job="screens"}
Used
node_memory_MemTotal_bytes{instance="some.your.org:9100",job="screens"} - node_memory_MemFree_bytes{instance="some.your.org:9100",job="screens"}
Cache
node_memory_Cached_bytes{instance="some.your.org:9100",job="screens"} + node_memory_Buffers_bytes{instance="some.your.org:9100",job="screens"}
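The node_memory_* metrics mirror the fields in /proc/meminfo, converted from kB to bytes, so you can sanity-check a panel on the host itself. Here is a small sketch of the ‘Used’ calculation above (MemTotal minus MemFree); the function name mem_used_bytes is my own, and it takes an optional file argument so it can be tried against a sample file.

```shell
# Print "used" memory in bytes (MemTotal - MemFree) from a meminfo-style file.
# Usage: mem_used_bytes [file]   (defaults to /proc/meminfo)
# (hypothetical helper; /proc/meminfo values are in kB, hence the * 1024)
mem_used_bytes() {
    awk '
        $1 == "MemTotal:" { total = $2 }
        $1 == "MemFree:"  { free = $2 }
        END { printf "%.0f\n", (total - free) * 1024 }
    ' "${1:-/proc/meminfo}"
}

# Example: mem_used_bytes
```

Note that this figure still includes buffers and cache, just as the query does.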
State
label_replace(up{job="screens"},"instance", "$1", "instance", "(.*).sign.*")
# set the legend to {{instance}}
Under-voltage
label_replace(node_hwmon_in_lcrit_alarm_volts{job="screens"},"instance", "$1", "instance", "(.*).sign.*")