Email sent to [email protected] is simply forwarded to someplace like gmail. It’s free and easy, and you don’t need any infrastructure. Most registrars like GoDaddy, NameCheap, CloudFlare, etc, will handle it.

You can even reply from [email protected] by integrating with SendGrid or a similar provider.

Remote-Hosting

If you want more, Google and Microsoft have full productivity suites. Just edit your DNS records, import your users, and pay them $5 a head per month. You still have to ‘do email’ but it’s a little less work than if you ran the whole stack. In most cases, companies that specialize in email do it better than you can.

Self-Hosting

If you are considering local email, let me paraphrase Kenji López-Alt. The first step is, don’t. The big guys can do it cheaper and better. But if it’s a philosophical, control, or you just don’t have the funding, press on.

A Note About Cost

Most of the cost is user support. Hosting means someone else gets purchase and patch a server farm, but you still have to talk to users. My (anecdotal) observation is that fully hosting saves 10% in overall costs and it soothes out expenses. The more users you have, the more that 10% starts to matter.

1.2.1 - Forwarding

This is the best solution for a small number of users. You configure it at your registrar and rely on google (or someone similar) to do all the work for free.

If you want your out-bound emails to come from your domain name (and you do), add an out-bound relay. This is also free for minimal use.

Registrar Configuration

This is different per registrar, but normally involves creating an address and it’s destination

Cloudflare

(Login - assumes you use cloudflare as your registrar)
Login and select the domain in question.
Select Email, then Email Routing.
Under Routes, select Create address.

Once validated, email will begin arriving at the destination.

Configure Relaying

The registrars is only forwarding email, not sending it. To get your sent mail to from from your domain, you must integrate with a mail service such as SendGrid

SendGrid

Create a free account and login
Authenticate your domain name (via DNS)
Create an API key (Settings -> API Keys -> Restricted Access, Defaults)

Gmail

Settings -> Accounts -> Send Mail as
Add your domain email
Configure the SMTP server with:
- SMTP server: “smtp.sendgrid.net”
- username: “apikey”
- password: (the key you created above)

After validating the code Gmail sends you, there will be a drop down in the From field of new emails.

1.2.2 - Remote Hosting

This is more in the software-as-a-service category. You get an admin dashboard and are responsible for managing users and mail flow. The hosting service provide will help you with basic things, but you’re doing most of the work yourself.

Having manged 100K+ user mail systems and migrated from on-prem sendmail to exchange and then O365 and Google, I can confidently say the infrastructure and even platform amounts to less than 10% of the cost of providing the service.

The main advantage to hosting is that you’re not managing the platform, installing patches and replacing hardware. The main disadvantage is is that you have little control and sometimes things are broken and you can’t do anything about it.

Medium sized organizations benefit most from hosting. You probably need a productivity suite anyways, and email is usually wrapped up in that. It saves you from having to specialize someone in email and the infrastructure associated with it.

But if controlling access to your data is paramount, then be aware that you have lost that and treat email as a public conversation.

1.2.3 - Self Hosting

When you self-host, you develop expertise in email itself, arguably a commodity service where such expertise has small return. But, you have full control and your data is your own.

The generally accepted best practice is install Postfix and Dovecot. This is the simplest path and what I cover here. But there are some pretty decent all-in-one packages such as Mailu, Modoboa, etc. These usually wrap Postfix and Dovecot to spare you the details and improve your quality of life, at the cost of not really knowing how they really work.

You’ll also need to configure a relay. Many ISPs block basic mail protocol and many recipient servers are rightly suspicious of random emails from unknown IPs in cable modem land.

1.2.3.1 - Postfix

This is the first step - a server that handles and stores email. You’ll be able to check messages locally at the console. (Remote client access such as with Thunderbird comes later.)

Preparation

You need:

Linux Server
Firewall Port-Forward
Public DNS

Server

We use Debian Bookworm (12) in this example but any derivative will be similar. At large scale you’d setup virtual users, but we’ll stick with the default setup and use your system account. Budget about 10M per 100 emails stored.

Port Forwarding

Mail protocol uses port 25. Simply forward that to your internal mail server and you’re done.

DNS

You need an normal ‘A’ record for your server and a special ‘MX’ record for your domain root. That way, mail sent to [email protected] will get routed to the server.

Name	Type	Value
the-server	A	20.236.44.162
@	MX	the-server

Mail servers see [email protected] and look for records of type ‘MX’ for ‘your.org’. Seeing that ’the-server’ is listed, they lookup it’s ‘A’ record and connect. A message to [email protected] is handled the same way, though when there is no ‘MX’ record it just delivers it to the ‘A’ record for ’the-server.your.org’. If you have both, the ‘MX’ takes precedence.

Installation

Some configuration is done at install time by the package so you must make sure your hostname is correct. We use the hostname ‘mail’ in this example.

# Correct internal hostnames as needed. 'mail' and 'mail.home.lan' are good suggestions.
cat /etc/hostname /etc/hosts

# Set the external host name and run the package installer. If postfix is already installed, apt remove it first
EXTERNAL="mail.your.org"
sudo debconf-set-selections <<< "postfix postfix/mailname string $EXTERNAL"
sudo debconf-set-selections <<< "postfix postfix/main_mailer_type string 'Internet Site'"
sudo apt install --assume-yes postfix

# Add the main domain to the destinations as well
DOMAIN="your.org"
sudo sed -i "s/^mydestination = \(.*\)/mydestination = $DOMAIN, \1/"  /etc/postfix/main.cf
sudo systemctl reload postfix.service

Test with telnet - use your unix system ID for the rcpt address below.

telnet localhost 25

ehlo localhost
mail from: <[email protected]>
rcpt to: <[email protected]>
data
Subject: Wish List

Red Ryder BB Gun
.
quit

Assuming that ‘you’ matches your shell account, Postfix will have accepted the message and used it’s Local Delivery Agent to store it in the local message store. That’s in /var/mail.

cat /var/mail/YOU

Configuration

Encryption

Postfix will use the untrusted “snakeoil” that comes with debian to opportunistically encrypt communication between it and other mail servers. Surprisingly, most other servers will accept this cert (or fall back to non-encrypted), so lets proceed for now. We’ll generate a trusted one later.

Spam Protection

The default config is secured so that it won’t relay messages, but it will accept message from Santa, and is subject to backscatter and a few other things. Let’s tighten it up.

sudo tee -a /etc/postfix/main.cf << EOF

# Tighten up formatting
smtpd_helo_required = yes
disable_vrfy_command = yes
strict_rfc821_envelopes = yes

# Error codes instead of bounces
invalid_hostname_reject_code = 554
multi_recipient_bounce_reject_code = 554
non_fqdn_reject_code = 554
relay_domains_reject_code = 554
unknown_address_reject_code = 554
unknown_client_reject_code = 554
unknown_hostname_reject_code = 554
unknown_local_recipient_reject_code = 554
unknown_relay_recipient_reject_code = 554
unknown_virtual_alias_reject_code = 554
unknown_virtual_mailbox_reject_code = 554
unverified_recipient_reject_code = 554
unverified_sender_reject_code = 554
EOF

sudo systemctl reload postfix.service

PostFix has some recommendations as well.

sudo tee -a /etc/postfix/main.cf << EOF

# PostFix Suggestions
smtpd_helo_restrictions = reject_unknown_helo_hostname
smtpd_sender_restrictions = reject_unknown_sender_domain
smtpd_recipient_restrictions =
    permit_mynetworks, 
    permit_sasl_authenticated,
    reject_unauth_destination
smtpd_relay_restrictions = 
    permit_mynetworks, 
    permit_sasl_authenticated,
    reject_unauth_destination
smtpd_data_restrictions = reject_unauth_pipelining
EOF

sudo systemctl reload postfix.service

If you test a message from Santa now, Postfix will do some checks and realize it’s bogus.

550 5.7.27 [email protected]: Sender address rejected: Domain northpole.org does not accept mail (nullMX)

Header Cleanup

Postfix will attach a Received: header to outgoing emails that has details of your internal network and mail client. That’s information you don’t need to broadcast. You can remove that with a “cleanup” step as the message is sent.

# Insert a header check after the 'cleanup' line in the smtp section of the master file and create a header_checks file
sudo sed -i '/^cleanup.*/a\\t-o header_checks=regexp:/etc/postfix/header_checks' /etc/postfix/master.cf
echo "/^Received:/ IGNORE" | sudo tee -a /etc/postfix/header_checks

Note - there is some debate on if this triggers a higher spam score. You may want to replace instead.

Testing

Incoming

You can now receive mail to [email protected] and [email protected]. Try this to make sure you’re getting messages. Feel free to install mutt if you’d like a better client at the console.

Outgoing

You usually can’t send mail and there are several reasons why.

Many ISPs block outgoing port 25 to keep a lid on spam bots. This prevents you from sending any messages. You can test that by trying to connect to gmail on port 25 from your server.

nc -zv gmail-smtp-in.l.google.com 25

Also, many mail servers will reverse-lookup your IP to see who it belongs to. That request will go to your ISP (who owns the IPs) and show their DNS name instead of yours. You’re often blocked at this step, though some providers will work with you if you contact them.

Even if you’re not blocked and your ISP has given you a static IP with a matching reverse-lookup, you will suffer from a lower reputation score as you’re not a well-known email provider. This can cause your sent messages to be delayed while being considered for spam.

To solve these issues, relay your email though a email provider. This will improve your reputation score (used to judge spam), ease the additional security layers such as SPF, DKIM, DMARC, and is usually free at small volume.

Postfix even calls this using a ‘Smarthost’

Next Steps

Now that you can get email, let’s make it so you can also send it.

Set up a Relay

Troubleshooting

When adding Postfix’s anti-spam suggestions, we left off the smtpd_client_restrictions and smtpd_end_of_data_restrictions as they created problems during testing.

You may get a warning from Postfix that one of the settings you’ve added is overriding one of the earlier settings. Simply delete the first instance. These are usually default settings that we’re overriding.

Use ‘@’ to view the logs from all the related services.

sudo journalctl -u [email protected]

If you change your server’s DNS entry, make sure to update mydestination in your /etc/postfix/main.cf and sudo systemctl reload [email protected].

Misc

Mail Addresses

Postfix only accepts messages for users in the “local recipient table” which is built from the unix password file and the aliases file. You can add aliases for other addresses that will deliver to your shell account, but only shell users can receive mail right now. See virtual mailboxes to add users without shell accounts.

In the alias file, you’ll see “Postmaster” (and possibly others) are aliased to root. Add root as an alias to you at the bottom so that mail gets to your mailbox.

echo "root:   $USER" | sudo tee -a /etc/aliases
sudo newaliases

Spamhaus

They’ve been around for many years and are included in the recommendations from postfix.

smtpd_recipient_restrictions =
    reject_rbl_client zen.spamhaus.org,
    reject_rhsbl_reverse_client dbl.spamhaus.org,
    reject_rhsbl_helo dbl.spamhaus.org,
    reject_rhsbl_sender dbl.spamhaus.org

However, if you’re using one of the large public DNS resolvers, like Google, Cloudflare or Quad 9, this won’t work. The service uses DNS to transmit data and Spamhaus has recently started rejecting ¹ the large providers. You’ll see errors like this in your journal;

sudo journalctl -u [email protected] | grep spam

May 05 14:01:23 mail postfix/smtpd[527]: NOQUEUE: reject: RCPT from i-ii.cloudflare-email.net[104.30.8.88]: 554 5.7.1 Service unavailable; Client host [104.30.8.88] blocked using zen.spamhaus.org; Error: open resolver; https://check.spamhaus.org/returnc/pub/172.70.41.5/;

And the sender wil get return notices like this;

The response from the remote server was:

521 5.3.0 Upstream error 

    and

Rejected reason:
upstream (mail.some.org) error: failed to initialize: Unknown error: permanent error (554): 5.7.1 Service unavailable; Unverified Client host [i-bgb.cloudflare-email.net] blocked using dbl.spamhaus.org; Error: open resolver; https://check.spamhaus.org/returnc/pub/172.71.189.5/

The recommended solution is to create an account with Spamhaus. This works with any DNS resolver. Alternatively, you should be able to use your providers DNS server if you don’t mind them data-mining your traffic. Or deploy your own recursive DNS server, like Unbound. Though neither of these worked reliably for me and I didn’t pursue the reasons.

So sign up for a free account and configure postfix’s domain query service daemon with their instructions.

“…access to public mirrors requires the use of a non-public, non-shared DNS resolver (therefore excluding services like Google Public DNS), while DQS can use any DNS channel” ↩︎

1.2.3.2 - Relay

A relay is simply another mail server that you give your outgoing mail to, rather than try to deliver it yourself.

There are many companies that specialize in this. Sign up for a free account and they give you the block of text to add to your postfix config. Some popular ones are:

SendGrid
MailGun
Sendinblue

They allow anywhere between 50 and 300 a day for free.

SendGrid

Relay Setup

SendGrid’s free plan gives you 50 emails a day. Create an account, verify your email address ([email protected]), and follow the instructions. Make sure to sudo apt install libsasl2-modules

https://docs.sendgrid.com/for-developers/sending-email/postfix

Restart Postfix and use mutt to send an email. It works! the only thing you’ll notice is that your message has a “On Behalf Of” notice in the message letting you know it came from SendGrid. Follow the section below to change that.

Domain Integration

To integrate your domain fully, add DNS records for SendGrid using these instructions.

https://docs.sendgrid.com/ui/account-and-settings/how-to-set-up-domain-authentication

This will require you to login and go to:

Settings -> Sender Authentication -> Domain Authentication

Stick with the defaults that include automatic security and SendGrid will give you three CNAME records. Add those to your DNS and your email will check out.

Technical Notes

DNS

If you’re familiar with email domain-based security, you’ll see that two of the records SendGrid gives you are links to DKIM keys so SendGrid can sign emails as you. The other record (emXXXX) is the host sendgrid will use to send email. The SPF record for that host will include a SendGrid SPF record that includes multiple pools of IPs so that SPF checks will pass. They use CNAMEs on your side so they can rotate keys and pool addresses without changing DNS entries.

If none of this makes sense to you, then that’s really the point. You don’t have to know any of it - they take care of it for you.

Next Steps

Your server can now send email too. All shell users on your sever rejoice!

To actually use your mail server, you’ll want to add some remote client access.

Set up Dovecot

1.2.3.3 - Dovecot

Dovecot is an IMAP (Internet Message Access Protocol) server that allows remote clients to access their mail. There are other protocols and servers, but Dovecot has about 75% of the internet and is a good choice.

Installation

sudo apt install dovecot-imapd
sudo apt install dovecot-submissiond

Configuration

Storage

Both Postfix and Dovecot use mbox storage format by default. This is one big file with all your mail in it and doesn’t scale well. Switch to the newer maildir format where your messages are stored as individual files.

# Change where Postfix delivers mail.
sudo postconf -e "home_mailbox = Maildir/"
sudo systemctl reload postfix.service

# Change where Dovecot looks for mail.
sudo sed -i 's/^mail_location.*/mail_location = maildir:~\/Maildir/' /etc/dovecot/conf.d/10-mail.conf
sudo systemctl reload dovecot.service

Encryption

Dovecot comes with it’s own default cert. This isn’t trusted, but Thunderbird will prompt you and you can choose to accept it. This will be fine for now. We’ll generate a valid cert later.

Credentials

Dovecot checks passwords against the local unix system by default and no changes are needed.

Submissions

One potential surprise is that IMAP is only for viewing existing mail. To send mail, you use the SMTP protocol and relay messages to your mail server. But we have relaying turned off, as we don’t want just anyone relaying messages.

The solution is to enable authentication and by convention this is done by a separate port process, called the Submission Server.

We’ve installed Dovecot’s submission server as it’s newer and easier to set up. Postfix even suggests considering it, rather than theirs. The only configuration needed it to set the localhost as the relay.

# Set the relay as localhost where postfix runs
sudo sed -i 's/#submission_relay_host =/submission_relay_host = localhost/' /etc/dovecot/conf.d/20-submission.conf
sudo systemctl reload dovecot.service

Port Forwarding

Forward ports 143 and 587 to your mail server and test that you can connect from both inside and outside your LAN.

nc -zf mail.your.org 143
nc -zf mail.your.org 587

If it’s working from outside your network, but not inside, you may need to enable [reflection] aka hairpin NAT. This will be different per firewall vendor, but in OPNSense it’s:

Firewall -> Settings -> Advanced

 # Enable these settings
Reflection for port forwards
Reflection for 1:1
Automatic outbound NAT for Reflection

Clients

Thunderbird and others will successfully discover the correct ports and services when you provide your email address of [email protected].

Notes

TLS

Dovecot defaults to port 587 for the submission service which is an older standard for explicit TLS. It’s now recommended by RFC to use implicit TLS on port 465 and you can add a new “submissions” service for that, while leaving the default in place. Clients will pick their fav. Thunderbird defaults to the 465 when both are available.

Note: leaving the default sumbission port commented out just means it will use the default port. Comment out the whole block to disable.

vi /etc/dovecot/conf.d/10-master.conf

# Change the default of

service submission-login {
  inet_listener submission {
    #port = 587
  }
}

to 

service submission-login {
  inet_listener submission {
    #port = 587
  }
  inet_listener submissions {
    port = 465
    ssl = yes
  }
}

# And reload

sudo systemctl reload dovecot.service

Make sure to port forward 465 at the firewall as well

Next Steps

Now that you’ve got the basics working, let’s secure things a little more

Set up security

Sources

https://dovecot.org/list/dovecot/2019-July/116661.html

1.2.3.4 - Security

Certificates

We should use valid certificates. The best way to do that is with the certbot utility.

Certbot

Certbot automates the process of getting and renewing certs, and only requires a brief connection to port 80 as proof it’s you. There’s also a DNS based approach, but we use the port method for simplicity. It only runs once every 60 days so there is little risk of exploit.

Forward Port 80

You probably already have a web server and can’t just change where port 80 goes. To integrate certbot, add a name-based virtual host proxy to that web server.

# Here is a caddy example. Add this block to your Caddyfile
http://mail.your.org {
        reverse_proxy * mail.internal.lan
}

# You can also use a well-known URL if you're already using that vhost
http://mail.your.org {
   handle /.well-known/acme-challenge/ {
     reverse_proxy mail.internal.lan
   }
 }

Install Certbot

Once the port forwarding is in place, you can install certbot and use it to request a certificate. Note the --deploy-hook argument. This reloads services after a cert is obtained or renewed. Else, they’ll keep using an expired one.

DOMAIN=your.org

sudo apt install certbot
sudo certbot certonly --standalone --domains mail.$DOMAIN --non-interactive --agree-tos -m postmaster@$DOMAIN --deploy-hook "service postfix reload; service dovecot reload"

Once you have a cert, Certbot will keep keep it up-to-date by launching periodically from a cronjob in /etc/cron.d and scanning for any needed renewals.

Postfix

Tell Postfix about the cert by using the postconf utility. This will warn you about any potential configuration errors.

sudo postconf -e 'smtpd_tls_cert_file = /etc/letsencrypt/live/mail.$DOMAIN/fullchain.pem'
sudo postconf -e 'smtpd_tls_key_file = /etc/letsencrypt/live/mail.$DOMAIN/privkey.pem'
sudo postfix reload

Dovecot

Change the Dovecot to use the cert as well.

sudo sed -i 's/^ssl_cert = .*/ssl_cert = <\/etc\/letsencrypt\/live\/mail.$DOMAIN\/fullchain.pem/' /etc/dovecot/conf.d/10-ssl.conf
sudo sed -i 's/^ssl_key = .*/ssl_key = <\/etc\/letsencrypt\/live\/mail.$DOMAIN\/privkey.pem/' /etc/dovecot/conf.d/10-ssl.conf
sudo dovecot reload

Verifying

You can view the certificates with the commands:

openssl s_client -connect mail.$DOMAIN:143 -starttls imap -servername mail.$DOMAIN
openssl s_client -starttls smtp -showcerts -connect mail.$DOMAIN:587 -servername mail.$DOMAIN

Privacy and Anti-Spam

You can take advantage of Cloudflare (or other) services to accept and inspect your email before forwarding it on to you. As far as the Internet is concerned, Cloudflare is your email server. The rest is private.

Take a look at the Forwarding section, and simply forward your mail to your own server instead of Google’s. That will even allow you to remove your mail server from DNS and drop connections other than CloudFlare if desired.

Intrusion Prevention

In my testing it takes less than an hour before someone discovers and attempts to break into your mail server. You may wish to GeoIP block or otherwise limit connections. You can also use crowdsec.

Crowdsec

Crowdsec is an open-source IPS that monitors your log files and blocks suspicious behavior.

Install as per their instructions.

curl -s https://packagecloud.io/install/repositories/crowdsec/crowdsec/script.deb.sh | sudo bash
sudo apt install -y crowdsec
sudo apt install crowdsec-firewall-bouncer-nftables
sudo cscli collections install crowdsecurity/postfix

Postfix

Most services now log to the system journal rather than a file. You can view them with the journalctl command

# What is the exact service unit name?
sudo systemctl status | grep postfix

# Anything having to do with that service unit
sudo journalctl --unit [email protected]

# Zooming into just the identifiers smtp and smtpd
sudo journalctl --unit [email protected] -t postfix/smtp -t postfix/smtpd

Crowdsec accesses the system journal by adding a block to it’s log acquisition directives.

sudo tee -a /etc/crowdsec/acquis.yaml << EOF
source: journalctl
journalctl_filter:
  - "[email protected]"
labels:
  type: syslog
---
EOF

sudo systemctl reload crowdsec

Dovecot

Install the dovecot collection as well.

sudo cscli collections install crowdsecurity/dovecot

sudo tee -a /etc/crowdsec/acquis.yaml << EOF
source: journalctl
journalctl_filter:
  - "_SYSTEMD_UNIT=dovecot.service"
labels:
  type: syslog
---
EOF

sudo systemctl reload crowdsec

Is it working? You won’t see anything at first unless you’re actively under attack. But after 24 hours you may see some examples of attempts to relay spam.

allen@mail:~$ sudo cscli alerts list
╭────┬────────────────────┬────────────────────────────┬─────────┬──────────────────────────────────────────────┬───────────┬─────────────────────────────────────────╮
│ ID │       value        │           reason           │ country │                      as                      │ decisions │               created_at                │
├────┼────────────────────┼────────────────────────────┼─────────┼──────────────────────────────────────────────┼───────────┼─────────────────────────────────────────┤
│ 60 │ Ip:187.188.233.58  │ crowdsecurity/postfix-spam │ MX      │ 17072 TOTAL PLAY TELECOMUNICACIONES SA DE CV │ ban:1     │ 2023-05-24 06:33:10.568681233 +0000 UTC │
│ 54 │ Ip:177.229.147.166 │ crowdsecurity/postfix-spam │ MX      │ 13999 Mega Cable, S.A. de C.V.               │ ban:1     │ 2023-05-23 20:17:49.912754687 +0000 UTC │
│ 53 │ Ip:177.229.154.70  │ crowdsecurity/postfix-spam │ MX      │ 13999 Mega Cable, S.A. de C.V.               │ ban:1     │ 2023-05-23 20:15:27.964240044 +0000 UTC │
│ 42 │ Ip:43.156.25.237   │ crowdsecurity/postfix-spam │ SG      │ 132203 Tencent Building, Kejizhongyi Avenue  │ ban:1     │ 2023-05-23 01:15:43.87577867 +0000 UTC  │
│ 12 │ Ip:167.248.133.186 │ crowdsecurity/postfix-spam │ US      │ 398722 CENSYS-ARIN-03                        │ ban:1     │ 2023-05-20 16:03:15.418409847 +0000 UTC │
╰────┴────────────────────┴────────────────────────────┴─────────┴──────────────────────────────────────────────┴───────────┴─────────────────────────────────────────╯

If you’d like to get into the details, take a look at the Crowdsec page .

Next Steps

Now that your server is secure, let’s take a look at how email is authenticated and how to ensure yours is.

Authentication

1.2.3.5 - Authentication

Email authentication prevents forgery. People can still send unsolicited email, but they can’t fake who it’s from. If you set up a Relay for Postfix, the relayer is doing it for you. But otherwise, proceed onward to prevent your outgoing mail being flagged as spam.

You need three things

SPF: Server IP addresses - which specific servers have authorization to send email.
DKIM: Server Secrets - email is signed so you know it’s authentic and unchanged.
DMARC: Verifies the address in the From: aligns with the domain sending the email, and what to do if not.

SPF

SPF, or Sender Policy Framework, is the oldest component. It’s a DNS TXT record that lists the servers authorized to send email for a domain.

A receiving server looks at a messages’s return path (aka RFC5321.MailFrom header) to see what domain the email purports to be from. It then looks up that domain’s SPF record and if the server that sent the email isn’t included, the email is considered forged.

Note - this doesn’t check the From: header the user sees. Messages can appear (to the user) to be from anywhere. So it’s is mostly a low-level check to prevent spambots.

The DNS record for your Postfix server should look like:

Type: "TXT"
NAME: "@"
Value: "v=spf1 a:mail.your.org -all"

The value above shows the list of authorized servers (a:) contains mail.your.org. Mail from all other servers is considered forged (-all).

To have your Postfix server check SPF for incoming messages add the SPF policy agent.

sudo apt install postfix-policyd-spf-python

sudo tee -a /etc/postfix/master.cf << EOF

policyd-spf  unix  -       n       n       -       0       spawn
    user=policyd-spf argv=/usr/bin/policyd-spf
EOF

sudo tee -a /etc/postfix/main.cf << EOF

policyd-spf_time_limit = 3600
smtpd_recipient_restrictions =
   permit_mynetworks,
   permit_sasl_authenticated,
   reject_unauth_destination,
   check_policy_service unix:private/policyd-spf
EOF

sudo systemctl restart postfix

DKIM

DKIM, or DomainKeys Identified Mail, signs the emails as they are sent ensuring that the email body and From: header (the one you see in your client) hasn’t been changed in transit and is vouched for by the signer.

Receiving servers see the DKIM header that includes who signed it, then use DNS to check it. Unsigned mail simply isn’t checked. (There is no could-but-didn’t in the standard).

Note - There is no connection between the domain that signs the message and what the user sees in the From: header. Messages can have a valid DKIM signature and still appear to be from anywhere. DKIM is mostly to prevent man-in-the-middle attacks from altering the message.

For Postfix, this requires installation of OpenDKIM and a connection as detailed here. Make sure to sign with the domain root.

https://tecadmin.net/setup-dkim-with-postfix-on-ubuntu-debian/

Once you’ve done that, create the following DNS entry.

Type: "TXT"
NAME: "default._domainkey"
Value: "v=DKIM1; h=sha256; k=rsa; p=MIIBIjANBgkq..."

DMARC

Having a DMARC record is the final piece that instructs servers to check the From: header the user sees against the domain return path from the SPF and DKIM checks, and what to do on a fail.

This means mail “From: [email protected]” sent though mail.your.org mail servers will be flagged as spam.

The DNS record should look like:

Type: "TXT"
NAME: "_dmarc"
Value: "v=DMARC1; p=reject; adkim=s; aspf=r;"

p=reject: Reject messages that fail
adkim=s: Use strict DKIM alignment
aspf=r: Use relaxed SPF alignment

Reject (p=reject) indicates that email servers should “reject” emails that fail DKIM or SPF tests, and skip quarantine.

Strict DKIM alignment (=s) means that the SPF Return-Path domain or the DKIM signing domain must be an exact match with the domain in the From: address. A DKIM signature from your.org would exactly match [email protected].

Relaxed SPF alignment (=r) means subdomains of the From: address are acceptable. I.e. the server mail.your.org from the SPF test aligns with an email from: [email protected].

You can also choose quarantine mode (p=quarantine) or report-only mode (p=none) where the email will be accepted and handled as such by the receiving server, and a report sent to you like below.

v=DMARC1; p=none; rua=mailto:[email protected]

DMARC is an or test. In the first example, if either the SPF or DKIM domains pass, then DMARC passes. You can choose to test one, both or none at all (meaning nothing can pass DMARC) as the the second DMARC example.

To implement DMARC checking in Postfix, you can install OpenDMARC and configure a mail filter as described below.

https://www.linuxbabe.com/mail-server/opendmarc-postfix-ubuntu

Next Steps

Now that you are hadnling email securely and authentically, let’s help ease client connections

Autodiscovery

1.2.3.6 - Autodiscovery

In most cases you don’t need this. Thunderbird, for example, will use a shotgun approach and may find your sever using ‘common’ server names based on your email address.

But there is an RFC and other clients may need help.

DNS SRV

This takes advantage of the RFC with an entry for IMAP and SMTP Submission

Type	Name	Service	Protocol	TTL	Priority	Weight	Port	Target
SRV	@	_imap	TCP	auto	10	5	143	mail.your.org
SRV	@	_submission	TCP	auto	10	5	465	mail.your.org

Web Autoconfig

Create a DNS entry for autoconfig.your.org
Create a vhost and web root for that with the file mail/config-v1.1.xml
Add the contents below to that file

<?xml version="1.0"?>
<clientConfig version="1.1">
    <emailProvider id="your.org">
      <domain>your.org</domain>
      <displayName>Example Mail</displayName>
      <displayShortName>Example</displayShortName>
      <incomingServer type="imap">
         <hostname>mail.your.org</hostname>
         <port>143</port>
         <socketType>STARTTLS</socketType>
         <username>%EMAILLOCALPART%</username>
         <authentication>password-cleartext</authentication>
      </incomingServer>
      <outgoingServer type="smtp">
         <hostname>mail.your.org</hostname>
         <port>587</port>
         <socketType>STARTTLS</socketType> 
         <username>%EMAILLOCALPART%</username> 
         <authentication>password-cleartext</authentication>
         <addThisServer>true</addThisServer>
      </outgoingServer>
    </emailProvider>
    <clientConfigUpdate url="https://www.your.org/config/mozilla.xml" />
</clientConfig>

Note

It’s traditional to match server names to protocols and we would have used “imap.your.org” and “smtp.your.org”. But using ‘mail’ is popular now and it simplifies setup at several levels.

Thunderbird will try to guess at your server names, attempting to connect to smtp.your.org for example. But many Postfix configurations have spam prevention that interfere.

Sources

https://cweiske.de/tagebuch/claws-mail-autoconfig.htm
https://www.hardill.me.uk/wordpress/2021/01/24/email-autoconfiguration/

1.3 - Media

1.3.1 - Players

1.3.1.1 - LibreELEC

One of the best systems for a handling media is LibreELEC. It’s both a Kodi box and a server appliance that’s resistant to abuse. With the right hardware (like a ROCKPro64 or Waveshare) it also makes an excellent portable server for traveling.

Deployment

Download an image from https://libreelec.tv/downloads and flash as directed. Enable SSH during the initial setup.

Storage

RAID is a useful feature but only BTRFS works directly. This is fine, but with a little extra work you can add MergerFS, a popular option for combining disks.

BTRFS

Create the RAID set on another PC. If your disks are of different sizes you can use the ‘single’ profile, but leave the metadata mirrored.

sudo mkfs.btrfs -f -L pool -d single -m raid1 /dev/sda /dev/sdb /dev/etc...

After attaching to LibreELEC, the array will be automatically mounted at /media/pool based on label pool you specified above.

MergerFS

This is a good option if you just want to combine disks and unlike most other RAID technologies, if you loose a disk the rest will keep going. Many people combine this with SnapRAID for off-line parity.

But it’s a bit more work.

Setup MergerFS

Cooling

You may want to manage the fan. The RockPro64 has a PWM fan header and LibreELEC loads the pwm_fan module.

Setup Fan Control

Kodi Manual Start

The kodi process can use a significant amount of CPU even at rest. If you’re using this primarily as a file server you can disable kodi from starting automatically.

cp /usr/lib/systemd/system/kodi.service /storage/.config/system.d/kodi-alt.service
systemctl mask kodi

To start kodi, you can enter systemctl start kodi-alt

Remotes

Plug in a cheap Fm4 style remote and it ‘just works’ with kodi. But if you want to customize some remote buttons, say to start kodi manually, you still can.

Setup Remote Control

Enable SMB

To share your media, simply copy the sample file, remove all the preconfigured shares (unless you want them), and add one for your storage pool. Then just enable Samba and reboot (so the file is picked up)

cp /storage/.config/samba.conf.sample /storage/.config/samba.conf
vi /storage/.config/samba.conf

[media]
  path = /storage/pool
  available = yes
  browseable = yes
  public = yes
  writeable = yes

Config --> LibreELEC --> Services --> Enable Samba

Enable HotSpot

Config --> LibreELEC --> Network --> Wireless Networks

Enable Active and Wireless Access Point and it just works!

Enable Docker

This is a good way handle things like Jellyfin or Plex if you must. In the GUI, go to add-ons, search for the items below and install.

docker
LinuxServer.io
Docker Image Updater

Then you must make sure the docker starts starts after the storage is up or the containers will see an empty folder instead of a mounted one.

vi /storage/.config/system.d/service.system.docker.service

[Unit]
...
...
After=network.target storage-pool.mount

If that fails, you can also tell docker to wait a bit

ExecStartPre=/usr/bin/sleep 120

Remote Management

You may be called upon to look at something remotely. Sadly, there’s no remote access to the GUI but you can use things like autossh to create a persistent remote tunnel, or wireguard to create a VPN connection. Wireguard is usually better.

1.3.1.1.1 - Add-ons

You can also use this platform as a server. This seems counter-intuitive at first; to use a media player OS as a server. But in practice it is rock-solid. I have a mixed fleet of 10 or so devices and LibreELEC has better uptime stats than TrueNAS.

The device playing content on your TV is also the media server for the rest of the house. I wouldn’t advertise this as an enterprise solution, but I can’t dispute the results.

Installation

Normal Add-ons

Common tools like rsync, as well as server software like Jellyfin are available. You can browse as descriped below, or use the search tool if you’re looking for something specific.

Select the gear icon and choose Add-ons
Choose LibreELEC Add-ons
Drill down to browse software.

Docker

If you’re on ARM or want more frequent updates, you may want to add Docker and the LinuxServer.io repository.

Select the gear icon and choose Add-ons
Search add-ons for “Docker” and install
Search add-ons for “LinuxServer.io” and install
Select “Install from repository” and choose “LinuxServer.io’s Docker Add-ons”.

Drill down and add Jellyfin, for example.

https://wiki.libreelec.tv/installation/docker

1.3.1.1.2 - AutoSSH

This allows you to setup and monitor a remote tunnel as the easiest wat to manage remote clients is to let them come to you. To accomplish this, we’ll set up a server, create client keys, test a reverse tunnel, and setup autossh.

The Server

This is simply a server somewhere that everyone can reach via SSH. Create a normal user account with a password and home directory, such as with adduser remote. We will be connecting from our clients for initial setup with this.

The Client

Use SSH to connect to the LibreELEC client, generate a ssh key pair and copy it to the remote server

ssh [email protected]
ssh-keygen  -f ~/.ssh/id_rsa -q -P ""

# ssh-copy-id isn't available so you must use the rather harder command below
cat ~/.ssh/id_rsa.pub | ssh -t [email protected] "cat - >> ~/.ssh/authorized_keys"

ssh [email protected]

If all went well you can back out and then test logging in with no password. Make sure to do this and accept the key so th

The Reverse Tunnel

SSH normally connects your terminal to a remote server. Think of this as a encrypted tunnel where your keystrokes are sent to the server and it’s responses are sent back to you. You can send more than your keystrokes, however. You can take any port on your system and send it as well In our case, we’ll take port 22 (where ssh just happens to be listening) and send it to the rendezvous server on port 2222. SSH will continue to accept local connections while also taking connections from the remote port we are tunneling in.

# On the client, issue this command to connect the (-R)remote port 2222 to localhost:22, i.e. the ssh server on the client
ssh -N -R 2222:localhost:22 -o ServerAliveInterval=240 -o ServerAliveCountMax=2 [email protected]

# Leave that running while you login to the rendezvois server and test if you can now ssh to the client by connecting to the forwarded port.

ssh [email protected]
ssh root@localhost -p 2222

# Now exit both and set up Autossh below

Autossh

Autossh is a daemon that monitors ssh sessions to make sure they’re up and operational, restarting them as needed, and this is exactly what we need to make sure the ssh session from the client stays up. To run this as a service, a systemd service file is needed. For LibreELEC, these are in /storage/.config.

vi /storage/.config/system.d/autossh.service

[Unit]
Description=autossh
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=root
EnvironmentFile=/storage/.config/autossh
ExecStart=/storage/.kodi/addons/virtual.system-tools/bin/autossh $SSH_OPTIONS
Restart=always
RestartSec=60

[Install]
WantedBy=multi-user.target

vi /storage/.config/autossh

AUTOSSH_POLL=60
AUTOSSH_FIRST_POLL=30
AUTOSSH_GATETIME=0
AUTOSSH_PORT=22034
SSH_OPTIONS="-N -R 2222:localhost:22 [email protected] -i /storage/.ssh/id_rsa"

systemctl enable autossh.service
systemctl start autossh.service
systemctl status autossh.service

At this point, the client has a SSH connection to your server on port 22, opened port 2222 the ssh server and forwarded that back to it’s own ssh server. You can now connect by:

ssh [email protected]
ssh root@localhost -p 2222

If not, check the logs for errors and try again.

journalctl -b 0 --no-pager | less

Remote Control

Now that you have the client connected, you can use your Rendezvous Server as a Jump Host to access things on the remote client such as it’s web interface and even the console via VNC. Your connection will generally take the form of:

ssh localport:libreelec:libreelec_port -J rendezvoisServer  redevoisServer -p autosshPort

The actual command is hard to read as are going through the rendezvois server twice and connecting to localhost on the destination.

ssh -L 8080:localhost:32400  -J [email protected] root@localhost -p 2222

1.3.1.1.3 - Building

This works best in an Ubuntu container.

LibreELECT Notes

Installed but no sata hdd. Found this

RPi4 has zero support for PCIe devices so why is it “embarrasing” for LE to omit support for PCIe SATA things in our RPi4 image?

Feel free to send a pull-request to GitHub enabling the kernel config that’s needed.

https://forum.libreelec.tv/thread/27849-sata-controller-error/

Went though thier resouces beginners guid to git https://wiki.libreelec.tv/development/git-tutorial#forking-and-cloning building basics https://wiki.libreelec.tv/development/build-basics specific build commands https://wiki.libreelec.tv/development/build-commands/build-commands-le-12.0.x

and then failed because jammy wasn’t compatibile enough

Created a jammy container and restarted

https://ubuntu.com/server/docs/lxc-containers

sudo lxc-create –template download –name u1 ubuntu jammy amd64 sudo lxc-start –name u1 –daemon sudo lxc-attach u1

Used some of the notes from

https://www.artembutusov.com/libreelec-raid-support/

Did as fork, clone and a

git fetch –all

but couldnt get all the downloads as alsa.org site was down

On a side note, these are needed in the config.txt so that USB works

otg_mode=1,dtoverlay=dwc2,dr_mode=host

https://www.jeffgeerling.com/blog/2020/usb-20-ports-not-working-on-compute-module-4-check-your-overlays

I tried a menuconfig and selected ..sata? and got

CONFIG_ATA=m < CONFIG_ATA_VERBOSE_ERROR=y < CONFIG_ATA_FORCE=y CONFIG_ATA_SFF=y CONFIG_ATA_BMDMA=y

Better compare the .config file again

Edited and commited a config.txt but it didn’t show up in the image. Possibly the wrong file or theres another way to realize that chagne

Enabled the SPI interface

https://raspberrypi.stackexchange.com/questions/48228/how-to-enable-spi-on-raspberry-pi-3 https://wiki.libreelec.tv/configuration/config_txt

sudo apt install lxc

# This didn't work for some reason
sudo lxc-create --template download --name u1 --dist ubuntu --release jammy --arch amd64

sudo lxc-create --template download --name u1

sudo lxc-start --name u1 --daemon

sudo lxc-attach  u1

# Now inside, build 
apt update
apt upgrade
apt-get install gcc make git wget
apt-get install bc patchutils bzip2 gawk gperf zip unzip lzop g++ default-jre u-boot-tools texinfo xfonts-utils xsltproc libncurses5-dev xz-utils


# login and fork so you can clone more easily. Some problem with the creds

cd
git clone https://github.com/agattis/LibreELEC.tv
cd LibreELEC.tv/
git fetch --all
git tag
git remote add upstream https://github.com/LibreELEC/LibreELEC.tv.git
git fetch --all
git checkout libreelec-12.0
git checkout -b CM4-AHCI-Add
PROJECT=RPi ARCH=aarch64 DEVICE=RPi4   tools/download-tool
ls
cat /etc/passwd 
pwd
ls /home/
ls /home/ubuntu/
ls
cd ..
mv LibreELEC.tv/ /home/ubuntu/
cd /home/ubuntu/
ls -lah
chown -R ubuntu:ubuntu LibreELEC.tv/
ls -lah
cd LibreELEC.tv/
ls
ls -lah
cd
sudo -i -u ubuntu
ip a
cat /etc/resolv.conf 
ip route
sudo -i -u ubuntu


apt install tmux
sudo -i -u ubuntu tmux a




# And back home you can write
ls -lah ls/u1/rootfs/home/ubuntu/LibreELEC.tv/target/

1.3.1.1.4 - Fancontrol

Add this to the /storage/bin and create a service unit.

vi /storage/.config/system.d/fancontrol.service

systemctl enable fancontrol

#!/bin/sh

# Summary
#
# Adjust fan speed by percentage when CPU/GPU is between user set
# Min and Max temperatures.
#
# Notes
#
# Temp can be gleaned from the sysfs termal_zone files and are in
# units millidegrees meaning a reading of 30000 is equal to 30.000 C
#
# Fan speed is read and controlled by he pwm_fan module and can be
# read and set from a sysfs file as well. The value can be set from 0 (off)
# to 255 (max). It defaults to 255 at start


## Set Points

# CPU Temp set points
MIN_TEMP=40 # Min desired CPU temp
MAX_TEMP=60 # Max desired CPU temp


# Fan Speeds set points
FAN_OFF=0       # Fan is off
FAN_MIN=38      # Some fans need a minimum of 15% to start from a dead stop.
FAN_MAX=255     # Max cycle for fan

# Frequency
CYCLE_FREQ=6            # How often should we check, in seconds
SHORT_CYCLE_PERCENT=20  # If we are shutting on or of more than this percent of the
                        # time, just run at min rather than shutting off

## Sensor and Control files

# CPU and GPU sysfs locations
CPU=/sys/class/thermal/thermal_zone0/temp
GPU=/sys/class/thermal/thermal_zone1/temp

# Fan Control files
FAN2=/sys/devices/platform/pwm-fan/hwmon/hwmon2/pwm1
FAN3=/sys/devices/platform/pwm-fan/hwmon/hwmon3/pwm1



## Logic

# The fan control file isn't available until the module loads and
# is unpredictable in path. Wait until it comes up

FAN=""
while [[ -z $FAN ]];do
        [[ -f $FAN2 ]] && FAN=$FAN2
        [[ -f $FAN3 ]] && FAN=$FAN3
        [[ -z $FAN ]] && sleep 1
done

# The sensors are in millidegrees so adjust the user
# set points to the same units

MIN_TEMP=$(( $MIN_TEMP * 1000 ))
MAX_TEMP=$(( $MAX_TEMP * 1000 ))


# Short cycle detection requires us to track the number
# of on-off flips to cycles

CYCLES=0
FLIPS=0

while true; do

        # Set TEMP to the highest GPU/CPU Temp
        TEMP=""
        read TEMP_CPU < $CPU
        read TEMP_GPU < $GPU
        [[ $TEMP_CPU -gt $TEMP_GPU ]] && TEMP=$TEMP_CPU || TEMP=$TEMP_GPU

        # How many degress above or below our min threshold are we?
        DEGREES=$(( $TEMP-$MIN_TEMP ))

        # What percent of the range between min and max is that?
        RANGE=$(( $MAX_TEMP-$MIN_TEMP ))
        PERCENT=$(( (100*$DEGREES/$RANGE) ))

        # What number between 0 and 255 is that percent?
        FAN_SPEED=$(( (255*$PERCENT)/100 ))

        # Override the calculated speed for some special cases
        if [[ $FAN_SPEED -le $FAN_OFF ]]; then                  # Set anything 0 or less to 0
                FAN_SPEED=$FAN_OFF
        elif [[ $FAN_SPEED -lt $FAN_MIN ]]; then                # Set anything below the min to min
                FAN_SPEED=$FAN_MIN
        elif [[ $FAN_SPEED -ge $FAN_MAX ]]; then                # Set anything above the max to max
                FAN_SPEED=$FAN_MAX
        fi

        # Did we just flip on or off?
        read -r OLD_FAN_SPEED < $FAN
        if (    ( [[ $OLD_FAN_SPEED -eq 0 ]] && [[ $FAN_SPEED -ne 0 ]] ) || \
                ( [[ $OLD_FAN_SPEED -ne 0 ]] && [[ $FAN_SPEED -eq 0 ]] ) ); then
                FLIPS=$((FLIPS+1))
        fi

        # Every 10 cycles, check to see if we are short-cycling
        CYCLES=$((CYCLES+1))
        if [[ $CYCLES -ge 10 ]] && [[ ! $SHORT_CYCLING ]]; then
                FLIP_PERCENT=$(( 100*$FLIPS/$CYCLES ))
                if [[ $FLIP_PERCENT -gt $SHORT_CYCLE_PERCENT ]]; then
                        SHORT_CYCLING=1
                        echo "Short-cycling detected. Fan will run at min speed rather than shutting off."
                else
                        CYCLES=0;FLIPS=0
                fi
        fi

        # If we are short-cycling and would turn the fan off, just set to min
        if [[ $SHORT_CYCLING ]] && [[ $FAN_SPEED -le $FAN_MIN ]]; then
                FAN_SPEED=$FAN_MIN
        fi

        # Every so often, exit short cycle mode to see if conditions have changed
        if [[ $SHORT_CYCLING ]] && [[ $CYCLES -gt 10000 ]]; then  # Roughly half a day
                echo "Exiting short-cycling"
                SHORT_CYCLING=""
        fi

        # Write that to the fan speed control file
        echo $FAN_SPEED > $FAN

        # Log the stats everyone once in a while
#       if [[ $LOG_CYCLES ]] && [[ $LOG_CYCLES -ge 10 ]]; then
#               echo "Temp was $TEMP fan set to $FAN_SPEED"
#               LOG_CYCLES=""
#       else
#               LOG_CYCLES=$(($LOG_CYCLES+1))
#       fi

        sleep $CYCLE_FREQ

done

# Also look at drive temps. The sysfs filesystem isn't useful for
# all drives on RockPro64 so use smartctl instead

#ls -1 /dev/sd? | xargs -n1 smartctl -A | egrep ^194 | awk '{print $4}'

1.3.1.1.5 - MergerFS

Prepare and Exempt Disks

Prepare and exempt the file systems from auto-mounting1 so you can supply your own mount options and make sure they are up before you start MergerFS.

Make sure to wipe the disks before using as wipefs and fdisk are not available in LibreELEC.

# Assuming the disks are wiped, format and label each disk the same
mkfs.ext4 /dev/sda 
e2label /dev/sda pool-member

# Copy the udev rule for editing 
cp /usr/lib/udev/rules.d/95-udevil-mount.rules /storage/.config/udev.rules.d
vi /storage/.config/udev.rules.d/95-udevil-mount.rules

Edit this section by adding the pool-member label from above

# check for special partitions we dont want mount
IMPORT{builtin}="blkid"
ENV{ID_FS_LABEL}=="EFI|BOOT|Recovery|RECOVERY|SETTINGS|boot|root0|share0|pool-member", GOTO="exit"

Test this by rebooting and making sure the drives are not mounted.

Add Systemd Mount Units

Each filesystem requires a mount unit like below. Create one for each drive named disk1, disk2, etc. Note: The name of the file is import and to mount /storage/disk1 the name of the file must be storage-disk1.mount

vi /storage/.config/system.d/storage-disk1.mount

[Unit]
Description=Mount sda
Requires=dev-sda.device
After=dev-sda.device

[Mount]
What=/dev/sda
Where=/storage/disk1
Type=ext4
Options=rw,noatime,nofail

[Install]
WantedBy=multi-user.target

systemctl enable --now storage-disk1.mount

Download and Test MergerFS

MergerFS isn’t available as an add-on, but you can get it directly from the developer. LibreELEC (or CoreELEC) on ARM have a 32 bit[^2] user space so you’ll need the armhf version.

wget https://github.com/trapexit/mergerfs/releases/latest/download/mergerfs-static-linux_armhf.tar.gz

tar --extract --file=./mergerfs-static-linux_armhf.tar.gz --strip-components=3 usr/local/bin/mergerfs

mkdir bin
mv mergerfs bin/

Mount the drives and run a test like below. Notice the escaped *. That’s needed at the command line to prevent shell globbing.

mkdir /storage/pool
/storage/bin/mergerfs /storage/disk\* /storage/pool/

Create the MergerFS Service

vi /storage/.config/system.d/mergerfs.service

[Unit]
Description = MergerFS Service
After=storage-disk1.mount storage-disk2.mount storage-disk3.mount storage-disk4.mount
Requires=storage-disk1.mount storage-disk2.mount storage-disk3.mount storage-disk4.mount

[Service]
Type=forking
ExecStart=/storage/bin/mergerfs -o category.create=mfs,noatime /storage/disk* /storage/pool/
ExecStop=umount /storage/pool

[Install]
WantedBy=default.target

systemctl enable --now mergerfs.service

Your content should now be available in /storage/pool after boot.

1.3.1.1.6 - Remotes

Most remotes just work. Newer ones emulate a keyboard and send well-known multimedia keys like ‘play’ and ‘volume up’. If you want to change what a button does, you can tell Kodi what to do pretty easily. In addition, LibreELEC also supports older remotes using eventlircd and popular ones are already configured. You can add unusual ones as well as get normal remotes to perform arbitrary actions when kodi isn’t running (like telling the computer to start kodi or shutdown cleanly).

Modern Remotes

If you plug in a remote receiver and the kernel makes reference to a keyboard you have a modern remote and Kodi will talk to it directly.

dmesg

input: BESCO KSL81P304 Keyboard as ...
hid-generic 0003:2571:4101.0001: input,hidraw0: USB HID v1.11 Keyboard ...

If you want to change a button action, put kodi into log mode, tail the logfile, and press the button in question to see what event is detected.

# Turn on debug
kodi-send -a toggledebug

# Tail the logfile
tail -f /storage/.kodi/temp/kodi.log

   debug <general>: Keyboard: scancode: 0xac, sym: 0xac, unicode: 0x00, modifier: 0x0
   debug <general>: HandleKey: browser_home (0xf0b6) pressed, window 10000, action is ActivateWindow(Home)

In this example, we pressed the ‘home’ button on the remote. That was detected as a keyboard press of the browser_home key. This is just one of many defined keys like ’email’ and ‘calculator’ that can be present on a keyboard. Kodi has a default action of that and you can see what it is in the system keymap

# View the system keyboard map to see what's happening by default
cat /usr/share/kodi/system/keymaps/keyboard.xml

To change what happens, create a user keymap. Any entries in it will override the default.

# Create a user keymap that takes you to 'Videos' instead of 'Home'
vi /storage/.kodi/userdata/keymaps/keyboard.xml

<keymap>
  <global>
    <keyboard>
      <browser_home>ActivateWindow(Videos)</browser_home>
    </keyboard>
  </global>
</keymap>

kodi-send -a reloadkeymaps

Legacy Remotes

How They Work

Some receivers don’t send well-known keys. For these, there’s eventlircd. LibreELEC has a list of popular remotes that fall into this category and will dynamically use it as needed. For instance, pair an Amazon Fire TV remote and udev will fire, match a rule in /usr/lib/udev/rules.d/98-eventlircd.rules, and launch eventlircd with the buttons mapped in /etc/eventlircd.d/aftvsremote.evmap.

These will interface with Kodi using it’s “LIRC” (Linux Infrared Remote Contoll) interface. And just like with keyboards, there’s a set of well-known remote keys Kodi will accept. Some remotes don’t know about these so eventlircd does some pre-translation before relaying to Kodi. If you look in the aftvsremote.evmap file for example, you’ll see that KEY_HOMEPAGE = KEY_HOME.

To find out if your remote falls into this category, enable logging, tail the log, and if your remote has been picked up for handling by eventlircd you’ll see some entries like this.

    debug <general>: LIRC: - NEW 66 0 KEY_HOME devinput (KEY_HOME)
    debug <general>: HandleKey: percent (0x25) pressed, window 10000, action is PreviousMenu

In the first line, Kodi notes that it’s LIRC interface received a KEY_HOME button press. (Eventlircd actually translated it, but that happened before kodi saw anything.) In the second line, Kodi says it received the key ‘percent’, and preformed the action ‘Back’. The part where Kodi says ‘percent (0x25)’ was pressed seems resistent to documentation, but the action of PreviousMenu is the end result. The main question is why?

Turns out that Kodi has a pre-mapping file for events relayed to it from LIRC systems. There’s a mapping for ‘KEY_HOME’ that kodi translates to the well-known key ‘start’. Then Kodi checks the normal keymap file and ‘start’ translates to the Kodi action ‘Back’

Take a look at the system LIRC mapping file to see for yourself.

# The Lircmap file has the Kodi well-known button (start) surrounding the original remote command (KEY_HOME)
grep KEY_HOME /usr/share/kodi/system/Lircmap.xml

      <start>KEY_HOME</start>

Then take a look at the normal mapping file to see how start get’s handled

# The keymap file has the well-known Kodi button surrounding the Kodi action, 
grep start /usr/share/kodi/system/keymaps/remote.xml 

      <start>PreviousMenu</start>

You’ll actually see quite a few things are mapped to ‘start’ as it does different things depending on what part of Kodi you are accessing at the time.

Changing Button Mappings

You have a few options an they are listed here in increasing complexity. Specifically, you can

Edit the keymap
Edit the Lircmap and keymap
Edit the eventlircd evmap

Edit the Keymap

To change what the KEY_HOME button does you can create a user keymap like before and override it. It just needs a changed from keyboard to remote for entering through the LIRC interface. In this example we’ve set it to actually take you home via the kodi function ActivateWindow(Home).

vi /storage/.kodi/userdata/keymaps/remote.xml

<keymap>
  <global>
    <remote>
      <start>ActivateWindow(Home)</start>
    </remote>
  </global>
</keymap>

Edit the Lircmap and Keymap

This can occasionally cause problems though - such as when you have another button that already gets translated to start and you want it to keep working the same. In this case, you make an edit at the Lircmap level to translate KEY_HOME to some other button first, then map that button to the action you want. (You can’t put the Kodi function above in the Lircmap file so you have to do a double hop.)

First, let’s determine what the device name should be with the irw command.

irw

# Hit a button and the device name will be at the end
66 0 KEY_HOME devinput

Now let’s pick a key. My remote doesn’t have a ‘red’ key, so lets hijack that one. Note the device name devinput from the above.

vi /storage/.kodi/userdata/Lircmap.xml

<lircmap>
   <remote device="devinput">
      <red>KEY_HOME</red>
   </remote>
</lircmap>

Then map the key restart kodi (the keymap reload command doesn’t handle Lircmap)

vi /storage/.kodi/userdata/keymaps/remote.xml

<keymap>
  <global>
    <remote>
      <red>ActivateWindow(Home)</red>
    </remote>
  </global>
</keymap>

systemctl restart kodi

Edit the Eventlircd Evmap

You can also change what evenlircd does. If LibreELEC wasn’t a read-only filesystem you’d have done this first. But you can do it with a but more work than the above if you prefer.

# Copy the evmap files
cp -r /etc/eventlircd.d /storage/.config/

# Override where the daemon looks for it's configs
systemctl edit --full eventlircd

# change the ExecStart line to refer to the new location - add vvv to the end for more log info
ExecStart=/usr/sbin/eventlircd -f --evmap=/storage/.config/eventlircd.d --socket=/run/lirc/lircd -vvv

# Restart, replug the device and grep the logs to see what evmap is in use
systemctl restart eventlircd
journalctl | grep evmap

# Edit that map to change how home is mapped (yours may not use the default map)
vi /storage/.config/eventlircd.d/default.evmap

KEY_HOMEPAGE     = KEY_HOME

Dealing With Unknown Buttons

Sometimes, you’ll have a button that does nothing at all.

    debug <general>: LIRC: - NEW ac 0 KEY_HOMEPAGE devinput (KEY_HOMEPAGE)
    debug <general>: HandleKey: 0 (0x0, obc255) pressed, window 10016, action is

In this example Kodi received the KEY_HOMEPAGE button, consulted it’s Lircmap.xml and didn’t find anything. This is because eventlircd didn’t recognize the remote and translate it to KEY_HOME like before. That’s OK, we can just add a user LIRC mapping. If you look through the system file you’ll see things like ‘KEY_HOME’ are tto the ‘start’ button. So let’s do the same.

vi /storage/.kodi/userdata/Lircmap.xml

<lircmap>
   <remote device="devinput">
      <start>KEY_HOMEPAGE</start>
   </remote>
</lircmap>

systemctl restart kodi

Check the log and you’ll see that you now get

    debug <general>: LIRC: - NEW ac 0 KEY_HOMEPAGE devinput (KEY_HOMEPAGE)
    debug <general>: HandleKey: 251 (0xfb, obc4) pressed, window 10025, action is ActivateWindow(Home)

Remotes Outside Kodi

You may want a remote to work outside of kodi too - say because you want to start kodi with a remote button. If you have a modern remote that eventlircd didn’t capture, you must first add your remote to the list of udev rules.

Capture The Remote

First you must identify the remote with lsusb. It’s probably the only non-hub device listed.

lsusb
...
...
Bus 006 Device 002: ID 2571:4101 BESCO KSL81P304
                        ^     ^
Vendor ID -------------/       \--------- Model ID
...

Then, copy the udev rule file and add a custom rule for your remote.

cp /usr/lib/udev/rules.d/98-eventlircd.rules /storage/.config/udev.rules.d/
vi /storage/.config/udev.rules.d/98-eventlircd.rules

...
...
...
ENV{ID_USB_INTERFACES}=="", IMPORT{builtin}="usb_id"

# Add the rule under the above line so the USB IDs are available. 
# change the numbers to match the ID from lsusb

ENV{ID_VENDOR_ID}=="2571", ENV{ID_MODEL_ID}=="4101", \
  ENV{eventlircd_enable}="true", \
  ENV{eventlircd_evmap}="default.evmap"
...

Now, reboot, turn on logging and see what the buttons show up as. You can also install the system tools add-on in kodi, and at the command line, stop kodi and the eventlircd service, then run evtest and press some buttons. You should see something like

Testing ... (interrupt to exit)
Event: time 1710468265.112925, type 4 (EV_MSC), code 4 (MSC_SCAN), value c0223
Event: time 1710468265.112925, type 1 (EV_KEY), code 172 (KEY_HOMEPAGE), value 1
Event: time 1710468265.112925, -------------- SYN_REPORT ------------
Event: time 1710468265.200987, type 4 (EV_MSC), code 4 (MSC_SCAN), value c0223
Event: time 1710468265.200987, type 1 (EV_KEY), code 172 (KEY_HOMEPAGE), value 0
Event: time 1710468265.200987, -------------- SYN_REPORT ------------

Configure and Enable irexec

Now that you have seen the event, you must have the irexec process watching for it to take action. Luckily, LibreELEC already includes it.

vi /storage/.config/system.d/irexec.service

[Unit]
Description=IR Remote irexec config
After=eventlircd.service
Wants=eventlircd.service

[Service]
ExecStart=/usr/bin/irexec --daemon /storage/.lircrc
Type=forking

[Install]
WantedBy=multi-user.target

We’ll create a the config file next. The config is the command or script to run. systemctl start kodi in our case.

vi /storage/.lircrc

begin
    prog   = irexec
    button = KEY_HOMEPAGE
    config = systemctl start kodi
end

Let’s enable and start it up

systemctl enable --now irexec

Go ahead and stop kodi, then press the KEY_HOMEPAGE button on your remote. Try config entries like echo start kodi > /storage/test-results if you have issues and wonder if it’s running.

Notes

You may notice that eventlircd is always running, even if it has no remotes. That’s of a unit file is in /usr/lib/systemd/system/multi-user.target.wants/. I’m not sure of why this is the case when there is no remote in play.

https://discourse.osmc.tv/t/cant-create-a-keymap-for-a-remote-control-button-which-is-connected-by-lircd/88819/6

1.4 - Signage

1.4.1 - Anthias (Screenly)

Overview

Anthias (AKA Screenly) is a simple, open-source digital signage system that runs well on a raspberry pi. When plugged into a monitor, it displays images, video or web sites in slideshow fashion. It’s managed directly though a web interface on the device and there are fleet and support options.

Preparation

Use the Raspberry Pi Imager to create a 64 bit Raspberry Pi OS Lite image. Select the gear icon at the bottom right to enable SSH, create a user, configure networking, and set the locale. Use SSH continue configuration.

setterm --cursor on

sudo raspi-config nonint do_change_locale en_US-UTF-8
sudo raspi-config nonint do_configure_keyboard us
sudo raspi-config nonint do_wifi_country US
sudo timedatectl set-timezone America/New_York
  
sudo raspi-config nonint do_hostname SOMENAME

sudo apt update;sudo apt upgrade -y

sudo reboot

Enable automatic updates and enable reboots

sudo apt -y install unattended-upgrades

# Remove the leading slashes from some of the updates and set to true
sudo sed -i 's/^\/\/\(.*origin=Debian.*\)/  \1/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/^\/\/\(Unattended-Upgrade::Remove-Unused-Kernel-Packages \).*/  \1"true";/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/^\/\/\(Unattended-Upgrade::Remove-New-Unused-Dependencies \).*/  \1"true";/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/^\/\/\(Unattended-Upgrade::Remove-Unused-Dependencies \).*/  \1"true";/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/^\/\/\(Unattended-Upgrade::Automatic-Reboot \).*/  \1"true";/' /etc/apt/apt.conf.d/50unattended-upgrades

Installation

bash <(curl -sL https://www.screenly.io/install-ose.sh)

Operation

Adding Content

Navigate to the Web UI at the IP address of the device. You may wish to enter the settings and add authentication and change the device name.

You may add common graphic types, mp4, web and youtube links. It will let you know if it fails to download the youtube video. Some heavy web pages fail to render correctly, but most do.

Images must be sized to for the screen. In most cases this is 1080. Larger images are scaled down, but smaller images are not scaled up. For example, PowerPoint is often used to create slides, but it exports at 720. On a 1080 screen creates black boarders. You can change the resolution on the Pi with rasp-config or add a registry key to Windows to change PowerPoint’s output size.

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Microsoft\Office\16.0\PowerPoint\Options]
"ExportBitmapResolution"=dword:00000096

Schedule the Screen

You may want to turn off the display during non-operation hours. The vcgencmd command can turn off video output and some displays will choose to enter power-savings mode. Some displays misbehave or ignore the command, so testing is warranted.

sudo tee /etc/cron.d/screenpower << EOF

# m h dom mon dow usercommand

# Turn monitor on
30 7  * * 1-5 root /usr/bin/vcgencmd display_power 1

# Turn monitor off
30 19 * * 1-5 root /usr/bin/vcgencmd display_power 0

# Weekly Reboot just in case
0 7 * * 1 root /sbin/shutdown -r +10 "Monday reboot in 10 minutes"
EOF

Troubleshooting

YouTube Fail

You may find you must download the video manually and then upload to Anthias. Use the utility yt-dlp to list and then download the mp4 version of a video

yt-dlp --list-formats https://www.youtube.com/watch?v=YE7VzlLtp-4
yt-dlp --format 22 https://www.youtube.com/watch?v=YE7VzlLtp-4

WiFi Disconnect

WiFi can go up and down, and some variants of the OS do not automatically reconnect. You way want to add the following script to keep connected.

sudo touch /usr/local/bin/checkwifi
sudo chmod +x /usr/local/bin/checkwifi
sudo vim.tiny /usr/local/bin/checkwifi

#!/bin/bash

# Exit if WiFi isn't configured
grep -q ssid /etc/wpa_supplicant/wpa_supplicant.conf || exit 

# In the case of multiple gateways (when connected to wired and wireless)
# the `grep -m 1` will exit on the first match, presumably the lowest metric
GATEWAY=$(ip route list | grep -m 1 default | awk '{print $3}')

ping -c4 $GATEWAY > /dev/null

if [ $? != 0 ]
then
  logger checkwifi fail `date`
  service wpa_supplicant restart
  service dhcpcd restart
else
  logger checkwifi success `date`
fi

sudo tee /etc/cron.d/checkwifi << EOF
# Check WiFi connection
*/5 * * * * /usr/bin/sudo -H /usr/local/bin/checkwifi >> /dev/null 2>&1"
EOF

Hidden WiFi

If you didn’t set up WiFi during imaging, you can use raspi-config after boot, but you must add a line if it’s a hidden network, and reboot.

sudo sed -i '/psk/a\        scan_ssid=1' /etc/wpa_supplicant/wpa_supplicant.conf

Wrong IP on Splash Screen

This seems to be captured during installation and then resides statically in this file. Adjust as needed.

# You can turn off the splash screen in the GUI or in the .conf
sed -i 's/show_splash =.*/show_splash = off/' /home/pi/.screenly/screenly.conf

# Or you can correct it in the docker file
vi ./screenly/docker-compose.yml

White Screen or Hung

Anthias works best when the graphics are the correct size. It will attempt to display images that are too large, but this flashes a white screen and eventually hangs the box (at least in the current version). Not all users get the hang of sizing things correctly, so if you have issues, try this script.

#!/bin/bash

# If this device isn't running signage, exit
[ -d /home/pi/screenly_assets ] || { echo "No screenly image asset directory, exiting"; exit 1; }

# Check that mediainfo and imagemagick convert are available
command -v mediainfo || { echo "mediainfo command not available, exiting"; exit 1; }
command -v convert  || { echo "imagemagick convert not available, exiting"; exit 1; }

cd /home/pi/screenly_assets

for FILE in *.png *.jpe *.gif
do
        # if the file doesn't exist, skip this iteration 
        [ -f $FILE ] || continue
        
        # Use mediainfo to get the dimensions at it's much faster than imagemagick              
        read -r NAME WIDTH HEIGHT <<<$(echo -n "$FILE ";mediainfo --Inform="Image;%Width% %Height%" $FILE)

        # if it's too big, use imagemagick's convert. (the mogify command doesn't resize reliably) 
        if [ "$WIDTH" -gt "1920" ] || [ "$HEIGHT" -gt "1080" ]
        then
                echo $FILE $WIDTH x $HEIGHT
                convert $FILE -resize 1920x1080 -gravity center $FILE
        fi
done

No Video After Power Outage

If the display is off when you boot the pi, it may decide there is no monitor. When someone does turn on the display, there is no output. Enable hdmi_force_hotplug in the `/boot/config.txt`` to avoid this problem, and specify the group and mode to 1080 and 30hz.

sed -i 's/.*hdmi_force_hotplug.*/hdmi_force_hotplug=1/' /boot/config.txt
sed -i 's/.*hdmi_group=.*/hdmi_group=2/' /boot/config.txt
sed -i 's/.*hdmi_mode=.*/hdmi_mode=81/' /boot/config.txt

1.4.2 - Anthias Deployment

If you do regular deployments you can create an image. A reasonable approach is to:

Shrink the last partition
Zero fill the remaining free space
Find the end of the last partition
DD that to a file
Use raspi-config to resize after deploying

Or you can use PiShrink to script all that.

Installation

wget https://raw.githubusercontent.com/Drewsif/PiShrink/master/pishrink.sh
chmod +x pishrink.sh
sudo mv pishrink.sh /usr/local/bin

Operation

# Capture and shrink the image
sudo dd if=/dev/mmcblk0 of=anthias-raw.img bs=1M
sudo pishrink.sh anthias-raw.img anthias.img

# Copy to a new card
sudo dd if=anthias.img of=/dev/mmcblk0 bs=1M

If you need to modify the image after creating it you can mount it via loop-back.

sudo losetup --find --partscan anthias.img
sudo mount /dev/loop0p2 /mnt/

# After you've made changes

sudo umount /mnt
sudo losetup --detach-all

Manual Steps

If you have access to a graphical desktop environment, use GParted. It will resize the filesystem and partitions for you quite easily.

# Mount the image via loopback and open it with GParted
sudo losetup --find --partscan anthias-raw.img

# Grab the right side of the last partition with your mouse and 
# drag it as far to the left as you can, apply and exit
sudo gparted /dev/loop0

Now you need to find the last sector and truncate the file after that location. Since the truncate utility operates on bytes, you convert sectors to bytes with multiplication.

# Find the End of the last partition. In the below example, it's Sector *9812664*
$ sudo fdisk -lu /dev/loop0

Units: sectors of 1 * 512 = 512 bytes

Device       Boot  Start     End Sectors  Size Id Type
/dev/loop0p1        8192  532479  524288  256M  c W95 FAT32 (LBA)
/dev/loop0p2      532480 9812664 9280185  4.4G 83 Linux


sudo losetup --detach-all

sudo truncate --size=$[(9812664+1)*512] anthias-raw.img

Very Manual Steps

If you don’t have a GUI, you can do it with a combination of commands.

# Mount the image via loopback
sudo losetup --find --partscan anthias-raw.img

# Check and resize the file system
sudo e2fsck -f /dev/loop0p2
sudo resize2fs -M /dev/loop0p2

... The filesystem on /dev/loop0p2 is now 1149741 (4k) blocks long

# Now you can find the end of the resized filesystem by:

# Finding the number of sectors.
#     Bytes = Num of blocks * block size
#     Number of sectors = Bytes / sector size
echo $[(1149741*4096)/512]

# Finding the start sector (532480 in the example below)
sudo fdisk -lu /dev/loop0

Device       Boot  Start      End  Sectors  Size Id Type
/dev/loop0p1        8192   532479   524288  256M  c W95 FAT32 (LBA)
/dev/loop0p2      532480 31116287 30583808 14.6G 83 Linux

# Adding the number of sectors to the start sector. Add 1 because you want to end AFTER the end sector
echo $[532480 + 9197928 + 1]

# And resize the part to that end sector (ignore the warnings)
sudo parted resizepart 2 9730409

Great! Now you can follow the remainder of the GParted steps to find the new last sector and truncate the file.

Extra Credit

It’s handy to compress the image. xz is pretty good for this

xz anthias-raw.img

xzcat anthias-raw.img | sudo dd of=/dev/mmcblk0

In these procedures, we make a copy of the SD card before we do anything. Another strategy is to resize the SD card directly, and then use dd and read in X number of sectors rather than read it all in and then truncate it. A bit faster, if a but less recoverable from in the event of a mistake.

1.4.3 - API

The API docs on the web refer to screenly. Anthias uses an older API. However, you can access the API docs for the version your working with at

http://sign.your.domain/api/docs/

You’ll have to correct the swagger form with correct URL, but after that you can see what you’re working with.

1.5 - Web

1.5.1 - Access Logging

1.5.1.1 - GoAccess

GoAccess is a lightweight web stats visualizer than can display in a terminal window or in a browser. It supports Caddy’s native JSON log format and can also be run as a real-time service with a little work.

Installation

If you have Debian/Ubuntu, you can add the repo as the [official docs] show.

# Note: The GoAccess docs use the `lsb_release -cs` utility that some Debains don't have, so I've substituted the $VERSION_CODENAME variable from the os-release file
wget -O - https://deb.goaccess.io/gnugpg.key | gpg --dearmor | sudo tee /usr/share/keyrings/goaccess.gpg >/dev/null 
source /etc/os-release 
echo "deb [signed-by=/usr/share/keyrings/goaccess.gpg arch=$(dpkg --print-architecture)] https://deb.goaccess.io/ $VERSION_CODENAME main" | sudo tee /etc/apt/sources.list.d/goaccess.list
sudo apt update
sudo apt install goaccess

Basic Operation

No configuration is needed if your webserver is logging in a supported format¹. Though you may need to adjust file permissions so the log file can be read by the user running GoAccess.

To use in the terminal, all you need to is invoke it with a couple parameters.

sudo goaccess --log-format CADDY --log-file /var/log/caddy/access.log

To produce a HTML report, just add an output file somewhere your web server can find it.

sudo touch  /var/www/some.server.org/report.html
sudo chown $USER /var/www/some.server.org/report.html
goaccess --log-format=CADDY --output /var/www/some.server.org/report.html --log-file /var/log/caddy/access.log

Retaining History

History is useful and GoAccess lets you persist your data and incorporate it on the next run. This updates your report rather than replace it.

# Create a database location
sudo mkdir -p /var/lib/goaccess-db
sudo chown $USER /var/lib/goaccess-db

goaccess \
    --persist --restore --db-path /var/lib/goaccess-db \
    --log-format CADDY --output /var/www/some.server.org/report.html --log-file /var/log/caddy/access.log

GeoIP

To display country and city you need GeoIP databases, preferably the GeoIP2 format. An easy way to get started is with DB-IP’s Lite GeoIP2 databases that have a permissive license.

# These are for Jan 2025. Check https://db-ip.com/db/lite.php for the updated links
wget --directory-prefix /var/lib/goaccess-db \
    https://download.db-ip.com/free/dbip-asn-lite-2025-01.mmdb.gz \
    https://download.db-ip.com/free/dbip-city-lite-2025-01.mmdb.gz \
    https://download.db-ip.com/free/dbip-country-lite-2025-01.mmdb.gz  
    
gunzip /var/lib/goaccess-db/*.gz

goaccess \
    --persist --restore --db-path /var/lib/goaccess-db \
    --geoip-database /var/lib/goaccess-db/dbip-asn-lite-2025-01.mmdb  \
    --geoip-database /var/lib/goaccess-db/dbip-city-lite-2025-01.mmdb  \
    --geoip-database /var/lib/goaccess-db/dbip-country-lite-2025-01.mmdb \
    --log-format CADDY --output /var/www/some.server.org/report.html --log-file /var/log/caddy/access.log

This will add a Country and ASN panel, and populate the city column on the “Visitor Hostnames and IPs” panel.

This one-time download is “good enough” for most purposes. But if you want to automate updates of the GeoIP data, you can create an account with a provider and get an API key. Maxmind offers free accounts. Sign up here and you can sudo apt install geoipupdate to get regular updates.

Automation

A reasonable way to automate is with a logrotate hook. Most systems already have this in place to handle their logs so it’s an easy add. If you’re using Apache or nginx you probably already have one that you can just add a prerotate hook to. For Caddy, something like this should be added.

sudo vi  /etc/logrotate.d/caddy

/var/log/caddy/access.log {
    daily
    rotate 7
    compress
    missingok
    prerotate
        goaccess \
        --persist --restore --db-path /var/lib/goaccess-db \
        --geoip-database /var/lib/goaccess-db/dbip-asn-lite-2025-01.mmdb  \
        --geoip-database /var/lib/goaccess-db/dbip-city-lite-2025-01.mmdb  \
        --geoip-database /var/lib/goaccess-db/dbip-country-lite-2025-01.mmdb \
        --log-format CADDY --output /var/www/some.server.org/report.html --log-file /var/log/caddy/access.log
    endscript
    postrotate
        systemctl restart caddy
    endscript
}

You can test this works with force option. If it works, you’ll be left with updated stats, an empty access.log, and a newly minted access.log.1.gz

sudo logrotate --force /etc/logrotate.d/caddy

For Caddy, you’ll also need to disable it’s built-in log rotation

sudo vi /etc/caddy/Caddyfile


log  { 
    output file /path/to/your/log.log {
        roll_disabled
    }
}

sudo systemctl restart caddy

Of course, this runs as root. You can design that out, but you can also configure systemd to trigger a timed execution and run as non-root. Arnaud Rebillout has some good info on that.

Real-Time Service

You can also run GoAccess as a service for real-time log analysis. It works in conjunction with your web server by providing a Web Socket to push the latest data to the browser.

In this example, Caddy is logging vhosts to individual files as described in the Caddy logging page. This is convenient as it allows you view vhosts separately which is often desired. Adjust as needed.

Prepare The System

# Create a system user with no home
sudo adduser --system --group --no-create-home goaccess

# Add the user to whatever group the logfiles are set to (ls -lah /var/log/caddy/ or whatever)
sudo adduser goaccess caddy

# Put the site in a variable to make edits easier
SITE=www.site.org

# Create a database location
sudo mkdir -p /var/lib/goaccess-db/$SITE
sudo chown goaccess:goaccess /var/lib/goaccess-db/$SITE

# Possibly create the report location
sudo touch /var/www/$SITE/report-$SITE.html
sudo chown goaccess /var/www/$SITE/report-$SITE.html

# Test that the goaccess user can create the report
sudo -u goaccess goaccess \
    --log-format CADDY \
    --output /var/www/$SITE/report-$SITE.html \
    --persist \
    --restore \
    --db-path /var/lib/goaccess-db/$SITE \
    --log-file /var/log/caddy/$SITE.log

Create The Service

sudo vi /etc/systemd/system/goaccess.service

[Unit]
Description=GoAccess Web log report
After=network.target

# Note: Variable braces are required for sysmtemd variable expansion
[Service]
Environment="SITE=www.site.org"
Type=simple
User=goaccess
Group=goaccess
Restart=always
ExecStart=/usr/bin/goaccess \
    --log-file /var/log/caddy/${SITE}.log \
    --log-format CADDY \
    --persist \
    --restore \
    --db-path /var/lib/goaccess-db/${SITE} \
    --output /var/www/${SITE}/report-${SITE}.html \
    --real-time-html
StandardOutput=null
StandardError=null

[Install]
WantedBy=multi-user.target

sudo systemctl enable --now goaccess.service
sudo systemctl status goaccess.service
sudo journalctl -u  goaccess.service

If everything went well, it should be listening on the default port of 7890

nc -zv localhost 7890
localhost [127.0.0.1] 7890 (?) open

BUT you can’t access that port unless you’re on the same LAN. You can start forwarding that port and even setup SSL in the WS config, but in most cases it’s better to handle it with a proxy.

Configure Access via Proxy

To avoid adding additional port forwarding you can convert the websocket connection from a high-level port to a proxy path. This works with cloudflare as well.

Edit your GoAccess service unit to indicate the proxied URL.

ExecStart
    ...
    # Note the added escape slash the the formerly last line
    --real-time-html \  
    --ws-url wss://www.site.org:443/ws/goaccess 
    # If you don't add the port explicitly, GoAccess
    # will 'helpfully' add the internal port (which isn't helpful), silently.

Add a proxy line to your web server. If using Caddy, add a path handler and proxy like this, and restart.

some.server.org {
        ...
        ...
        handle_path /ws/goaccess* {
                reverse_proxy * http://localhost:7890
        }
}

Take your browser to the URL, and you should see the gear icon top left now has a green dot under it.

Image of WebSocket indicator

If the dot isn’t green you’re not connected so take a look at the troubleshooting section.

Create Multiple Services vs Multiple Reports

When you have lots of vhosts its useful to separate them at the log level and report separately. To do that you can use a systemd template so as to create multiple instances. Arnaud Rebillout has some details on that.

But scaling that becomes onerous. My preference is to automate report generation more frequently and skip the realtime.

Troubleshooting

No Data

Check the permissions on the files. If you accidentally typed caddy start as root it will be running as root and later runs may not be able to write log entries.

GoAccess Isn’t Live

Your best bet is to open the developer tools in your browser, check the network tab and refresh. If proxying is wrong it will give you some hints.

What About The .conf

The file /etc/goaccess/goaccess.conf can be used, just make sure to remove the arguments from the unit file so there isn’t a conflict.

Web Socket Intermittent

Some services, such as Cloudflare, support WS but can cause intermittent disconnections. Try separating your stats from your main traffic site.

See https://goaccess.io/man –log-format options ↩︎

1.5.2 - Content

1.5.2.1 - Content Mgmt

There are many ways to manage and produce web content. Traditionally, you’d use a large application with roles and permissions.

A more modern approach is to use a distributed version control system, like git, and a site generator.

Static Site Generators are gaining popularity as they produce static HTML with javascript and CSS that can be deployed to any Content Delivery Network without need for server-side processing.

Astro is great, as is Hugo, with the latter being around longer and having more resources.

1.5.2.1.1 - Hugo

Hugo is a Static Site Generator (SSG) that turns Markdown files into static web pages that can be deployed anywhere.

Like WordPress, you apply a ’theme’ to style your content. But rather than use a web-inteface to create content, you directly edit the content in markdown files. This lends itself well tomanaging the content as code and appeals to those who prefer editing text.

However, unlike other SSGs, you don’t have to be a front-end developer to get great results and you can jump in with a minimal investment of time.

1.5.2.1.1.1 - Hugo Install

Requirements

I use Debian in this example, but any apt-based distro will be similar.

Preparation

Enable and pin the Debian Backports and Testing repos so you can get recent versions of Hugo and needed tools.

–> Enable and Pin

Installation

Hugo requires git and go

# Assuming you have enable backports as per above
sudo apt install -t bullseye-backports git
sudo apt install -t bullseye-backports golang-go

For a recent version of Hugo you’ll need to go to the testing repo. The extended version is recommended by Hugo and it’s chosen by default.

# This pulls in a number of other required packages, so take a close look at the messages for any conflicts. It's normally fine, though. 
sudo apt install -t testing  hugo

In some cases, you can just install from the Debian package with a lot less effort. Take a look at latest and copy the URL into a wget.

https://github.com/gohugoio/hugo/releases/latest

wget https://github.com/gohugoio/hugo/releases/download/v0.124.1/hugo_extended_0.124.1_linux-amd64.deb

Configuration

A quick test right from the quickstart page to make sure everything works

hugo new site quickstart
cd quickstart
git init
git submodule add https://github.com/theNewDynamic/gohugo-theme-ananke themes/ananke
echo "theme = 'ananke'" >> config.toml
hugo server

Open up a browser to http://localhost:1313/ and you you’ll see the default ananke-themed site.

Next Steps

The ananke theme you just deployed is nice, but a much better theme is Docsy. Go give that a try.

–> Deploy Docsy on Hugo

1.5.2.1.1.2 - Docsy Install

Docsy is a good-looking Hugo theme that provides a landing page, blog, and a documentation sub-sites using bootstrap CSS.

The documentation site in particular let’s you turn a directory of text files into a documentation tree with relative ease. It even has a collapsible left nav bar. That is harder to find than you’d think.

Preparation

Docsy requires Hugo. Install that if you haven’t already. It also needs a few other things; postcss, postcss-cli, and autoprefixer from the Node.JS ecosystem. These should be installed in the project directory as version requirements change per theme.

mkdir some.site.org
cd some.site.org
sudo apt install -t testing nodejs npm
npm install -D autoprefixer 
npm install -D postcss
npm install -D postcss-cli

Installation

Deploy Docsy as a Hugo module and pull in the example site so we have a skeleton to work with. We’re using git, but we’ll keep it local for now.

git clone https://github.com/google/docsy-example.git .
hugo server

Browse to http://localhost:1313 and you should see the demo “Goldydocs” site.

Now you can proceed to configure Docsy!

Updating

The Docsy theme gets regular updates. To incorporate those you only have to run this command. Do this now, actually, to get any theme updates the example site hasn’t incoporated yet.

cd /path/to/my-existing-site
hugo mod get -u github.com/google/docsy

Troubleshooting

hugo

Error: Error building site: POSTCSS: failed to transform “scss/main.css” (text/css)>: Error: Loading PostCSS Plugin failed: Cannot find module ‘autoprefixer’

And then when you try to install the missing module

The following packages have unmet dependencies: nodejs : Conflicts: npm npm : Depends: node-cacache but it is not going to be installed

You may have already have installed Node.JS. Skip trying to install it from the OS’s repo and see if npm works. Then proceed with postcss install and such.

1.5.2.1.1.3 - Docsy Config

Let’s change the basics of the site in the config.toml file. I put some quick sed commands here, but you can edit by hand as well. Of note is the Github integration. We prepoulate it here for future use, as it allows quick edits in your browser down the road.

SITE=some.site.org
GITHUBID=someUserID
sed -i "s/Goldydocs/$SITE/" config.toml
sed -i "s/The Docsy Authors/$SITE/" config.toml
sed -i "s/example.com/$SITE/" config.toml
sed -i "s/example.org/$SITE/" config.toml
sed -i "s/google\/docsy-example/$GITHUBID\/$SITE/" config.toml 
sed -i "s/USERNAME\/REPOSITORY/$GITHUBID\/$SITE/" config.toml 
sed -i "s/https:\/\/policies.google.com//" config.toml
sed -i "s/https:\/\/github.com\/google\/docsy/https:\/\/github.com\/$GITHUBID/" config.toml
sed -i "s/github_branch/#github_branch/" config.toml

If you don’t plan to translate your site into different languages, you can dispense with some of the extra languages as well.

# Delete the 20 or so lines starting at "lLanguage] and stopping at the "[markup]" section,
# including the english section.
vi config.tml

# Delete the folders from 'content/' as well, leaving 'en'
rm -rf content/fa content/no

You should also set a default meta description or the engine will put in in the bootstrap default and google will summarize all your pages with that

vi config.toml

[params]
copyright = "some.site.org"
privacy_policy = "/privacy"
description = "My personal website to document what I know and how I did it"

Keep and eye on the site in your browser as you make changes. When you’re ready to start with the main part of adding content, take a look at the next section.

Docsy Operation

Notes

You can’t dispense with the en folder yet, as it breaks some github linking functionality you may want to take advantage of later

1.5.2.1.1.4 - Docsy Operate

This is a quick excerpt from the Docsy Content and Customization docs. Definitely spend time with those after reading the overview here.

Directory Layout

Content is, appropriately enough, in the content directory, and it’s subdirectories line up with the top-level navigation bar of the web site. About, Documentation, etc corresponds to content/about, content/docs and so on.

The directories and files you create will be the URL that you get with one important exception, filenames are converted to a ‘slug’, mimicking how index files work. For example, If you create the file docs/tech/mastadon.md the URL will be /docs/tech/mastadon/. This is for SEO (Search Engine Optimization).

The other thing you’ll see are _index.html files. In the example above, the URL /docs/tech/ has no content, as it’s a folder. But you can add a _index.md or .html to give it some. Avoid creating index.md or tech.md (a file that matches the name of a subdirectory). Either of those will block Hugo from generating content for any subdirectories.

The Landing Page and Top Nav Pages

The landing page itself is the content/_index.html file and the background is featured-background.jpg. The other top-nav pages are in the content folders with _index files. You may notice the special header variable “menu: main: weight: " and that is what flags that specific page as worth of being in the top menu. Removing that, or adding that (and a linkTitle) will change the top nav.

One of the most important features of the Docsy template is the well designed documentation section that features a Section menu, or left nav bar. This menu is built automatically from the files you put in the docs folder, as long as you give them a title. (See Front Matter, below). They are ordered by date but you can add a weight to change that.

It doesn’t collapse by default and if you have a lot of files, you’ll want to enable that.

# Search and set in your config.toml
sidebar_menu_compact = true

Front Matter

The example files have a section at the top like this. It’s not strictly required, but you must have at least the title or they won’t show up in the left nav tree.

---
title: "Examples"
---

Page Content and Short Codes

In addition to normal markdown or html, you’ll see frequent use of ‘shortcodes’ that do things that normal markdown can’t. These are built in to Hugo and can be added by themes, and look like this;

{{% blocks/lead color="dark" %}}
Some Important Text
{{% /blocks/lead %}}

Diagrams

Docsy supports mermaid and a few other tools for creating illustrations from code, such as KaTeX, Mermaid, Diagrams.net, PlantUML, and MarkMap. Simply use a codeblock.

```mermaid
graph LR
 one --> two
```

Generate the Website

Once you’re satisfied with what you’ve got, tell hugo to generate the static files and it will populate the folder we configured earlier

hugo

Publish the Web Site

Everything you need is in the public folder and all you need do is copy it to a web server. You can even use git, which I advise since we’re already using it to pull in and update the module.

–> Local Git Deployment

Bonus Points

If you have a large directory structure full of markdown files already, you can kick-start the process of adding frontmatter like this;

find . -type f | \
while read X
do
  TITLE=$(basename ${X%.*})
  FRONTMATTER=$(printf -- "---\ntitle = ${TITLE}\n---")
  sed -i "1s/^/$FRONTMATTER\n/" "$X"
done

1.5.2.1.1.5 - Docsy Github

You may have noticed the links on the right like “Edit this page” that takes one to Github. Let’s set those up.

On Github

Go to github and create a new repository. Use the name of your side for the repo name, such as “some.site.org”. If you want to use something else, you can edit your config.toml file to adjust.

Locally

You man have noticed that Github suggested some next steps with a remote add using the name “origin”. Docsy is already using that, however, from when you cloned it. So we’ll have to pick a new name.

cd /path/to/my-existing-site
git remote add github https://github.com/yourID/some.site.org

Let’s change our default banch to “main” to match Github’s defaults.

git branch -m main

Now we can add, commit and push it up to Github

git add --all
git commit -m "first commit of new site"
git push github

You’ll notice something interesting when you go back to look at Github; all the contributers on the right. That’s because you’re dealing with a clone of Docsy and you can still pull in updates and changes from original project.

It may have been better to clone it via github

1.5.2.2 - Content Deployment

Automating deployment as part of a general continuous integration strategy is best-practice these days. Web content should be similarly treated.

I.e. version controlled and deployed with git.

1.5.2.2.1 - Local Git Deployment

Overview

Let’s create a two-tiered system that goes from dev to prod using a post-commit trigger

graph LR
Development --git / rsync---> Production

The Development system is your workstation and the Production system is a web server you can rsync to. Git commit will trigger a build and rsync.

I use Hugo in this example, but any system that has an output (or build) folder works similarly.

Configuration

The first thing we need is a destination.

Production System

This server probably uses folders like /var/www/XXXXX for its web root. Use that or create a new folder and make yourself the owner.

sudo mkdir /var/www/some.site.org
sudo chown -R $USER /var/www/some.site.org
echo "Hello" > /var/www/some.site.org/index.html

Edit your web server’s config to make sure you can view that web page. Also check that rsync is available from the command line.

Development System

Hugo builds static html in a public directory. To generate the HTML, simply type hugo

cd /path/to/my-existing-site
hugo
ls public

We don’t actually want this folder in git and most themes (if you’re using Hugo) already have a .gitignore file. Take a look and create/add to it.

# Notice /public is at the top of the git ignore file
cat .gitignore

/public
package-lock.json
.hugo_build.lock
...

Assuming you have some content, let’s add and commit it.

git add --all
git commit -m "Initial Commit"

Note: All of these git commands work because pulling in a theme initialized the directory. If you’re doing something else you’ll need to git init.

The last step is to create a hook that will build and deploy after a commit.

cd /path/to/my-existing-site
touch .git/hooks/post-commit
chmod +x .git/hooks/post-commit
vi .git/hooks/post-commit

#!/bin/sh
hugo --cleanDestinationDir
rsync --recursive --delete public/ [email protected]:/var/www/some.site.org

This script ensures that the remote directory matches your local directory. When you’re ready to update the remote site:

git add --all
git commit --allow-empty -m "trigger update"

If you mess up the production files, you can just call the hook manually.

cd /path/to/my-existing-site
touch .git/hooks/post-commit

Troubleshooting

bash: line 1: rsync: command not found

Double check that the remote host has rsync.

1.5.2.3 - Content Delivery

1.5.2.3.1 - Cloudflare

Cloudflare acts as a reverse proxy to hide your server’s IP address
Takes over your DNS and directs requests to the closest site
Injects JavaScript analytics
- If the browser’s “do not track” is on, JS isn’t injected.
Can uses a tunnel and remove encryption overhead

1.5.3 - Servers

1.5.3.1 - Caddy

Caddy is a web server that runs SSL by default by automatically grabing a cert from Let’s Encrypt. It comes as a stand-alone binary, written in Go, and makes a decent reverse proxy.

1.5.3.1.1 - Installation

Installation

Caddy recommends “using our official package for your distro” and for debian flavors they include the basic instructions you’d expect.

Configuration

The easiest way to configure Caddy is by editing the Caddyfile

sudo vi /etc/caddy/Caddyfile
sudo systemctl reload caddy.service

Sites

You define websites with a block that includes a root and the file_server directive. Once you reload, and assuming you already have the DNS in place, Caddy will reach out to Let’s Encrypt, acquire a certificate, and automatically forward from port 80 to 443

site.your.org {        
    root * /var/www/site.your.org
    file_server
}

Authentication

You can add basic auth to a site by creating a hash and adding a directive to the site.

caddy hash-password

site.your.org {        
    root * /var/www/site.your.org
    file_server
    basic_auth { 
        allen SomeBigLongStringFromTheCaddyHashPasswordCommand
    }
}

Reverse Proxy

Caddy also makes a decent reverse proxy.

site.your.org {        
    reverse_proxy * http://some.server.lan:8080
}

You can also take advantage of path-based reverse proxy. Note the rewrite to accommodate the trailing-slash potentially missing.

site.your.org {
    rewrite /audiobooks /audiobooks/
    handle_path /audiobooks/* {
        uri strip_prefix /audiobooks/
        reverse_proxy * http://some.server.lan:8080
    }
}

Import

You can define common elements at the top (snippets) or in files and import them multiple times to save duplication. This helps when you have many sites.

# At the top in the global section of your Caddyfile
(logging) {
    log {
        output file /var/log/caddy/access.log
    }
}
site.your.org {
    import logging     
    reverse_proxy * http://some.server.lan:8080
}

Modules

Caddy is a single binary so when adding a new module (aka feature) you are essentially downloading a new version that has them compiled in. You can find the list of packages at their download page.

Do this at the command line with caddy itself.

sudo caddy add-package github.com/mholt/caddy-webdav
systemctl restart caddy

Security

Drop Unknown Domains

Caddy will accept connections to port 80, announce that it’s a Caddy web server and redirect you to https before realizing it doesn’t have a site or cert for you. Configure this directive at the bottom so it drops immediately.

http:// {
    abort
}

Crowdsec

Caddy runs as it’s own user and is fairly memory-safe. But installing Crowdsec helps identify some types of intrusion attempts.

CrowdSec Installation

Troubleshooting

You can test your config file and look at the logs like so

caddy validate --config /etc/caddy/Caddyfile
journalctl --no-pager -u caddy

1.5.3.1.2 - Logging

Access Logs

In general, you should create a snippet and import into each block.

#
# Global Options Block
#
{
    ...
    ...
}
#
# Importable Snippets
#
(logging) {
    log {
        output file /var/log/caddy/access.log
    }
}
#
# Host Blocks
#
site.your.org {
    import logging     
    reverse_proxy * http://some.server.lan:8080
}
other.your.org {
    import logging     
    reverse_proxy * http://other.server.lan:8080
}

Per VHost Files

If you want separate logs for separate vhosts, add a parameter to the import that changes the output file name.

#
# Global Options Block
#
{
    ...
    ...
}
#
# Importable Snippets
#
(logging) {
    log {
        # args[0] is appended at run time
        output file /var/log/caddy/access-{args[0]}.log   
    }
}
#
# Host Blocks
#
site.your.org {
    import logging site.your.org    
    reverse_proxy * http://some.server.lan:8080
}
other.your.org {
    import logging other.your.org    
    reverse_proxy * http://other.server.lan:8080
}

Wildcard Sites

Wildcard sites only have one block so you must use the hostname directive to separate vhost logs. This both sends a vhost to the file you want, and filters them out of others. You can also use an import argument as shown in this caddy example to save space. (I would never have deduced this on my own.)

#
# Global Options Block
#
{
    ...
    ...
}
#
# Importable Snippets
#
(logging) {
        log {
                output file /var/log/caddy/access.log {
                        roll_size 5MiB
                        roll_keep 5
                }
        }
}

(domain-logging) {
        log {
                hostnames {args[0]}
                output file /var/log/caddy/{args[0]}.log {
                        roll_size 5MiB
                        roll_keep 5
                }
        }
}
#
# Main Block
#
*.site.org, site.org {
        # Everything goes to this file unless it's filtered out by another log block
        import logging 

        @site host some.site.org
        handle @site {
                reverse_proxy * http://internal.site
        }

        # the www site will write to just this log file
        import domain-logging www.site.org 
        @www host www.site.org                
        handle @www { 
                root * /var/www/www.site.org 
                file_server 
        }

        # This site will write to the normal file
        @site host other.site.org
        handle @site {
                reverse_proxy * http://other.site
        }

Logging Credentials

If you want to track users, add a directive to the global headers.

# 
# Global Options Block 
# 
{
        servers { 
                log_credentials 
        }
}

File Permissions

By default, only caddy can read the log files. This is a problem when you have a log analysis package. In recent versions of caddy however, you can set the mode.

    log {
        output file /var/log/caddy/access.log {
                mode 644
        }
    }

If the log file doesn’t change modes, check the version of caddy. It must be newer than v2.8.4 for the change.

Troubleshooting

You can have a case where the domain specific file never gets created. This usually happens when there us nothing to write to it. Check the hostname is correct.

1.5.3.1.3 - WebDAV

Caddy can also serve WebDAV requests with the appropriate module. This is important because for many clients, such as Kodi, WebDAV is significantly faster.

sudo caddy add-package github.com/mholt/caddy-webdav
sudo systemctl restart caddy

{   # Custom modules require order of precedence be defined
    order webdav last
}
site.your.org {
    root * /var/www/site.your.org
    webdav * 
}

You can combine WebDAV and Directly Listing - highly recommended - so you can browse the directory contents with a normal web browser as well. Since WebDAV doesn’t use the GET method, you can use the @get filter to route those to the file_server module so it can serve up indexes via the browse argument.

site.your.org {
    @get method GET
    root * /var/www/site.your.org
    webdav *
    file_server @get browse        
}

Sources

https://github.com/mholt/caddy-webdav https://marko.euptera.com/posts/caddy-webdav.html

1.5.3.1.4 - MFA

The package caddy-security offers a suite of auth functions. Among these is MFA and a portal for end-user management of tokens.

Installation

# Install a version of caddy with the security module 
sudo caddy add-package github.com/greenpau/caddy-security
sudo systemctl restart caddy

Configuration

/var/lib/caddy/.local/caddy/users.json

caddy hash-password

Troubleshooting

journalctl –no-pager -u caddy

1.5.3.1.5 - Wildcard DNS

Caddy has an individual cert for every virtual host you create. This is fine, but Let’s Encrypt publishes these as part of certificate transparency and the bad guys are watching. If you create a new site in caddy, you’ll see bots probing for weaknesses within 30 min - without you even having published the URL. There’s no security in anonymity, but the need-to-know principle suggests we shouldn’t be informing the whole world about sites of limited scope.

One solution is a wildcard cert. It’s published as just ‘*.some.org’ so there’s no information disclosed. Caddy supports this, but it requires a little extra work.

Installation

In this example we are already using the default Caddy binary but want to connect to CloudFlare’s DNS service. We must change to a custom Caddy binary for that. Check https://github.com/caddy-dns to see if your DNS provider is available.

# Divert the default binary from the repo
sudo dpkg-divert --divert /usr/bin/caddy.default --rename /usr/bin/caddy
sudo cp /usr/bin/caddy.default /usr/bin/caddy.custom
sudo update-alternatives --install /usr/bin/caddy caddy /usr/bin/caddy.default 10
sudo update-alternatives --install /usr/bin/caddy caddy /usr/bin/caddy.custom 50

# Add the package and restart. 
sudo caddy add-package github.com/caddy-dns/cloudflare
sudo systemctl restart caddy.service

Warning:

Because we’ve diverted, apt update will not update caddy. This also stops unattended-updates. You must use caddy upgrade instead. The devs don’t think this should be an issue. I disagree. but you can add a cron job if you like.

DNS Provider Configuration

For Cloudflare, a decent example is below. Just use the ‘Getting the Cloudflare API Token’ part

https://roelofjanelsinga.com/articles/using-caddy-ssl-with-cloudflare/

Caddy Configuration

Use the acme-dns global option and then create a single site (used to determine the cert) and match the actual vhosts with subsites.

{
    acme_dns cloudflare alotcharactersandnumbershere
}

*.some.org, some.org {

    @site1 host site1.some.org
    handle @site1 {
        reverse_proxy * http://localhost:3200
    }

    @site2 host site2.some.org
    handle @site2 {
        root * /srv/www/site2
    }
}

2 - Networking

2.1 - Firewall

2.1.1 - Linux Firewall Router

Creating a Linux router is fairly simple. Some distros like Alpine Linux are well suited for it but any will do. I used Debian in this example.

Install the base OS without a desktop system. Assuming you have two network interfaces, pick one to be the LAN interface (traditionally the first one, eth0 or such) and set the address statically.

Basic Routing

To route, all you really need do is enable forwarding.

# as root

# enable
sysctl -w net.ipv4.ip_forward=1

# and persist
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf

Private Range

If one side is a private network, such as in the 192.168, you probably need to masquerade. This assumes you already have nftables installed and it’s default rules in /etc/nftables.conf

# As root

# Add the firewall rules to masquerade
nft flush ruleset
nft add table nat
nft add chain nat postrouting { type nat hook postrouting priority 100\; }
nft add rule nat postrouting masquerade

# Persist the rules and enable the firewall
nft list ruleset >> /etc/nftables.conf
systemctl enable --now  nftables.service

DNS and DHCP

If you want to provide network services such as DHCP and DNS, you can add dnsmasq

sudo apt install dnsmasq

Assuming the LAN interface is named eth0 and set to 192.168.0.1.

vi  /etc/dnsmasq.d/netboot.conf 

interface=eth0
dhcp-range=192.0.1.100,192.0.1.200,12h
dhcp-option=option:router,192.168.0.1
dhcp-authoritative

systemctl enable --now  nftables.service

Firewall

You may want to add some firewall rules too.

# allow SSH from the lan interface
sudo nft add rule inet filter input iifname eth0 tcp dport ssh accept

# allow DNS and DHCP from the lan interface
sudo nft add rule inet filter input iifname eth0 tcp dport domain accept
sudo nft add rule inet filter input iifname eth0 udp dport {domain, bootps} 

# Change the default input policy to drop 
sudo nft add chain inet filter input {type filter hook input priority 0\; policy drop\;}

You can fine-tune these a bit more with the nft example.

2.1.2 - OpenWRT

I once read a comparison of router distros (that I can no longer find) and was surprised to see OpenWRT score near the top in terms of speed and reliability. I’d thought it just a solution for small appliances, but I read that the developers spent significant time optimizing it to be a worthy alternative for large scale routing and firewall. And you can run in a container as opposed to needing a full VM like OPNsense.

2.1.2.1 - GeoIP

OpenWRT doesn’t ship with GeoIP capabilities, but you can add it with the IP set extras script. This is a somewhat legacy approach, but the GUI requires it, and OpenWRT translates it modern nft named sets under the hood.

# At the command line
opkg update
opkg install ipset resolveip


# Install ipset-extras via their install script
wget -U "" -O ipset-extras.sh "https://openwrt.org/_export/code/docs/guide-user/advanced/ipset_extras?codeblock=0"
chmod +x ipset-extras.sh
./ipset-extras.sh

# Logout and back in to enable the extension from the /etc/profile.d folder
exit

# Configure an IP set for the US
uci set dhcp.us="ipset"
uci add_list dhcp.us.name="US"
uci add_list dhcp.us.name="US6"
uci add_list dhcp.us.geoip="us"
uci commit dhcp

# Populate IP sets
ipset setup

# Check set creation worked
nft list sets

# Install hotplug-extras for persistence
wget -U "" -O hotplug-extras.sh "https://openwrt.org/_export/code/docs/guide-user/advanced/hotplug_extras?codeblock=0"
chmod +x ./hotplug-extras.sh
./hotplug-extras.sh

When adding a port forward, the Advanced tab will now have the “Use ipset” populated and you can select “US”

You can also invert the rule by typing in “! US” - an important feature that doesn’t jump out at you.

This inversion is best with with a list of who you don’t want. Here’s an example of a set for the worst 5 countries for probes.

# Configure an IP set for the worst countries for probes and hacks - the axis of hacks
uci set dhcp.axis="ipset"
uci add_list dhcp.axis.name="axis"
uci add_list dhcp.axis.name="axis6"
uci add_list dhcp.axis.geoip="cn"
uci add_list dhcp.axis.geoip="in"
uci add_list dhcp.axis.geoip="ne"
uci add_list dhcp.axis.geoip="ng"
uci add_list dhcp.axis.geoip="ru"
uci commit dhcp

ipset setup

Troubleshooting

No NFT Sets Generated:

The script that generates the sets is sensitive to the value you put in `dhcp.axis.name=“value”. Try avoiding spaces and numbers and that ‘6’ is at the end of the second value.

Why are we adding these to DHCP:

I don’t know. The documentation adds them there. In the code, the script loads them from there. I tried adding them to the Firewall section just for fun and it didn’t work. I suspect if I knew more it would make sense.

Notes

The installation uses <www.ipdeny.com> and adds a cronjob that updates the list daily at 3 AM.

There are other tools, such as geopip-shell or ban ip, but these are more of an all-or-nothing solution and can’t be used with individual firewall rules. There is also the python utility from the netfilter team and misc bash scripts, but these lack easy OpenWRT integration.

The authors of nftables have a page[^5] on GeoIP, but it’s about tagging packets. OpenWRT expects named sets and you can’t easily construct that from a map.

[^5] https://wiki.nftables.org/wiki-nftables/index.php/GeoIP_matching

2.1.2.2 - OpenWRT in PVE LXC

When running lots of guests it helps to but them behind a virtual router. If you’re keeping things lean by using LXC containers you can put your router in a container too with OpenWRT.

The process in PVE is to:

Prepare Networking
Download OpenWRT
Create The Container
Edit The FW Init

Prepare Networking

You’re going to create a LAN inside of Proxmox and you can do it a couple of different ways;

Overlay
Additional Interface
VLAN

Overlay

The simplest thing to do is nothing. You just manually assign IPs and a gateway in a different range than your existing router and have two networks operating on the same physical LAN. The main downside is you can’t take advantage of DHCP because it would conflict with the original LAN.

Additional Interface

You can also install a second network card. This of course has a cost, though if you only have one PVE host you can cheat by just creating a new bridge interface that goes nowhere. But this isn’t helpful in a cluster.

VLAN

The best way is to add a Virtual LAN. Simply edit the config for vmbr0 and enable the VLAN aware checkbox. Then add an interface to the container and specify a VLAN Tag, such as “2”. Most network equipment is happy to pass it along to other cluster members so it just works.

Download OpenWRT

You want just the root file system, not the full image that includes the kernel. Happily, OpenWRT makes this available. Navigate to their releases, find the most recent, and drill down to targets / x86 / 64 / rootfs.tar.gz. It will save along the lines of “openwrt-24.10.1-x86-64-rootfs.tar.gz”.

Next, upload it to PVE with a secure copy to the root home folder like scp openwrt* root@pve01:

Create The Container

What we uploaded earlier isn’t actually a template, but it’s close enough as along as we create the container at PVE’s command line¹. The key here is that we provide an archive and set the OS type to unmanaged.

pct create \
 201 \
 ./openwrt* \
 --rootfs local-lvm:0.4 \
 --ostype unmanaged \
 --hostname openwrt \
 --arch amd64 \
 --cores 2 \
 --memory 256 \
 --swap 0 \
 --features nesting=1 \
 --net0 name=eth0,bridge=vmbr0,tag=2 \
 --net1 name=eth1,bridge=vmbr0

Also of note, we enable nesting so that dnsmasq will start² and set the VLAN tag on eth0 which comes up as the LAN interface by default on this image of OpenWRT. The container’s disk uses the rootfs syntax of STORAGE_ID:SIZE_IN_GiB, here being .4 Gigs.

Add Clients and Rules

When creating guests, make sure to change their network settings in PVE to have a VLAN tag of ‘2’ (or whatever you’re using).

In OpenWRT, add rules Network -> Firewall -> Port Forwards. There are no WAN rules discrete from port forwarding.

Updates

You should update by downloading new firmware, not by using the package manger. In fact: “Generally speaking, the use of opkg upgrade is very highly discouraged. It should be avoided in almost all circumstances³.”

But if you must;

opkg update
opkg list-upgradable | cut -f 1 -d ' ' | xargs opkg upgrade

A default install of PVE creates a single Linux Bridge, usually named vmbr0. Think of this as a virtual switch. The management interface is on that bridge, as well as any containers or guests. Most things just need one interface, but OpenWRT expects two. It is a router, after all.

In most cases, adding a VLAN is best, but there are other options. You can see and make changes in the Proxmox web GUI by changing to Server View, selecting a ProxMox Host, then going to System -> Network.

create a new bridge. Select new and allow it to select the name (which should be vmbr1). Leave the rest at the defaults (all blank with autostart checked). Important If you have a cluster you must actually connect this new bridge to a network adapter in the “Bridge ports” setting. Otherwise, it won’t be able to talk beyond the host it’s currently on. If you don’t have a second NIC, then this probably won’t do what you want.

2.1.3 - OPNsense

10G Speeds

When you set an OPNsense system up with supported 10G cards, say the Intel X540-AT2, you can move 6 to 8 Gb a second. Though this is better than in the past, but not line speed.

# iperf between two systems routed through a dial NIC on OPNsense

[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0040 sec  8.04 GBytes  6.90 Gbits/sec

This is because the packet filter is getting involved. If you disable that you’ll get closer to line speeds

Firewall –> Settings –> Advanced –> Disable Firewall

[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0067 sec  11.0 GBytes  9.40 Gbits/sec

2.2 - Internet

2.2.1 - DNS

Web pages today are complex. Your browser will make on average 40¹ DNS queries to find the various parts of an average web page, so implementing a local DNS system is key to keeping things fast.

In general, you can implement either a caching or recursive server with the choice between speed vs privacy.

Types of DNS Servers

A caching server accepts and caches queries, but doesn’t actually do the lookup itself. It forwards the request on to another DNS server and waits for the answer. If you have a lot of clients configured to use it, chances are someone else has already asked for what you want and it can supply the answer quickly from cache.

A recursive server does more than just cache answers. It knows how to connect to the root of the internet and find out itself. If you need to find some.test.com, it will connect to the .com server, ask where test.com is, then connect to test.com and ask it for some.

Comparison

Between the two, the caching server will generally be faster. If you connect to a large DNS service they will almost always have things cached. You will also get geographically relevant results as content providers work with DNS providers to direct you to the closest content cache.

With a recursive server, you do the lookup yourself and no single entity is able to monitor your DNS queries. You also aren’t dependant upon any upstream provider. But you make every lookup ’the long way’, and that can take many hundreds of milliseconds on some cases, a large part of a page load time.

Testing

In an ad hoc test on a live network with about 5,000 residential user devices, about half the queries were cached. The other half were sent to either quad 9 or a local resolver. Quad 9 took about half the time that the local resolver did.

Here are the numbers - with Steve Gibson’s DNS benchmarker against pi-hole forwarding to a local resolver vs pi-hole forwarding to quad 9. Cached results excluded.

    Forwarder     |  Min  |  Avg  |  Max  |Std.Dev|Reliab%|
  ----------------+-------+-------+-------+-------+-------+
  - Uncached Name | 0.015 | 0.045 | 0.214 | 0.046 | 100.0 |
  - DotCom Lookup | 0.015 | 0.019 | 0.034 | 0.005 | 100.0 |
  ---<O-OO---->---+-------+-------+-------+-------+-------+

    Resolver      |  Min  |  Avg  |  Max  |Std.Dev|Reliab%|
  ----------------+-------+-------+-------+-------+-------+
  - Uncached Name | 0.016 | 0.078 | 0.268 | 0.079 | 100.0 |
  - DotCom Lookup | 0.018 | 0.035 | 0.078 | 0.017 | 100.0 |
  ---<O-OO---->---+-------+-------+-------+-------+-------+

Selection

This test is interesting, but not definitive. While the DNS benchmark shows that the uncached average is better, page load perception is different than the sum of DNS queries. A page metric test would be good, but in general, faster is better.

Use a caching server.

One last point: use your ISP’s name server when possible. They will direct you to their local content caching systems for Netflix, Google (YouTube) and Akamai. If you use quad 9 like I did, you may get to a regional content location, but you miss out on things optimized specifically for your geographic location.

They are (probably) capturing all your queries for monetization, and possibly directing you to their own their own advertising server when you mis-key in a domain name. So you’ll need to decide;

Speed vs privacy.

Informal personal checking of random popular sites. ↩︎

2.2.1.1 - Pi-hole

Pi-hole is reasonable choice for DNS service, especially if you want metrics and reporting. A single instance will scale to 1000 active clients with just 1 core and 500M RAM and do a good job showing what’s going on.

Performance is limited with version 5, however. When you get past 1000 active clients you can take some mitigating steps, but the main process is single-threaded so you’re unlikely to get past 1500.

But for smaller deployments, it’s hard to beat.

Preparation

Prepare and secure a Debian system

Set a Static Address

sudo vi /etc/network/interfaces

Change

# The primary network interface
allow-hotplug eth0
iface eth0 inet dhcp

auto eth0
iface eth0 inet static
    address 192.168.0.2/24
    gateway 192.168.0.1

Secure Access with Nftables

Nftables is the modern way to set netfilter (localhost firewall) rules.

sudo apt install nftables
sudo systemctl enable nftables
sudo vi /etc/nftables.conf

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
        chain input {
                type filter hook input priority 0;

                # accept any localhost traffic
                iif lo accept

                # accept already allowed and related traffic
                ct state established,related accept

                # accept DNS and DHCP traffic from internal only
                define RFC1918 = { 192.168.0.0/16, 10.0.0.0/8, 172.16.0.0/12 }
                udp dport { domain, bootps } ip saddr $RFC1918 ct state new accept
                tcp dport { domain, bootps } ip saddr $RFC1918 ct state new accept

                # accept web and ssh traffic on the first interface or from an addr range
                iifname eth0 tcp dport { ssh, http } ct state new accept
                 # or 
                ip saddr 192.168.0.1/24 ct state new accept

                # Accept pings
                icmp type { echo-request } ct state new accept

                # accept neighbor discovery otherwise IPv6 connectivity breaks.
                ip6 nexthdr icmpv6 icmpv6 type { nd-neighbor-solicit,  nd-router-advert, nd-neighbor-advert } accept

                # count other traffic that does match the above that's dropped
                counter drop
        }
}

sudo nft -f /etc/nftables.conf
sudo systemctl start nftables.service

Add Unattended Updates

This an optional, but useful service.

apt install unattended-upgrades

sudo sed -i 's/\/\/\(.*origin=Debian.*\)/  \1/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/\/\/\(Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";\)/  \1/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/\/\/\(Unattended-Upgrade::Remove-Unused-Dependencies\) "false";/  \1 "true";/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/\/\/\(Unattended-Upgrade::Automatic-Reboot\) "false";/  \1 "true";/' /etc/apt/apt.conf.d/50unattended-upgrades

Installation

sudo apt install curl
curl -sSL https://install.pi-hole.net | bash

Configuration

Upstream Provider

Pi-hole is a DNS Forwarder, meaning it asks someone else on your behalf and caches the results for other people when they ask too.

Assuming the installation messages indicate success, the only thing you need do is pick that upstream DNS provider. In many cases your ISP is the fastest, but you may want to run a [DNS benchmark] with those added to find the best.

Settings -> DNS -> Upstream DNS Servers

Also, if you have more than one network, you may need to allow the other LANs to connect.

Interface settings -> Permit all origins (needed if you have multiple networks)

Block Lists

One of the main features of Pi-hole is that it blocks adds. But you may need to disable this for google or facebook search results to work as expected. The top search results are often ads and don’t work as expected when pi-hole is blocking them.

Admin Panel -> Ad Lists -> Status Column

You might consider adding security only lists instead, such as Intel’s below

https://osint.digitalside.it/Threat-Intel/lists/latestdomains.txt

Search the web for other examples.

Operation

DNS Cache Size

When Pi-hole looks up a name, it get’s back an IP address and what’s called a TTL (Time To Live). The latter tells Pi-hole how long it should keep the results in cache for.

If you’re asking for names faster than they are expiring, Pi-hole will ’evict’ things from cache before they should be. You can check this at:.

settings -> System -> DNS cache evictions:

You’ll notice that insertions keep climbing as things are added to the cache, but the cache number itself represents only those entries that are current. If you do see evictions, edit CACHE_SIZE in /etc/pihole/setupVars.conf

You can also check this at the command line

dig +short chaos txt evictions.bind @localhost

   dig +short chaos txt cachesize.bind
   dig +short chaos txt hits.bind
   dig +short chaos txt misses.bind

Having more than you need is a waste, however, when it could be used for disk buffers, etc. So don’t add more unless it’s needed.

Use Stale Cache

If you have the spare memory, you can tell Pi-hole not to throw away potentially-useful data with the use-stale-cache flag. It will instead wait for someone to ask for it, serve it up immediately even though it’s expired, then go out and check to see if anything has changed. So TTLs are ignored but unused entries are removed after 24 hours to keep the cache tidy.

This increases performance at the expense of more memory use.

# Add this to pi-hole's dnsmasq settings file
echo "use-stale-cache" >> /etc/dnsmasq.d/01-pihole.conf

Local DNS Entries

You can enter local DNS and CNAME entries via the GUI, (Admin Panel -> Local DNS), but you can also edit the config file for bulk entries.

For A records

vim /etc/pihole/custom.list

10.50.85.2 test.some.lan
10.50.85.3 test2.some.lan

For CNAME records

vim /etc/dnsmasq.d/05-pihole-custom-cname.conf

cname=test3.some.lan,test.some.lan

Upgrading

Pi-hole wasn’t installed via a repo, so it won’t be upgraded via the Unattended Upgrades service. It requires a manual command.

sudo pihole -up

Troubleshooting

Dashboard Hangs

Very busy pi-hole installations generate lots of data and (seemingly) hang the dashboard. If that happens, limit the about of data being displayed.

vi /etc/pihole/pihole-FTL.conf

# Don't import the existing DB into the GUI - it will hang the web page for a long time
DBIMPORT=no

# Don't import more than an hour of logs from the logfile
MAXLOGAGE=1

# Truncate data older than this many days to keep the size of the database down
MAXDBDAYS=1

sudo systemctl restart pihole-FTL.service

Rate Limiting

The system has a default limit of 1000 queries in a 60 seconds window for each client. If your clients are proxied or relayed, you can run into this. This event is displayed in the dashbaord¹ and also in the logs².

sudo grep -i Rate-limiting /var/log/pihole/pihole.log /var/log/pihole/pihole.log

You may find the address 127.0.0.1 being rate limited. This can be due to pi-hole doing a reverse of all client IPs every hour. You can disable this with:

# In the pihole-FTL.conf
REFRESH_HOSTNAMES=NONE

DNS over HTTP

Firefox, if the user has not yet chosen a setting, will query use-application-dns.net. Pi-hole respods with NXDOMAIN³ as a signal to use pi-hole for DNS.

/etc/pihole/pihole-FTL.conf

Apple devices offer a privacy enhancing service⁴ that you can pay for. Those devices queries mask.icloud.com as a test and Pi-hole blocks that by default. The user will be notified you are blocking it.

# Signal that Apple iCloud Private Relay is allowed 
BLOCK_ICLOUD_PR=false

sudo systemctl reload pihole-FTL.service

Searching The Query Log Hangs DNS

On a very busy server, clicking show-all in the query log panel will hang the server as pihole-FTL works through it’s database. There is no solution, just don’t do it. The best alternative is to ship logs to a Elasticsearch or similar system.

Ask Yourself

The system continues to use whatever DNS resolver was initially configured. You may want it to use itself, instead.

# revert if pi-hole itself needs fixed.
sudo vi /etc/resolv.conf

nameserver 127.0.0.1

Other Settings

Pi-hole can be configured via it’s two main config files, /etc/pihole/setupVars.conf and pihole-FTL.conf, and you may find other useful settings.

In the v6, currently in beta, the settings are all in the .toml file, or via the gui via:

All Settings — Resolver – resolver.resolveIPv4 – resolver.resolveIPv6 – resolver.networkNames

2.2.1.2 - Pi-hole Unbound

Pi-hole by itself is just a DNS forwarding and caching service. It’s job is to consolidate requests and forward them on to someone else. That someone else is seeing all your requests.

If that concerns you, add your own DNS Resolver like Unbound. It knows how to fetch answers without any one entity seeing all your requests. It’s probably slower¹, but possibly more secure.

Installation

sudo apt install unbound

Configuration

Unbound

The pi-hole guide for [unbound]:(https://docs.pi-hole.net/guides/dns/unbound/) includes a config block to copy and paste as directed. You should also add a config file for dnsmasq while you’re at it, to set EDNS packet sizes. (dnsmasq comes as part of pi-hole)

sudo vi /etc/dnsmasq.d/99-edns.conf

edns-packet-max=1232

When you check the status of unbound, you can ignore the warning: subnetcache:... as it’s just reminding you that data in the subnet cache (if you were to use it) can’t pre-fetched. There’s some conversation² as to why it’s warning us.

The config includes prefetch, but you may also wish to add serve-expired to it, if you’re not already using use-stale-cache in Pi-hole.

# serve old responses from cache while waiting for the actual resolution to finish.
# don't set this if you're already doing it in Pi-hole
serve-expired: yes

sudo systemctl restart unbound.service

No additional setup is needed, but see the unbound page for more info.

Pi-hole

You must tel Pi-hole about the Resolver you’ve just deployed.

Settings -> DNS -> Upstream DNS Servers -> Custom 1 (Check and add 127.0.0.1#5335 as shown in the unbound guide linked above)

2.2.1.3 - Pi-hole DHCP

Pi-hole can also do DHCP. However, the GUI only allows for a single range. On a large network you’ll need multiple ranges. You do this by editing the config files directly.

Interface-Based Ranges

In this setup, you have a separate interface per LAN. Easy to do in a virtual or VLAN environment, but you’ll have to define each in the /etc/network/interfaces file.

Let’s create a range from 192.168.0.100-200 tied to eth0 and a range of 192.168.1.100-200 tied to eth1. We’ll also specify the router and two DNS servers.

vim /etc/dnsmasq.d/05-dhcp.conf

dhcp-range=eth0,192.168.0.100,192.168.0.200,24h
dhcp-option=ens161,option:router,192.168.0.1
dhcp-option=ens161,option:dns-server,192.168.0.2,192.168.0.3

dhcp-range=eth1,192.168.1.100,192.168.1.200,24h
dhcp-option=ens161,option:router,192.168.1.1
dhcp-option=ens161,option:dns-server,192.168.1.2,192.168.1.3

# Shared by both
dhcp-option=option:netmask,255.255.0.0

# Respond immediately without waiting for other servers 
dhcp-authoritative

# Don't try and ping the address before assigning it
no-ping

dhcp-lease-max=10000
dhcp-leasefile=/etc/pihole/dhcp.leases

domain=home.lan

These settings can be implicit - i.e. we could have left out ethX in the range, but explicit is often better for clarity.

Note - the DHCP server (dnsmasq) is not enabled by default. You can do that in the GUI under Settings –> DHCP

Relay-Based Ranges

In this setup, the router relays DHCP requests to the server. Only one system network interface is required, though you must configure the router(s).

When configured, the relay (router) sets the relay-agent (giaddr) field, sends it to dnsmasq, which (I think) understands it’s a relayed request when it sees that field, and looks at it’s available ranges for a match. It’s also sets a tag that to be used for assigning different options, such as the gateway, per range.

dhcp-range=tag0,192.168.0.100,192.168.0.250,255.255.255.0,8h
dhcp-range=tag1,192.168.1.100,192.168.1.250,255.255.255.0,8h
dhcp-range=tag2,192.168.2.100,192.168.2.250,255.255.255.0,8h

dhcp-options=tag0,3,192.168.0.1
dhcp-options=tag1,3,192.168.1.1
dhcp-options=tag2,3,192.168.2.1

Sources

https://discourse.pi-hole.net/t/more-than-one-conditional-forwarding-entry-in-the-gui/11359

Troubleshooting

It’s possible that the DHCP part of dnsmasq doesn’t scale to many thousands of leases¹

https://news.ycombinator.com/item?id=33895187 ↩︎

2.3 - Routing

2.3.1 - VyOS

VyOS gets a lot of respect as a network appliance. It’s a Debian-based router/firewall descended from the Vyatta project and has a command line config similar to JUNOS. It scores well in speed and reliability tests, a free version is available and commercial support is easy to get.

They do steer you toward the rather expensive commercial option by limiting access to the LTS versions, but you can always download the rolling release, the beta, or build it from source with a fairly straight-forward docker build process.

Downloading the beta, or stream as they call it, can be done with:

wget https://community-downloads.vyos.dev/stream/1.5-stream-2025-Q1/vyos-1.5-stream-2025-Q1-generic-amd64.iso

And a creation something like

qm create 200 \
--name vyos \
--memory 2048 \
--net0 virtio,bridge=vmbr0 \
--net1 virtio,bridge=vmbr0,tag=2 \
--ide2 media=cdrom,file=local:iso/live-image-amd64.hybrid.iso \
--virtio0 local-lvm:15

Then it’s just a matter of booting from the iso and running the very simple [install] and [quick-start] process. Assuming you’re going with the normal setup, hit the console and enter something like this.

Step 1 - Configure the interfaces and enable remote access.

# Enter 'configure' mode
configure

# For the address space 192.168.1.0/24

# Configure the LAN and WAN ports, with eth0 being the WAN
set interfaces ethernet eth0 address dhcp
set interfaces ethernet eth0 description 'OUTSIDE'
set interfaces ethernet eth1 address '192.168.1.1/24'
set interfaces ethernet eth1 description 'LAN'

# Enable remote login
# set service ssh listen-address '192.168.1.1' # Possibly don't listen on WAN if you don't need it
set service ssh port '22'

# Commit the changes and save if they work
commit
save

Step 2 - configure DNS/DHCP. There’s a lot of text, so we usually to SSH in to continue.

# Configure LAN DHCP services
ssh [email protected]
configure
set service dhcp-server shared-network-name LAN subnet 192.168.1.0/24 option default-router '192.168.1.1'
set service dhcp-server shared-network-name LAN subnet 192.168.1.0/24 option name-server '192.168.1.1'
set service dhcp-server shared-network-name LAN subnet 192.168.1.0/24 option domain-name 'vyos.net'
set service dhcp-server shared-network-name LAN subnet 192.168.1.0/24 lease '86400'
set service dhcp-server shared-network-name LAN subnet 192.168.1.0/24 range 0 start '192.168.1.9'
set service dhcp-server shared-network-name LAN subnet 192.168.1.0/24 range 0 stop '192.168.1.254'
set service dhcp-server shared-network-name LAN subnet 192.168.1.0/24 subnet-id '1'

# Configure DNS services
set service dns forwarding cache-size '0'
set service dns forwarding listen-address '192.168.1.1'
set service dns forwarding allow-from '192.168.1.0/24'

commit
save

Step 3 - enable NAT

# Enable masquerade - assuming you need it.
set nat source rule 100 outbound-interface name 'eth0'
set nat source rule 100 source address '192.168.1.0/24'
set nat source rule 100 translation address masquerade

commit
save

Step 4 - Set Names and Enable basic firewall rules

# Use the 'group' feature to give the interfaces more readable names than ethX
set firewall group interface-group WAN interface eth0
set firewall group interface-group LAN interface eth1
set firewall group network-group NET-INSIDE-v4 network '192.168.1.0/24'

# Typically you want a default drop as a global rule.
set firewall global-options state-policy established action accept
set firewall global-options state-policy related action accept
set firewall global-options state-policy invalid action drop






























[install]:https://docs.vyos.io/en/latest/installation/install.html#permanent-installation
[quick-start]:https://docs.vyos.io/en/latest/quick-start.html

2.4 - VPN

2.4.1 - Wireguard

Wireguard is the best VPN choice for most situations. It’s faster and simpler than its predecessors and what you should be using on the internet.

Key Concepts

Wireguard is works at the IP level and is designed for the Internet/WAN. It doesn’t include DHCP, bridging, or other low-level features
Participants authenticate using public-key cryptography, use UDP as a transport and do not respond to unauthenticated connection attempts.
Peer to Peer by default.

By the last point, we mean there is no central authority required. Each peer defines their own IP address, routing rules, and decides from whom they will accept traffic. Every peer must exchange public keys with every other other peer. Traffic is sent directly between configured peers. You can create that design, but it’s not baked-in.

Design

The way you deploy depends on what you’re doing, but in general you’ll either connect directly point-to-point or create a central server for remote access or management.

Hub and Spoke
Point to Point

Hub and Spoke

This is the classic setup where clients initiate a connection. Configure a wireguard server and tell your clients about it. This is also useful for remote management when devices are behind NAT. Perfrom the steps in:

And then choose based on if your goal is to:

Provide Remote Access - i.e. allow clients to access to your central network and/or the Internet.

Provide Remote Management - i.e. allow the server (or an admin console) to connect to the clients.

Point to Point

You can also have peers talk directly to each other. This is often used with routers to connect networks across the internet.

Point to Point

2.4.1.1 - Server

A central server gives remote devices a reachable target, allowing them to traverse firewalls and NAT and connect. Let’s create a server and generate and configure a remote device.

Preparation

You’ll need:

Public Domain Name or Static IP
Linux Server and the ability to port-forward UDP 51820 to it
To choose a routing network IP block

A dynamic domain name will work and it’s reasonably priced (usually free). You just need something for the peers to connect to, though a static IP is best. You can possibly break connectivity if your IP changes while your peers are connected or have the old IP cached.

We use Debian in this example and derivatives should be similar. UDP 51820 is the standard port but you can choose another if desired.

You must also choose a VPN network that doesn’t overlap with your existing networks. We use 192.168.100.0/24 in this example. This is the internal network used inside the VPN to route traffic.

Installation

sudo apt install wireguard-tools

Configuration

The server needs just a single config file, and it will look something like this:

[Interface]
Address = 192.168.100.1/24
ListenPort = 51820
PrivateKey = sGp9lWqfBx+uOZO8V5NPUlHQ4pwbvebg8xnfOgR00Gw=

We choose 192.168.100.0/24 as our VPN internal network and picked .1 as our server address (pretty standard), created a private key with the wg tool, and put that in the file /etc/wireguard/wg0.conf. Here’s the commands to do that.

# As root
cd /etc/wireguard/
umask 077

wg genkey > server_privatekey
wg pubkey < server_privatekey > server_publickey

read PRIV < server_privatekey

# We create the file wg0.conf here
cat << EOF > wg0.conf
[Interface]
Address = 192.168.100.1/24
ListenPort = 51820
PrivateKey = $PRIV
EOF

Operation

The VPN operates by creating network interface and loading a kernel module. You can use the linux ip command to add a network interface of type wireguard (that automatically loads the kernel module) or use the wg-quick command do do it for you.

Test the Interface

# The tool looks for the wg0.conf file you created earlier
wg-quick up wg0

ping 192.168.100.1

wg-quick down wg0

Enable The Service

For normal use, employ systemctl to create a service using the installed service file.

systemctl enable --now wg-quick@wg0

That’s it - add remote clients/peers and they will be able to connect.

Client

Troubleshooting

When something is wrong, you don’t get an error message, you just get nothing. You bring up the client interface but you can’t ping the server. So turn on log messages on the server with this command.

echo module wireguard +p > /sys/kernel/debug/dynamic_debug/control
dmesg

# When done, send a '-p'

Key Errors

wg0: Invalid handshake initiation from 205.133.134.15:18595

In this case, you should check your keys and possibly take the server interface down and up.

Typeos

ifconfig: ioctl 0x8913 failed: No such device

Check your conf is named /etc/wireguard/wg0.conf and look for any mistakes. Replace from scratch if nothing else.

Firewall Issues

If you see no wireguard error messages, suspect your firewall. Since it’s UDP you can’t test the port directly, but you can use netcat.

# On the server
systemctl stop wg-quick@wg0
nc -ulp 51820  

# On the client.
nc -u some.server 51820  

# Type some text and it should be echoed on the server

2.4.1.2 - Client

In theory, the client is an autonomous entity with whom you negotiate IPs and exchange public keys. In practice, you’ll just create a conf file and distribute it.

Define a Client on The Server

Each participant must have a unique Key-Pair and IP address. You cannot reuse keys as they are hashed and used as for internal routing.

Generate a Key-Pair

# On the 'server'
cd /etc/wireguard
wg genkey > client_privatekey # Generates and saves the client private key
wg pubkey < client_privatekey # Displays the client's public key

Select an IP

Choose an IP for the client and add a block at the bottom of your server’s wg0.conf. It’s fine to just increment the IP as you add clients . Note the /32, meaning on traffic with that specific IP is accepted from this peer - it’s not a router on the other side, after all.

# Add this block to the bottom of your server's wg0.conf file

##  Some Client  ##
[Peer]
PublicKey = XXXXXX
AllowedIPs = 192.168.100.2/32

# Load the new config
wg-quick down wg0 &&  wg-quick up wg0

Create a Client Config File

This is the file that the client needs. It will look similar to this. The [Interface] is about the client and the [Peer] is about the server.

[Interface]
PrivateKey = THE-CLIENT-PRIVATE-KEY
Address = 192.168.100.2/32

[Peer]
PublicKey = YOUR-SERVERS-PUBLIC-KEY
AllowedIPs = 192.168.100.0/24
Endpoint = your.server.org:51820

Put in the keys and domain name, zip it up and send it on to your client as securely as possible. Maybe keep it around for when they loose it. One neat trick is to display a QR code right in the shell. Devices that have a camera can import from that.

qrencode -t ANSIUTF8 < client-wg0.conf

Test On The Client

On Linux

On the client side, install the tools and place the config file.

# Install the wireguard tools
sudo apt install wireguard-tools

# Copy the config file to the wireguard folder
sudo cp /home/you/client-wg0.conf /etc/wireguard/wg0.conf

sudo wg-quick up wg0
ping 192.168.100.1
sudo wg-quick down wg0

# Possibly enable this as a service or import as a network manager profile
systemctl enable --now wg-quick@wg0
## OR ##
# You may want to rename the file as that's used in nm as it's name
nmcli connection import type wireguard file client-wg0.conf
sudo nmcli connection modify client-wg0.conf autoconnect no

On Windows or Mac

You can download the client from here and add the config block

https://www.wireguard.com/install/

Test

You should be able to ping the server from the client and vice versa. If not, take a look at the troubleshooting steps in the Central Server page.

Next Steps

You’re connected to the server - but that’s it. You can’t do anything other than talk to it. The next step depends on if you want to:

Allow the client Remote Access to the server’s network and/or the internet
Allow the server to Remote Manage the client.

2.4.1.3 - Remote Access

This is the classic road-warrior setup where remote peers initiate a connection to the central peer. That central system forwards their traffic onward to the corporate network.

Traffic Handling

The main choice is route or masquerade.

Routing

If you route, the client’s VPN IP address is what other devices see. This is generally preferred as it allows you to log who was doing what at the individual servers. But you must update your network equipment to treat the central server as a router.

Masquerading

Masquerading causes the server to translate all the traffic. This makes everything look like its coming from the server. It’s less secure, but less complicated and much quicker to implement.

For this example, we will masquerade traffic from the server.

Central Server Config

Enable Masquerade

Use sysctl to enable forwarding on the server and nft to add masquerade.

# as root
sysctl -w net.ipv4.ip_forward=1

nft flush ruleset
nft add table nat
nft add chain nat postrouting { type nat hook postrouting priority 100\; }
nft add rule nat postrouting masquerade

Persist Changes

It’s best if we add our new rules onto the defaults and enable the nftables service.

# as root
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf

nft list ruleset >> /etc/nftables.conf

systemctl enable --now  nftables.service

Client Config

Your remote peer - the one you created when setting up the server - needs it’s AllowedIPs adjusted so it knows to send more traffic through the tunnel.

Full Tunnel

This sends all traffic from the client over the VPN.

AllowedIPs = 0.0.0.0/0

Split Tunnel

The most common config is to send specific networks through the tunnel. This keeps netflix and such off the VPN

AllowedIPs = 192.168.100.0/24, 192.168.XXX.XXX, 192.168.XXX.YYY

DNS

In some cases, you’ll need the client to use your internal DNS server to resolve private domain names. Make sure this server is in the AllowedIPs above.

[Interface]
PrivateKey = ....
Address = ...
DNS = 192.168.100.1

Access Control

Limit Peer Access

By default, everything is open and all the peers can talk to each other and the internet at large - even NetFlix! (they can edit their side of the connection at will). So let’s add some rules to the default filter table.

This example prevents peers from from talking to each other but let’s them ping the central server and reach the corporate network.

# Load the base config in case you haven't arleady. This includes the filter table
sudo nft -f /etc/nftables.conf

# Reject any traffic being sent outside the 192.168.100.0/24
sudo nft add rule inet filter forward iifname "wg0" ip daddr != 192.168.100.0/24 reject with icmp type admin-prohibited

# Reject any traffic between peers
sudo nft add rule inet filter forward iifname "wg0" oifname "wg0" reject with icmp type admin-prohibited

Grant Admin Access

You may want to add an exception for one of the addresses so that an administrator can interact with the remote peers. Order matters, so add it before before the other rules above

sudo nft -f /etc/nftables.conf

# Allow an special 'admin' peer full access and others to reply
sudo nft add rule inet filter forward iifname "wg0" ip saddr 192.168.100.2 accept
sudo nft add rule inet filter forward ct state {established, related} accept

# As above
...
...

Save Changes

Since this change is a little more complex, we’ll replace the existing file config file and add notes.

sudo vi /etc/nftables.conf

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
        chain input {
                type filter hook input priority 0
        }
        chain forward {
                type filter hook forward priority 0

                # Accept admin traffic and responses
                iifname "wg0" ip saddr 192.168.100.2 accept
                iifname "wg0" ct state {established, related} accept

                # Reject other traffic between peers
                iifname "wg0" oifname "wg0" reject with icmp type admin-prohibited

                # Reject traffic outside the desired network
                iifname "wg0" ip daddr != 192.168.100.0/24 reject with icmp admin-prohibited
        }
        chain output {
                type filter hook output priority 0
        }
}
table ip nat {
        chain postrouting {
                type nat hook postrouting priority srcnat
                masquerade
        }
}

Note: The syntax of the file is slightly different than the command. You can use nft list ruleset to see how nft config and commands translate into running rules. For example - the policy accept is being appended. You may want to experiment with explicitly adding policy drop.

The forwarding chain is where routing type rules go (the input chain is traffic sent to the host itself). Prerouting might work as well, though it’s less common and not present by default.

Notes

The default nftable config file in Debian is:

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
        chain input {
                type filter hook input priority filter;
        }
        chain forward {
                type filter hook forward priority filter;
        }
        chain output {
                type filter hook output priority filter;
        }
}

If you have old iptables rules you want to translate to nft, you can install iptables and add them (they get translated on the fly into nft) and nft list ruleset to see how to they turn out.

2.4.1.4 - Remote Mgmt

In this scenario peers initiate connections to the central server, making their way through NAT and Firewalls, but you don’t want to forward their traffic.

Central Server Config

No forwarding or masquerade is desired, so there is no additional configuration to the central server.

Client Config

The remote peer - the one you created when setting up the server - is already set up with one exception; a keep-alive.

When the remote peer establishes it’s connection to the central server, intervening firewalls allow you to talk back as they assume it’s in response. However, the firewall will eventually ‘close’ this window unless the client continues sending traffic occasionally to ‘keep alive’ the connection.

# Add this to the bottom of your client's conf file
PersistentKeepalive = 20

Firewall Rules

You should apply some controls to your clients to prevent them from talking to each other (and the server), and you also need a rule for the admin station. You can do this by adding rules to the forward chain.

# Allow an 'admin' peer at .2 full access to others and accept their replies
sudo nft add rule inet filter forward iifname "wg0" ip saddr 192.168.100.2 accept
sudo nft add rule inet filter forward ct state {established, related} accept
# Reject any other traffic between peers
sudo nft add rule inet filter forward iifname "wg0" oifname "wg0" reject with icmp type admin-prohibited

You can persist this change by editing your /etc/nftables.conf file to look like this.

sudo vi /etc/nftables.conf

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
        chain input {
                type filter hook input priority 0;
        }
        chain forward {
                type filter hook forward priority 0;

                # Accept admin traffic
                iifname "wg0" ip saddr 192.168.100.2 accept
                iifname "wg0" ct state {established, related} accept

                # Reject other traffic between peers
                iifname "wg0" oifname "wg0" reject with icmp type admin-prohibited
        }
        chain output {
                type filter hook output priority 0;
        }
}
table ip nat {
        chain postrouting {
                type nat hook postrouting priority srcnat; policy accept;
                masquerade
        }
}

2.4.1.5 - Routing

In our remote access example we chose to masquerade. But you can route where your wireguard server forwards traffic with the VPN addresses intact. You must handle that on your network in one of the following ways.

Symmetric Routing

Classically, you’d treat the wireguard server like any other router. You’d create a management interface and/or a routing interface and advertise routes appropriately.

On a small network, you would simply overlay an additional IP range on top of the existing on by adding a second IP address on your router and put your wireguard server on that network. Your local servers will see the VPN addressed clients and send traffic to the router that will pass it to the wireguard server.

Asymmetric Routing

In a small network you might have the central peer on the same network as the other servers. In this case, it will be acting like a router and forwarding traffic, but the other servers won’t know about it and so will send replies back to their default gateway.

To remedy this, add a static route at the gateway for the VPN range that sends traffic back to the central peer. Asymmetry is generally frowned upon, but it gets the job done with one less hop.

Host Static Routing

You can also configure the servers in question with a static route for VPN traffic so they know to send it directly back to the Wireguard server. This is fastest but you have to visit every host. Though you can use DHCP to distribute this route in some cases.

2.4.1.6 - Point to Point

If both system are listening then either side can initiate a connection. That’s essentially what a Point-to-Point setup is. Simply translate create two ‘servers’ and add a peer block to each one about the other. They will connect as needed.

This is best done with a routed config where clients who know nothing about the VPN use one side as their gateway for a given network range, and the servers act as routers. I don’t have an example config for this, but if you’ve reached this point you can probably handle that yourself.

2.4.1.7 - TrueNAS Scale

You can directly bring up a Wireguard interface in TrueNAS Scale, and use that to remotely manage it.

Wireguard isn’t exposed in the GUI, so use the command line to create a config file and enable the service. To make it persistent between upgrades, add a cronjob to restore the config.

Configuration

Add a basic peer as when setting up a Central Server and save the file on the client as /etc/wireguard/wg1.conf. It’s rumored that wg0 is reserved for the TrueNAS cloud service. Once the config is in place, use wg-quick up wg1 command to test and enable as below.

nano /etc/wireguard/wg1.conf

systemctl enable --now wg-quick@wg1

If you use a domain name in this conf for the other side, this service will fail at boot because DNS isn’t up and it’s not easy to get it to wait. So add a pre-start to the service file to specifically test name resolution.

vi /lib/systemd/system/[email protected]

[Service] 
...
...
ExecStartPre=/bin/bash -c 'until host google.com; do sleep 1; done'

Note: Don’t include a DNS server in your wireguard settings or everything on the NAS will attempt to use your remote DNS and fail if the link goes down.

Accessing Apps

When deploying an app, click the enable “Host Network” or “Configure Host Network” box in the apps config and you should be able to access via the VPN address. On Cobia (23.10) at least. If that fails, you can add a command like this to a post-start in the wireguard config file.

iptables -t nat -A PREROUTING --dst 192.168.100.2 -p tcp --dport 20910 -j DNAT --to-destination ACTUAL.LAN.IP:20910

Detecting IP Changes

The other side of your connection may dynamic address and wireguard wont know about it. A simple solution is a cron job that pings the other side periodically, and if it fails, restarts the interface. This will lookup the domain name again and hopefully find the new address.

touch /etc/cron.hourly/wg_test
chmod +x /etc/cron.hourly/wg_test
vi /etc/cron.hourly/wg_test

#!/bin/sh
ping -c1 -W5 192.168.100.1 || ( wg-quick down wg1 ; wg-quick up wg1 )

Troubleshooting

Cronjob Fails

cronjob kills interface when it can’t ping

/usr/local/bin/wg-quick: line 32: resolvconf: command not found

Calling wg-quick via cron causes a resolvconf issue, even though it works at the command line. One solution is to remove any DNS config from your wg conf file so it doesn’t try to register the remote DNS server.

Nov 08 08:23:59 truenas wg-quick[2668]: Name or service not known: `some.server.org:port' Nov 08 08:23:59 truenas wg-quick[2668]: Configuration parsing error … Nov 08 08:23:59 truenas systemd[1]: Failed to start WireGuard via wg-quick(8) for wg1.

The DNS service isn’t available (yet), despite Requires=network-online.target nss-lookup.target already in the service unit file. One way to solve this is a pre-exec in the Service section of the unit file¹. This is hacky, but none of the normal directives work.

The cron job above will bring the service up eventually, but it’s nice to have it at boot.

Upgrade Kills Connection

An upgrade comes with a new OS image and that replaces anything you’ve added, such as wireguard config and cronjobs. The only way to persist your Wireguard connection it to put a script on the pool and add a cronjob via the official interface².

Add this script and change for your pool location. This is set to run every 5 min, as you probably don’t want to wait after an upgrade very long to see if it’s working. You can also use this to detect IP changes over the cron.hourly above.

# Create the location and prepare the files
mkdir /mnt/pool02/bin/
cp /etc/wireguard/wg1.conf /mnt/pool02/bin/
touch /mnt/pool02/bin/wg_test
chmod +x /mnt/pool02/bin/wg_test

# Edit the script 
vi /mnt/pool02/bin/wg_test

#!/bin/sh
ping -c1 -W5 192.168.100.1 || ( cp /mnt/pool02/bin/wg1.conf /etc/wireguard/ ; wg-quick down wg1 ; wg-quick up wg1 )


# Invoke the TrueNAS CLI and add the job
cli
task cron_job create command="/mnt/pool02/bin/wg_test" enabled=true description="test" user=root schedule={"minute": "*/5", "hour": "*", "dom": "*", "month": "*", "dow": "*"}

Notes

https://www.truenas.com/docs/core/coretutorials/network/wireguard/ https://www.truenas.com/community/threads/no-internet-connection-with-wireguard-on-truenas-scale-21-06-beta-1.94843/#post-693601

2.4.1.8 - Proxmox

Proxmox is frequently used in smaller environments for it’s ability to mix Linux Containers and Virtual Machines at very low cost. LCD - Linux Containers - are especially valuable as they give the benefits of virtualization with minimal overhead.

Using wireguard in a container simply requires adding the host’s kernel module interface.

Edit the container’s config

On the pve host, for lxc id 101:

echo "lxc.mount.entry = /dev/net/tun /dev/net/tun none bind create=file" >> /etc/pve/lxc/101.conf

Older Proxmox

In the past you had to install the module, or use the DKMS method. That’s no longer needed as the Wireguard kernel module is now available on proxmox with the standard install. You don’t even need to install the wireguard tools. But if you run into trouble you can go through these steps

apt install wireguard
modprobe wireguard

# The module will load dynamically when a conainter starts, but you can also manually load it
echo "wireguard" >> /etc/modules-load.d/modules.conf

2.4.1.9 - LibreELEC

LibreELEC and CoreELEC are Linux-based open source software appliances for running the Kodi media player. These can be used as kiosk displays and you can remotely manage them with wireguard.

Create a Wireguard Service

These systems have wireguard support, but use connman that lacks split-tunnel ability¹. This forces all traffic through the VPN and so is unsuitable for remote management. To enable split-tunnel, create a wireguard service instead.

Create a service unit file

vi /storage/.config/system.d/wg0.service

[Unit]
Description=start wireguard interface

# The network-online service isn't guaranteed to work on *ELEC
#Requires=network-online.service

After=time-sync.target
Before=kodi.service

[Service]
Type=oneshot
RemainAfterExit=true
StandardOutput=journal

# Need to check DNS is responding before we proceed
ExecStartPre=/bin/bash -c 'until nslookup google.com; do sleep 1; done'

ExecStart=ip link add dev wg0 type wireguard
ExecStart=ip address add dev wg0 10.1.1.3/24
ExecStart=wg setconf wg0 /storage/.config/wireguard/wg0.conf
ExecStart=ip link set up dev wg0
# On the newest version, a manual route addition is needed too
ExecStart=ip route add 10.2.2.0/24 dev wg0 scope link src 10.1.1.3

# Deleting the device seems to remove the address and routes
ExecStop=ip link del dev wg0

[Install]
WantedBy=multi-user.target

Create a Wireguard Config File

Note: This isn’t exactly the same file wg-quick uses, just close enough to confuse.

vi /storage/.config/wireguard/wg0.conf

[Interface]
PrivateKey = XXXXXXXXXXXXXXX

[Peer]
PublicKey = XXXXXXXXXXXXXXX
AllowedIPs = 10.1.1.0/24
Endpoint = endpoint.hostname:31194
PersistentKeepalive = 25

Enable and Test

systemctl enable --now wg0.service
ping 10.1.1.1

Create a Cron Check

When using a DNS name for the endpoint you may become disconnected. To catch this, use a cron job

# Use the internal wireguard IP address of the peer you are connecting to. .1 in this case
crontab -e
*/5 * * * * ping -c1 -W5 10.1.1.1 || ( systemctl stop wg0; sleep 5; systemctl start wg0 )

https://wiki.libreelec.tv/configuration/wireguard#known-issues ↩︎

2.4.1.10 - OPNsense

The simplest way to deploy Wireguard is to use the built-in feature of your router. For OPNsense, it’s as simple as:

Create an Instance
Create a Peer and Enable Wireguard
Add a WAN Rule
Add a Wireguard Interface Rule

Configuration

Create an Instance

This is your server. Even though in wireguard all systems are considered peers, this is the system that is going to stay up all the time and accept connections, so it’s safe to think of it as ’the server'.

Navigate to:

VPN -> Wiregurad -> Instances

Click the + button on the right to add an instance. You can leave everything at the default except for:

Name # This can be anything you want, such as ‘Home’ or ‘Instance-1’
Public Key # Click the gear icon to generate keys
Listen Port # You’ll need to choose one, or it will somewhat unpredictable
Tunnel Address # Pick an IP range that you’re not using anywhere else

Save, but don’t click ‘Enable’ on the main screen yet.

Create a Peer

This is your phone or other enpoint that will be initiating the connection to the server. Navigate to:

VPN -> Wiregurad -> Peer Generator

It’s safe to leave everything at default except:

Endpoint # This your WAN address or hostname and port. e.g. “my.cool.org:51820”
Name # The thing connecting in, like “Allens-Phone”

If this is your first client, you may need to configure an IP. It’s safe to start one up from your server’s internal tunnel address, but don’t click the button for Store and generate next yet.

Copy the config box to a text file and get it to your client, or use the QR if you have a phone handy. Once you’ve saved the info, then click

“Store and generate next”

The GUI has automatically added the client to instance you created earlier, so at the bottom you can:

Enable Wiregaurd
Apply

(You can enable Wireguard at the bottom of any of these screens)

Add a WAN Rule

Firewall -> Rules -> WAN

Click ‘+’ to add a rule, and add

Interface: WAN
Protocol: UDP
Destination Port Range: (other) 51820

Add a Wireguard Interface Rule

Wireguard works by creating a network interface and Opnsense helpfully adds that alongside the LAN and WAN interfaces. You’ll notice it actually creates a group and if you had other instances they will (probably) be included.

Simply click the ‘+’ button to add a rule and save without changing any of the defaults. This allows you to leave the tunnel and talk to things on the LAN.

Operation

At this point you can connect from the client. If you look in the details it should add a line about ‘Latest handshake’ after a few seconds. If not, you’ll have to troubleshoot as below.

Adding new clients is similar to the first one, just make sure to disable and enable the service or the new clients won’t get picked up.

https://docs.opnsense.org/manual/how-tos/wireguard-client.html#step-4-b-create-an-outbound-nat-rule

Notes

I used the official setup guide at https://docs.opnsense.org/manual/vpnet.html#wireguard and it has a few flaws.

Mostly, it describes a more complex setup than just a remote access. They note two steps:

Create the server and peer
Create the rules. Under Firewall –> Rules, add one under
- WAN
- WireGuard (Group)

The issue is that the second category isn’t visible right away. Once it is, you can use the group, not the IP address. It’s unclear why the docs point you away from that.

Then I had to reboot to get it to work, which is very odd.

This turns out to be a general issue when you add a client and the service is already active. You can’t restart the service, you must disable and re-enable the service from the wireguard sub page

3 - Infrastructure

3.1 - Data

3.1.1 - Backup

A modern, non-enterprise, backup solution for an individual client should be:

Non-generational (i.e. not have to rely on full and incremental chains)
De-Duplicated
Support Pruning (of old backups)
Support Cloud Storage (and encryption)
Open Source (Ideally)

For built-in solutions, Apple has Time Machine, Windows has File History (and Windows Backup), and Linux has…well, a lot of things.

Rsync is a perennial favorite and a good (short) historical post on the evolution of rsync style backups is here over at Reddit. Though I hesitate to think of it as backup, because it meets none of the above features.

<https://www.reddit.com/r/linux/comments/42feqz/i_asked_here_for_the_optimal_backup_solution_and/czbeuby?

Duplicity is a well established traditional solution that supports Amazon Cloud Drive, but it relies on generational methods meaning a regular full backup is required. That’s resource intensive with large data sets.

Restic is interesting, but doesn’t work with many cloud providers; specifically Amazon Cloud Drive

https://github.com/restic/restic/issues/212

Obnam and Borg are also interesting, but similarly fail with Amazon Cloud Drive.

restic panic: rename
https://www.bountysource.com/issues/23684796-storage-backend-amazon-cloud-drive

Duplicati supports ACD as long as you’re willing to add mono. Though it’s still beta as of this writing.

sudo /usr/lib/duplicati/Duplicati.Server.exe --webservice-port=8200 --webservice-interface=any

And some other background.

<http://silverskysoft.com/open-stack-xwrpr/2015/08/the-search-for-the-ideal-backup-tool-part-1-of-2/>
<http://changelog.complete.org/archives/9353-roundup-of-remote-encrypted-deduplicated-backups-in-linux>
<http://www.acronis.com/en-us/resource/tips-tricks/2004/generational-backup.html>

3.1.2 - Directory Services

3.1.2.1 - Apple and AD

Here’s the troubleshooting process

Verify DNS Records according to apple’s document.

DOMAIN=gattis.org
dns-sd -q _ldap._tcp.$DOMAIN SRV
dns-sd -q _kerberos._tcp.$DOMAIN SRV
dns-sd -q _kpasswd._tcp.$DOMAIN SRV
dns-sd -q _gc._tcp.$DOMAIN SRV

Ping the results. Then test for ports according the Microsoft’s document.

HOST=dc01.gattis.org
nc -z -v -u $HOST 88
nc -z -v -u $HOST 135
nc -z -v $HOST 135
nc -z -v -u $HOST 389
nc -z -v -u $HOST 445
nc -z -v $HOST 445
nc -z -v -u $HOST 464
nc -z -v $HOST 464
nc -z -v $HOST 3268
nc -z -v $HOST 3269
nc -z -v $HOST 53
nc -z -v -u $HOST 53
nc -z -v -u $HOST 123

A useful script is like so

#!/bin/bash

HOST=dc01.gattis.local
#HOST=dc02.gattis.local


## declare an array of the commands to run
declare -a COMMANDS=(\
"nc -z -u $HOST 88" 
"nc -z -u $HOST 135" 
"nc -z    $HOST 135" 
"nc -z -u $HOST 389" 
"nc -z -u $HOST 445" 
"nc -z    $HOST 445" 
"nc -z -u $HOST 464" 
"nc -z    $HOST 464" 
"nc -z    $HOST 3268" 
"nc -z    $HOST 3269" 
"nc -z    $HOST 53" 
"nc -z -u $HOST 53" 
"nc -z -u $HOST 123")

PIDS=""
for i in "${COMMANDS[@]}";do
    $i &
    PIDS+="$! "
done

3.1.2.2 - LDAP

sudo apt-get install libnss-ldap ldap-utils

# To get the attribute 'memberOf'

# Simple Bind with TLS
ldapsearch -v -x -Z -D "[email protected]" -W -H ldap://ad.domain.local -b  'OU=People,DC=domain,DC=local' '(sAMAccountName=someuser)' memberOf

# older style
ldapsearch -v -D "[email protected]" -w Passw0rd -H ldap://ad1.domain.local -b 'OU=People,DC=domain,DC=local' '(sAMAccountName=someuser)' memberOf

# Get all user accounts from AD created since 2007-07.
ldapsearch -v -x -Z -D "[email protected]" -W -H ldap://ad1.domain.local -b 'DC=domain,DC=local' -E pr=1000/noprompt '(&(objectClass=user)(whenCreated>=20100701000000.0Z))' sAMAccountName description whenCreated > all

3.1.3 - File System

3.1.3.1 - BTRFS

3.1.3.1.1 - Kernel Updates for BTRFS

You man want to use newer BTRFS features on an older OS.

With debian your choices are:

Install from backports
Install from release candidates
Install from generic
Build from source

Install from Backports

It’s often recommended to install from backports. These are newer versions of apps that have been explicitly taken out of testing and packaged for the stable release. i.e. if you’re running ‘buster’ you would install from buster-backports.

It’s possible that one would use the identifiers ‘stable’ and ’testing’ which pegs you to whatever is current, rather than to a specific release.

echo deb http://deb.debian.org/debian buster-backports main | sudo tee /etc/apt/sources.list.d/buster-backports.list
sudo apt update

# seach for the most recent amd64 image.
sudo apt search -t buster-backports linux-image-5
sudo apt install -t buster-backports linux-image-5.2.0-0.bpo.3-amd64-unsigned
sudo apt install -t buster-backports btrfs-progs

Install from Release Candidate

If there’s no backport, you can install from the release candidate. These are simply upcoming versions of debian that haven’t been released yet

To install the kernel from the experimental version of debian, add the repo to your sources and explicitly add the kernel (this is safe to add to your repos because experimental packages aren’t installed by default)

sudo su -c "echo deb http://deb.debian.org/debian experimental main > /etc/apt/sources.list.d/experimental.list"
sudo apt update
sudo apt -t experimental search linux-image
sudo apt -t experimental install linux-image-XXX
sudo apt -t experimental install btrfs-progs

Install from Generic

You can also download the packages and manually install.

Navigate to

http://kernel.ubuntu.com/~kernel-ppa/mainline/

And download, similar to this (from a very long time ago :-)

wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/linux-headers-5.0.0-050000_5.0.0-050000.201903032031_all.deb
wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/linux-headers-5.0.0-050000-generic_5.0.0-050000.201903032031_amd64.deb
wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/linux-image-unsigned-5.0.0-050000-generic_5.0.0-050000.201903032031_amd64.deb
wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/linux-modules-5.0.0-050000-generic_5.0.0-050000.201903032031_amd64.deb

Troubleshooting

The following signatures couldn’t be verified because the public key is not available:

sudo apt-key adv –recv-key –keyserver keyserver.ubuntu.com 648ACFD622F3D138

Sources

https://unix.stackexchange.com/questions/432406/install-the-latest-rc-kernel-on-debian https://wiki.debian.org/HowToUpgradeKernel https://www.tecmint.com/upgrade-kernel-in-ubuntu/ https://raspberrypi.stackexchange.com/questions/12258/where-is-the-archive-key-for-backports-debian-org/60051#60051

3.1.3.2 - Ceph

Ceph is a distributed object storage system that supports both replicated and erasure coding of data. So you can design it to be fast or efficient. Or even both.

It’s complex compared to other solutions, but it’s also the main focus of Redhat and other development. So it may eclipse other technologies just through adaption.

It also comes ‘baked-in’ with Proxmox. If you’re already using PVE it’s worth deploying over others, as long as you’re willing to devote the time to learning it.

3.1.3.2.1 - Ceph Proxmox

Overview

There are two main use-cases;

Virtual Machines
Bulk Storage

They both provide High Availability. But VMs need speed whereas Bulk should be economical. What makes Ceph awesome is you can do both - all with the same set of disks.

Preparation

Computers

Put 3 or 4 PCs on a LAN. Have at least a couple HDDs in addition to a boot drive. 1G of RAM per TB of disk is recommended¹ and it will use it. You can have less RAM, it will just be slower.

Network

The speed of Ceph is essentially a third or half your network speed. With a 1 Gig NIC you’ll average around 60 MB/Sec for file operations (multiple copies are being saved behind the scenes). This sounds terrible, but in reality its fine for a cluster that serves up small files and/or media streams. But it will take a long time to get data onto the system.

If you need more, install a secondary NIC for Ceph traffic. Do that before you configure the system, if you can. Doing it after is hard. You can use a mesh config via the PVE docs² or purchase a switch. Reasonable 2.5 Gb switches and NICs can now be had.

Installation

Proxmox

Install PVE and cluster the servers.

Ceph

Double check the Ceph repo is current by comparing what you have enabled:

grep ceph  /etc/apt/sources.list.d/*

Against what PVE has available.

curl -s https://enterprise.proxmox.com/debian/ | grep ceph

If you don’t have the latest, take a look at Install PVE to update the Ceph repo.

After that, log into the PVE web GUI, click on each server and in that server’s submenu, click on Ceph. It will prompt for permission to install. When the setup windows appears, select the newest version and the No-Subscription repository. You can also refer to the official notes.

The wizard will ask some additional configuration options for which you can take the defaults and finish. If you have an additional ceph-specific network hardware, set it up with a separate IP range and choose interface for both the public network and cluster network.

Configuration

Ceph uses multiple daemons on each node.

Monitors, to keep track of what’s up and down.
Managers, to gather performance data.
OSDs, a service per disk to store and retrieve data.
Metadata Servers, to handle file permissions and such.

To configure them, you will:

Add a monitor and manager to each node
Add each node’s disks as OSD, or (Object Storage Devices)
Add metadata servers to each node
Create Pools, where you group OSDs and choose the level of resiliency
Create a Filesystem

Monitors and Managers

Ceph recommends at least three monitors³ and manager processes⁴. To install, click on a server’s Ceph menu and select the Monitor menu entry. Add a monitor and manager to the first three nodes.

OSDs

The next step is to add disks - aka Object Storage Devices. Select a server from the left-hand menu, and from that server’s Ceph menu, select OSD. Then click Create: OSD, select the disks and click create. If they are missing, enter the server’s shell and issue the command wipefs -a /dev/sdX on the disks in question.

If you have a lot of disks, you can also do this at the shell

# Assuming you have 5 disks, starting at 'b'
for X in {b..f}; do echo pveceph osd create /dev/sd$X; done

Metadata Servers

To add MDSs, click on the CephFS submenu for each server and click Create under the Metadata Servers section. Don’t create a CephFS yet, however.

Pools and Resiliency

We are going to create two pools; a replicated pool for VMs and an erasure coded pool for bulk storage.

If you want to mix SSDs and HDDs see the storage tiering page before creating the pools. You’ll want to set up classes of storage and create pools based on that page.

Replicated Pool

On this pool we’ll use the default replication level that gives us three copies of everything. These are guaranteed at a per-host level. Loose any one or two hosts, no problem. But loose individual disks from all three hoots at the same time and you’re out of luck.

This is the GUI default so creation is easy. Navigate to a host and select Ceph –> Pools –> Create. The defaults are fine and all you need do is give it a name, like “VMs”. You may notice there is already a .mgr pool. That’s created by the manager service and safe to ignore.

If you only need storage for VMs and containers, you can actually stop here. You can create VMs directly on your new pool, and containers on local storage then migrate (Server –> Container –> Resources –> Root Disk -> Volume Action –> Target Storage)

Erasure Coded Pool

Erasure coding requires that you determine how many data and parity bits to use, and issue the create command in the terminal. For the first question, it’s pretty simple - if you have three servers you’ll need 2 data and 1 parity. The more systems you have the more efficient you’ll be, though when you get to 6 you should probably increase your parity. Unlike the replicated pool, you can only loose one host with this level of protection.

Here’s the command⁵ for a 3 node system that can withstand one node loss (2,1). For a 4 node system you’d use (3,1) and so on. Increase the first number as your node count goes up, and the second as you desire more resilience. Ceph doesn’t require a ‘balanced’ cluster in terms of drives, but you’ll loose some capacity if you don’t have roughly the same amount of space on each node.

# k is for data and m is for parity
pveceph pool create POOLNAME --erasure-coding k=2,m=1 --pg_autoscale_mode on --application cephfs

Note that we specified application in the command. If you don’t, you won’t be able to use it for a filesystem later on. We also specified PGs (placement groups) as auto-scaling. This is how Ceph chunks data as it gets sent to storage. If you know how much data you have to load, you can specify the starting number of PGs with the --pg_num parameter. This will make things a little faster for an initial copy. Redhat suggests⁶ the OSD*100 / K+M. You’ll get a warning⁷ from Ceph if it’s not a power of 2 (2, 4, 8, 16, 32, 64, 128, 256, 512) so use the closest number, such as --pg_num 512.

If you don’t know how much data you’re going to bing in, but expect it to be a lot, you can turn on the bulk flag, rather than specifying pg_num.

# Make sure to add '-data' at the end
ceph osd pool set POOLNAME-data bulk true

When you look at the pools in the GUI you’ll see it also created two pools, one for data and one for metadata, which isn’t compatible with EC pools yet. You’ll also notice that you can put VMs and containers on this pool just like the replicated pool. It will just be slower.

Filesystem

The GUI won’t allow you to choose the erasure coded pool you just created so you’ll use the command line again. The name you pick for your Filesystem will be how it’s mounted.

ceph osd pool set POOLNAME-data bulk true
ceph fs new FILE-SYSTEM-NAME POOLNAME-metadata POOLNAME-data --force

To mount it cluster-wide, go back to the Web GUI and select Datacenter at the top left, then Storage. Click Add and select CephFS as the type. In the subsequent dialog box, put the name you’d like it mounted as in the ID field, such as “bulk” and leave the rest at their defaults.

You can now find it mounted at /mnt/pve/IDNAME and you can bind mount it to your Containers or setup NFS for your VMs.

Operation

Failure is Always an Option

Ceph defaults to a failure domain of ‘host’. That means you can loose a whole host with all it’s disks and continue operating. You can also lose individual disks from different hosts, but operations are NOT guaranteed. If you have two copies, as well and continue operating. After a short time, Ceph will re-establish parity as disks fail or hosts remain off-line. Should they come back, it will re-adjust. Though in both cases this can take some time.

Rebooting a host

Ceph immediately panics and starts reestablishing resilience. When the host comes back up, it starts redoing it back. This is OK, but Redhat suggests to avoid it with a few steps.

On the node you want to reboot:

sudo ceph osd set noout
sudo ceph osd set 
sudo reboot

# Log back in and check that the pgmap reports all pgs as normal (active+clean). 
sudo ceph -s

# Continue on to the next node
sudo reboot
sudo ceph -s

# When done
sudo ceph osd unset noout
sudo ceph osd unset norebalance

# Perform a final status check to make sure the cluster reports HEALTH_OK:
sudo ceph status

Troubleshooting

Pool Creation Error

If you created a pool but left off the –application flag it will be set to RDP by default. You’d have to change it from RDP to CephFS like so, for both the data and metadata

ceph osd pool application enable srv-data cephfs --yes-i-really-mean-it
ceph osd pool application disable srv-data rdb --yes-i-really-mean-it
ceph osd pool application enable srv-metadata cephfs --yes-i-really-mean-it
ceph osd pool application disable srv-metadata rdb --yes-i-really-mean-it

Cluster IP Address Change

If you want to change your IP addresses, you may be able to just change the public network in the /etc/pve/ceph.conf and then destroy and recreate what it says. This worked. I don’t know if it’s good. I think the OSD cluster network needed changed also.

Based on https://www.reddit.com/r/Proxmox/comments/p7s8ne/change_ceph_network/

3.1.3.2.2 - Ceph Tiering

Overview

If you have a mix of workloads you should create a mix of pools. Cache tiering is out¹. So use a mix of NVMEs, SSDs, and HDDs with rules to control what pool uses what class of device.

In this example, we’ll create a replicated SSD pool for our VMs, and a erasure coded HDD pool for our content and media files.

Initial Rules

When an OSD is added, its device class is automatically assigned. The typical ones are ssd or hdd. But either way, the default config will use them all as soon as you add them. Let’s change that by creating some additional rules

Replicated Data

For replicated data, it’s as easy as creating a couple new rules and then migrating data, if you have any.

New System

If you haven’t yet created any pools, great! we can create the rules so they are available when creating pools. Add all your disks as OSDs (visit the PVE ceph menu for each server in the cluster). Then add these rules at the command line of any PVE server to update the global config.

# Add rules for both types
#
# The format is
#    ceph osd crush rule create-replicated RULENAME default host CLASSTYPE
#
ceph osd crush rule create-replicated replicated_hdd_rule default host hdd
ceph osd crush rule create-replicated replicated_ssd_rule default host ssd

And you’re done! When you create a pool in the PVE GUI, click the advanced button and choose the appropriate CRUSH rule from the drop-down. Or you can create one now while you’re at the command line.

# Create a pool for your VMs on replicated SSD. Default replication is used (so 3 copies)
#  pveceph pool create POOLNAME --crush_rule RULENAME --pg_autoscale_mode on
pveceph pool create VMs --crush_rule  replicated_ssd_rule --pg_autoscale_mode on

Existing System

With an existing system you must migrate your data. If you’ve haven’t added your SSDs yet, do so now. It will start moving data using the default rule, but we’ll apply a new rule that will take over.

# Add rules for both types
ceph osd crush rule create-replicated replicated_hdd_rule default host hdd
ceph osd crush rule create-replicated replicated_ssd_rule default host ssd

# If you've just added SSDs, apply the new rule right away to minimize the time spent waiting for data moves.
# Use the SSD or HDD rule as you prefer. In this example we're moving POOLNAME to SSDs
ceph osd pool set VMs crush_rule replicated_sdd_rule

Erasure Coded Data

On A New System

EC data is a little different. You need a profile to describe the resilience and class, and Ceph manages the CRUSH rule directly. But you can have the pveceph to do this for you.

# Create pool name 'Content' with 2 data and 1 parity. Add --application cephfs as we're using this for file storage. The --crush_rule affects the metadata pool so its on fast storage.
pveceph pool create Content --erasure-coding k=2,m=1,device-class=hdd --crush_rule  replicated_ssd_rule --pg_autoscale_mode on --application cephfs

You’ll notice separate pools for data and metadata were automatically created as the latter doesn’t support EC pools yet.

On An Existing System

Normally, you set device class as part of creating a profile and you cannot change the profile after creating the pool². However, you can change the CRUSH rule and that’s all we need for changing the class.

# Create a new profile that to base a CRUSH rule on. This one uses HDD
#  ceph osd erasure-code-profile set PROFILENAME crush-device-class=CLASS k=2 m=1
ceph osd erasure-code-profile set ec_hdd_2_1_profile crush-device-class=hdd k=2 m=1

# ceph osd crush rule create-erasure RULENAME PROFILENAME (from above)
ceph osd crush rule create-erasure erasure_hdd_rule ec_hdd_2_1_profile

# ceph osd pool set POOLNAME crush_rule RULENAME
ceph osd pool set Content-data crush_rule erasure_hdd_rule

Don’t forget about the metadata pool and it’s a good time to turn on bulk setting if you’re going to store a lot of data.

# Put the metadata pool on SSD for speed
ceph osd pool set Content-metadata crush_rule replicated_ssd_rule
ceph osd pool set Content-data bulk true

Other Notes

NVME

There’s some reports that NVMe aren’t separated from SSDs. You may need to create that class and turn off auto detection, though this is quite old information.

Investigation

When investigating a system, you may want to drill down with thees commands.

ceph osd lspools
ceph osd pool get VMs crush_rule
ceph osd crush rule dumpreplicated_ssd_rule
# or view rules with
ceph osd getcrushmap | crushtool -d -

Data Loading

The fastest way is to use a Ceph Client at the source of the data, or at least separate the interfaces.

With 1Gb NIC, one of the Ceph storage servers also connected to an external NFS and coping data to a CephFS.

12 MB/sec

Same, but reversed with NFS server itself running the Ceph client pushing the data.

103 MB/sec

Creating and Destroying

https://dannyda.com/2021/04/10/how-to-completely-remove-delete-or-reinstall-ceph-and-its-configuration-from-proxmox-ve-pve/

# Adjust letters as needed
for X in {b..h}; do pveceph osd create /dev/sd${X};done
mkdir -p /var/lib/ceph/mon/
mkdir  /var/lib/ceph/osd

# Adjust numbers as needed
for X in {16..23};do systemctl stop ceph-osd@${X}.service;done
for X in {0..7}; do umount /var/lib/ceph/osd/ceph-$X;done
for X in {a..h}; do ceph-volume lvm zap /dev/sd$X --destroy;done

3.1.3.2.3 - Ceph Client

This assumes you already have a working cluster and a ceph file system.

Install

You need the ceph software. You use the cephadm tool, or add the repos and packages manually. You also need to pick what version by it’s release name; ‘Octopus, Nautilus, etc’

sudo apt install software-properties-common gnupg2
wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -

# Discover the current release. PVE is a good place to check when at the command line
curl -s https://enterprise.proxmox.com/debian/ | grep ceph

# Note the release name after debian, 'debian-squid' in this example.
sudo apt-add-repository 'deb https://download.ceph.com/debian-squid/ bullseye main'

sudo apt update; sudo apt install ceph-common -y

#
# Alternatively 
#

curl --silent --remote-name --location https://github.com/ceph/ceph/raw/squid/src/cephadm/cephadm
chmod +x cephadm
./cephadm add-repo --release squid
./cephadm install ceph-common

Configure

On a cluster member, generate a basic conf and keyring for the client

# for a client named 'minecraft'

ceph config generate-minimal-conf > /etc/ceph/minimal.ceph.conf

ceph-authtool --create-keyring /etc/ceph/ceph.client.minecraft.keyring --gen-key -n client.minecraft

You must add file system permissions by adding lines to the bottom of the keyring, then import it to the cluster.

nano  /etc/ceph/ceph.client.minecraft.keyring

# Allowing the client to read the root and write to the subdirectory '/srv/minecraft'
caps mds = "allow rwps path=/srv/minecraft"
caps mon = "allow r"
caps osd = "allow *"

Import the keyring to the cluster and copy it to the client

ceph auth import -i /etc/ceph/ceph.client.minecraft.keyring
scp minimal.ceph.conf ceph.client.minecraft.keyring [email protected]:

On the client, copy the keyring and rename and move the basic config file.

ssh [email protected]

sudo cp ceph.client.minecraft.keyring /etc/ceph
sudo cp minimal.ceph.conf /etc/ceph/ceph.conf

Now, you may mount the filesystem

# the format is "User ID" @ "Cluster ID" . "Filesystem Name" = "/some/folder/on/the/server" "/some/place/on/the/client"
# You can get the cluster ID from your server's ceph.conf file and the filesystem name 
# with a ceph fs ls, if you don't already know it. It will be the part after name, as in "name: XXXX, ..."

sudo mount.ceph [email protected]=/srv/minecraft /mnt

You can and entry to your fstab like so

[email protected]=/srv/minecraft /mnt ceph noatime,_netdev    0       2

Troubleshooting

source mount path was not specified
unable to parse mount source: -22

You might have accidentally installed the distro’s older version of ceph. The mount notation above is based on “quincy” aka ver 17

ceph --version

  ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)

Try an apt remove --purge ceph-common and then apt update before trying a apt install ceph-common again.

**unable to get monitor info from DNS SRV with service name: ceph-mon**

Check your client’s ceph.conf. You may not have the right file in place

**mount error: no mds server is up or the cluster is laggy**

This is likely a problem with your client file.

Sources

https://docs.ceph.com/en/quincy/install/get-packages/ https://knowledgebase.45drives.com/kb/creating-client-keyrings-for-cephfs/ https://docs.ceph.com/en/nautilus/cephfs/fstab/

3.1.3.3 - Gluster

Gluster is a distributed file system that supports both replicated and dispersed data.

Supporting dispersed data is a differentiating feature. Only a few can distribute the data in a erasure-coded or RAID-like fashion, making efficient use of space while providing redundancy. Have 5 cluster members? Just add one ‘parity bit’ for just a 20% overhead and you can loose a host. Add more parity if you like with the incremental cost. Other systems require you to duplicate your data for a 50% hit.

It’s also generally perceived as less complex than competitors like Ceph, as it has fewer moving parts and is focused on block storage. And since it uses native filesystems, you can always access your data directly. Redhat has ceased it’s corporate sponsorship, but the project is still quite active.

So you just need file storage and you have a lot of data, use gluster.

3.1.3.3.1 - Gluster on XCP-NG

Let’s set up a distributed and dispersed example cluster. We’ll XCP-NG for this. This is similar to an erasure-coded ceph pool.

Preparation

We use three hosts, each connected to a common network. With three we can disperse data enough to take one host at a time out of service. We use 4 disks on each host in this example but any number will work as long as they are all the same.

Network

Hostname Resolution

Gluster requires¹ the hosts be resolvable by hostname. Verify all the hosts can ping each other by name. You may want to create a hosts file and copy to all three to help.

If you have free ports on each server, consider using the second interface for storage, or a mesh network for better performance.

# Normal management and or guest network
192.168.1.1 xcp-ng-01.lan
192.168.1.2 xcp-ng-02.lan
192.168.1.3 xcp-ng-03.lan

# Storage network in a different subnet (if you have a second interface)
192.168.10.1 xcp-ng-01.storage.lan
192.168.10.2 xcp-ng-02.storage.lan
192.168.10.3 xcp-ng-03.storage.lan

Firewall Rules

Gluster requires a few rules; one for the daemon itself and one per ‘brick’ (drive) on the server. You can also just allow the cluster members cart-blanc access. We’ll do both examples here. Add these to all cluster members.

vi /etc/sysconfig/iptables

# Note that the last line in the existing file is a REJECT. Make sure to insert these new rules BEFORE that line.
-A RH-Firewall-1-INPUT -p tcp -s xcp-ng-01.storage.lan -j ACCEPT 
-A RH-Firewall-1-INPUT -p tcp -s xcp-ng-02.storage.lan -j ACCEPT 
-A RH-Firewall-1-INPUT -p tcp -s xcp-ng-03.storage.lan -j ACCEPT 

# Possibly for clients
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 24007:24008 -s client-01.storage.lan -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 49152:49156 -s client-01.storage.lan -j ACCEPT

service iptables restart

vi /etc/sysconfig/iptables

# The gluster daemon needs ports 24007 and 24008
# Individual bricks need ports starting at 49152. Add an additional port per brick.
# Here we have 49152-49155 open for 4 brickes. 

# TODO - test this command
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp -s xcp-ng-01.storage.lan --dport 24007:24008 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp -s xcp-ng-01.storage.lan --dport 49152:49155 -j ACCEPT

-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp -s xcp-ng-02.storage.lan --dport 24007:24008 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp -s xcp-ng-02.storage.lan --dport 49152:49155 -j ACCEPT

-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp -s xcp-ng-03.storage.lan --dport 24007:24008 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp -s xcp-ng-03.storage.lan --dport 49152:49155 -j ACCEPT

Disk

Gluster works with filesystems. This is convenient because if all else fails, you still have files on disks you can access. XFS is well regarded with gluster admins, so we’ll use that.

# Install the xfs programs
yum install -y xfsprogs

# Wipe the disks before using, then format the whole disk. Repeat for each disk
wipefs -a /dev/sda
mkfs.xfs /dev/sda

Let’s mount those disks. The convention² is to put them in /data organized by volume. We’ll use ‘volume01’ later in the config, so lets use that here as well.

On each server

# For 4 disks - Note, gluster likes to call them 'bricks'
mkdir -p /data/glusterfs/volume01/brick0{1,2,3,4}
mount /dev/sda /data/glusterfs/volume01/brick01
mount /dev/sdb /data/glusterfs/volume01/brick02
mount /dev/sdc /data/glusterfs/volume01/brick03
mount /dev/sdd /data/glusterfs/volume01/brick04

Add the appropriate config to your /etc/fstab so they mount at boot

Installation

A Note About Versions

XCP-NG is CentOS 7 based and provides GlusterFS v8 in their Repo. This version went EOL in 2021. You can add the CentOS Storage Special Interest group repo to get to v9, but no current version can be installed.

# Not recommended
yum install centos-release-gluster  --enablerepo=epel,base,updates,extras

# On each host

yum install -y glusterfs-server
systemctl enable --now glusterd


# On the first host

gluster peer probe xcp-ng-02.storage.lan
gluster peer probe xcp-ng-03.storage.lan

gluster pool list

UUID                                  Hostname              State
a103d6a5-367b-4807-be93-497b06cf1614  xcp-ng-02.storage.lan Connected 
10bc7918-364d-4e4d-aa16-85c1c879963a  xcp-ng-03.storage.lan Connected 
d00ea7e3-ed94-49ed-b56d-e9ca4327cb82  localhost             Connected

# Note - localhost will always show up for the host you're running the command on

Configuration

Gluster talks about data as being distributed and dispersed.

Distributed

# Distribute data amongst 3 servers, each with a single brick
gluster volume create MyVolume server1:/brick1 server2:brick1 server3:brick1

Any time you have more that one drive, it’s distributed. That can be across different disks on the same host, or across different hosts. There is no redundancy, however, and any loss of disk is loss of data.

Disperse

# Disperse data amongst 3 bricks, each on a different server
gluster volume create MyVolume disperse server1:/brick1 server2:/brick1 server3:/brick1

Dispersed is how you build redundancy across servers. Any one of these servers or bricks can fail and the data is safe.

# Disperse data amongst 6 six bricks, but some on the same server. Problem!
gluster volume create MyVolume disperse \
  server1:/brick1 server2:/brick1 server3:/brick1
  server1:/brick2 server2:/brick2 server3:/brick2

If you try and disperse your data across multiple bricks on the same server, you’ll run into the problem of sub-optimal parity. You’ll see the error message:

Multiple bricks of a disperse volume are present on the same server. This setup is not >optimal. Bricks should be on different nodes to have best fault tolerant configuration

Distributed-Disperse

# Disperse data into 3 brick subvolumes before distributing
gluster volume create MyVolume disperse 3 \
  server1:/brick1 server2:/brick1 server3:/brick1
  server1:/brick2 server2:/brick2 server3:/brick2

By specifying disperse COUNT you tell gluster that you want to create a subvolumes every COUNT bricks. In the above example, it’s every three bricks, so two subvolumes get created from the six bricks. This ensures the parity is optimally handled as it’s distributed.

You can also take advantage of bash shell expansion like below. Each subvolume is one line, repeated for each of the 4 bricks it will be distributed across.

gluster volume create volume01 disperse 3 \
  xcp-ng-0{1..3}.storage.lan:/data/glusterfs/volume01/brick01/brick \
  xcp-ng-0{1..3}.storage.lan:/data/glusterfs/volume01/brick02/brick \
  xcp-ng-0{1..3}.storage.lan:/data/glusterfs/volume01/brick03/brick \
  xcp-ng-0{1..3}.storage.lan:/data/glusterfs/volume01/brick04/brick

Operation

Mounting and Optimizing Volumes

mount -t glusterfs xcp-ng-01.storage.lan:/volume01 /mnt

gluster volume set volume01 group metadata-cache

gluster volume set volume01 performance.readdir-ahead on 
gluster volume set volume01 performance.parallel-readdir on

gluster volume set volume01 group nl-cache 
gluster volume set volume01 nl-cache-positive-entry on

Adding to XCP-NG

mount -t glusterfs xcp-ng-01.lan:/volume01/media.2 /root/mnt2/
mkdir mnt2/xcp-ng

xe sr-create content-type=user type=glusterfs name-label=GlusterSharedStorage shared=true \
  device-config:server=xcp-ng-01.lan:/volume01/xcp-ng \
  device-config:backupservers=xcp-ng-02.lan:xcp-ng-03.lan

Scrub and Bitrot

Scrub is off by default. You can enable scrub at which point the scrub daemon will begin “signing” files³ (by calculating checksum). The file-system parity isn’t used. So if you enable and immediately begin a scrub you will see many “Skipped files” as their checksum hasn’t been calculated yet.

Client Installation

The FUSE client is recommended⁴. The docs cover a .deb based install, but you can also install from the repo. On Debian:

sudo apt install lsb-release gnupg

OS=$(lsb_release --codename --short)

# Assuming the current version of gluster is 11 
wget -O - https://download.gluster.org/pub/gluster/glusterfs/11/rsa.pub | sudo pt-key add -
echo deb [arch=amd64] https://download.gluster.org/pub/gluster/glusterfs/11/LATEST/Debian/${OS}/amd64/apt ${OS} main | sudo tee /etc/apt/sources.list.d/gluster.list
sudo apt update; sudo apt install glusterfs-client

You need quite a few options to use this successfully at boot in the fstab

192.168.3.235:/volume01 /mnt glusterfs nofail,x-systemd.automount,x-systemd.requires=network-online.target,x-systemd.device-timeout=10 0 0

How to reboot a node

You may find that your filesystem has paused during a reboot. Take a look at your network timeout and see if setting it lower helps.

https://unix.stackexchange.com/questions/452939/glusterfs-graceful-reboot-of-brick

gluster volume set volume01 network.ping-timeout 5

Using notes from https://www.youtube.com/watch?v=TByeZBT4hfQ

3.1.3.4 - Hashdeep

Hashdeep is a reasonable way to generate checksums and test integrity on file trees.

On the source side, navigate to the parent directory of the one you want to hash and use the relative paths flag so as to avoid any issue with absolute path differences between the source and destination folders.

cd /srv/some/Parent/Directory
hashdeep -j 1 -c md5 -r -l > ~/hashes DirectoryYouWantToHash

The flags we are using are:

-j 1     # Threads to use. Adjust down if you have slow IO
-c md5   # Algorythem. md5 is the fastet in testing
-r       # Recursive
-l       # Use relative paths so it's more portable

Copy the hashes text file to the destination side and repeat.

cd /srv/some/Possibly/Different/Parent/Directory
hashdeep -r -a -v -l -k ~/hashes DirectoryToCheck

The flags we are using are:

-r # Recursive, as before
-a # Archive checking mode
-v # Verbose summary
-l # Relative as before
-k # File that has the sums to use for checking

3.1.3.5 - NFS

NFS is the fastest way move files around a small network. It beats both samba and afp in throughput (Circa 2014) in my testing and with a little extra config works well between apple and linux.

3.1.3.5.1 - General Use

The NFS server supports multiple protocol versions, but we’ll focus on the current 4.X version of the protocol. It’s been out since 2010 and simplifies security.

Installation

Linux Server

This will install the server and a few requisites.

sudo  apt-get install nfs-kernel-server

Configuration

Set NFSv4 only

In order to streamline the ports needed (in case one uses firewalls) and reduce required services, we will limit the server to v4¹ only.

Edit nfs-common

sudo vi /etc/default/nfs-common

NEED_STATD=“no” NEED_IDMAPD=“yes”

And the defaults

sudo vi /etc/default/nfs-kernel-server

RPCNFSDOPTS="-N 2 -N 3" RPCMOUNTDOPTS="–manage-gids -N 2 -N 3"

Disable rpcbind

sudo systemctl mask rpcbind.service sudo systemctl mask rpcbind.socket

Create Exports

In NFS parlance, you ’export’ a folder when you share it. We’ll use the same location for our exports as suggested in the Debian example.

sudo vim /etc/exports

/srv/nfs4 192.168.1.0/24(rw,async,fsid=0,crossmnt,no_subtree_check,all_squash,anonuid=1000,anongid=1000,insecure)

         /srv/nfs4 # This is the actual folder on the server's file system you're sharing
    192.168.1.0/24 # This is the network you're sharing with
                rw # Read-Write mode
             async # Allow cached writes
            fsid=0 # This signifies this is the 'root' of the exported file system and that
                   # clients will mount this share as '/'
          crossmnt # Allow subfolders that are seperate filesystem to be accessed also
  no_subtree_check # Disable checking for access rights outside the exported file system
        all_squash # all user IDs will translated to anonymous
      anonuid=1000 # all anonymous connections will be mapped to this user account in /etc/passwd
      anongid=1000 # all anonymous connections will be mapped to this group account in /etc/passwd
          insecure # Allows macs to mount using source ports from non-root source ports

If you can’t put all your content under this folder, it’s recommended you create pseudo file system for security reasons. See the notes for more info on that, but keep things simple if you can.

Configure Host-Based Firewall

If you have a system with ufw you can get this working fairly easily. NFS is already defined as a well-known service.

ufw allow from 192.168.1.0/24 to any port nfs

Restart the Service

You don’t actually need to restart. You put your changed into effect by issuing the exportfs command. This is best practice so you don’t to disrupt currently connected clients.

exportfs -rav

Client Configuration

Apple OS X

Modern Macs support NFSv4 with a couple tweaks

# In a terminal, issue the command
sudo mount -t nfs -o nolocks,resvport,locallocks 192.168.1.2:/srv ./mnt

You can also mount in finder with a version 4 flag. That may help but is somewhat awkward

nfs://vers=4,192.168.1.5/srv/nfs4

You can also edit the mac’s config file. This will allow you to use the finder to mount NFS 4 exports.

sudo vim /etc/nfs.conf

#
# nfs.conf: the NFS configuration file
#
#nfs.client.mount.options = nolock
#
nfs.client.mount.options = vers=4.1,nolocks,resvport,locallocks

You can now hit command-k and enter the string below to connect

nfs://my.server.or.ip/

Some sources suggest editing the autofs.conf file to add ’nolocks,locallocks to the automount options. This may or may not have an effect.

sudo vim  /etc/autofs.conf
AUTOMOUNTD_MNTOPTS=nosuid,nodev,nolocks,locallocks

Troubleshooting

Must use v3

If you must use v3, you can set static ports. Use the internet for this.

lockd: cannot monitor

You may want to check your mac’s nfs options and set ’nolock’ or possibly ‘vers=4’ as above. Don’t set them both on at once as in the next issue.

mount_nfs: can’t mount / from home onto /Volumes/mnt: Invalid argument

You can’t combine -o vers=4 with options like ’nolocks’, presumably because it’s not implemented fully. This may have changed by now.

https://developer.apple.com/library/mac/documentation/Darwin/Reference/Manpages/man8/mount_nfs.8.html

No Such File or Directory mount.nfs: mounting some.ip:/srv failed, reason given by server: No such file or directory

Version 4 maps directories and starts with ‘/’. Try mounting just the root path as opposed to /srv/nfs4.

mount  -o nfsvers=4.1 some.ip:/ /srv

<There was a problem ….

Check that you have ‘insecure’ in your nfs export options on the server

/srv  192.168.1.0/24(rw,async,fsid=0,insecure,crossmnt,no_subtree_check)

Can’t create or see files

Don’t forget that file permissions apply as the user you specified above. Set chown and chmod accordingly

Can Create Files But Not Modify or Delete

Check the parent directory permissions

NFS doesn’t mount at boot

Try adding some mount [options].

some.ip:/ /srv  nfs nofail,x-systemd.automount,x-systemd.requires=network-online.target,x-systemd.device-timeout=10,vers=4.1 0 0

mount.nfs: requested NFS version or transport protocol is not supported

Try specifying the nfs version

mount  -o nfsvers=4.1 some.ip:/ /srv

https://wiki.debian.org/NFSServerSetup [options]:https://unix.stackexchange.com/questions/459731/systemd-failed-to-mount-nfs-share-mount-nfs-network-is-unreachable-until-late ↩︎

3.1.3.5.2 - Armbian NFS Server

This is usually a question of overhead. NFS has less CPU overhead and faster speeds circa 2023, and anecdotal testing showed fewer issues with common clients like VLC, Infuse and Kodi. However, there’s no advertisement¹ like SMB has, so you have to pre-configure all clients.

This is the basic config for an anonymous, read-only share.

apt install nfs-kernel-server

echo "/mnt/pool *(fsid=0,ro,all_squash,no_subtree_check)" >> /etc/exports

exportfs -rav

mDNS SRV records has some quasi supports, but not with common clients ↩︎

3.1.3.5.3 - NFS Container

This is problematic. NFS requires kernel privileges so the usual answer is “don’t”. Client’s also. So from a security and config standpoint, it’s better have PVE act as the NFS client and use bind mounts for the containers. But this can blur the line between services and infrastructure.

Either way, here’s my notes from setting up an Alpine NFS server.

Create privileged container and enable nesting

https://forum.proxmox.com/threads/is-it-possible-to-run-a-nfs-server-within-a-lxc.24403/page-2

Create a privileged container by unchecking “Unprivileged” during creation. May be possible to convert an existing container from unprivileged to privileged by backing-up and restoring. In the container Options -> Features, enable Nesting. (The NFS feature doesn’t seem necessary for running an NFS server. May be required for an NFS client - I haven’t checked

For Alpine, CAP_SETPCAP is also needed

vi /etc/pve/lxc/100.conf

# clear cap.drop
lxc.cap.drop:

# copy drop list from /usr/share/lxc/config/common.conf
lxc.cap.drop = mac_admin mac_override sys_time sys_module sys_rawio

# copy drop list from /usr/share/lxc/config/alpine.common.conf with setpcap commented

lxc.cap.drop = audit_write
lxc.cap.drop = ipc_owner
lxc.cap.drop = mknod
# lxc.cap.drop = setpcap
lxc.cap.drop = sys_nice
lxc.cap.drop = sys_pacct
lxc.cap.drop = sys_ptrace
lxc.cap.drop = sys_rawio
lxc.cap.drop = sys_resource
lxc.cap.drop = sys_tty_config
lxc.cap.drop = syslog
lxc.cap.drop = wake_alarm

Then proceed with https://wiki.alpinelinux.org/wiki/Setting_up_a_nfs-server.

3.1.3.6 - sshfs

You can mount a remote system via sshfs. It’s slow, but better than nothing.

# Mount a host dir you have ssh access to
sshfs [email protected]:/var/www/html /var/www/html

# Mount a remote system over a proxy jump host
sshfs [email protected]:/var/www/html /var/www/html -o ssh_command="ssh -J [email protected]",allow_other,default_permissions

It’s often handy to add a shortcut to your ssh config so you don’t have to type as much.

cat .ssh/config 
Host some.host
    ProxyJump some.in.between.host

3.1.3.7 - ZFS

Overview

ZFS: the last word in file systems - at least according to Sun Microsystems back in 2004. But it’s pretty much true for traditional file servers. You add disks to a pool where you decide how much redundancy you want, then create file systems that sit on top. ZFS directly manages it all. No labeling, partitioning or formatting required.

There is error detection and correction, zero-space clones and snapshots, compression and deduplication, and delta-based replication for backup. It’s also highly resistent to corruption from power loss and crashes because it uses Copy-On-Write.

This last feature means that as files are changed, only the changed-bits are written out, and then the metadata updated at the end as a separate and final step to include these changed bits. The original file stays the same until the very end. An interruption in the middle of a write (such as from a crash) leaves the file undamaged.

Lastly, everything is checksummed and automatically repaired should you ever suffer from silent corruption.

3.1.3.7.1 - Basics

The Basics

Let’s create a pool and mount a file system.

Create a Pool

A ‘pool’ is group of disks and that’s where RAID levels are established. You can choose to mirror drives, or use distributed parity. A common choice is RAIDZ1 so that you can sustain the loss of one drive. You can increase the redundancy to RAIDZ2 and RAIDZ3 as desired.

zpool create pool01 raidz1 /dev/sdc /dev/sdd /dev/sde /dev/sdf

zpool list
NAME               SIZE  ALLOC   FREE  
pool01              40T     0T    40T

Create a Filesystem

A default root file system is created and mounted automatically when the pool is created. Using this is fine and you can easily change where it’s mounted.

zfs list

NAME     USED  AVAIL     REFER  MOUNTPOINT
pool01     0T    40T        0T  /pool01

zfs set mountpoint=/mnt pool01

But often, you need more than one file system - say one for holding working files and another for long term storage. This allows you to easily back up things different things, differently.

# Get rid of the initial fs and pool
zpool destroy pool01

# Create it again but leave the root filesystem unmounted with the `-m` option. You can also use drive short-names
zpool create -m none pool01 raidz1 sdc sdd sde sdf

# Add a couple filesystems and mount under the /srv directory
zfs create pool01/working -o  mountpoint=/srv/working
zfs create pool01/archive -o  mountpoint=/srv/archive             
              ^     ^
             /       \
pool name --          filesystem name

Now you can do things like snapshot the archive folder regularly while skipping the working folder. The only downside is that they are separate filessytems. Moving things doesn’t happen instantly.

Compression

Compression is on by default and this will save space for things that can benefit from it. It also makes things faster as moving compressed data takes less time. CPU use for the default algorithm lz4, is negligible and it quickly detects files that aren’t compressible and gives up so that CPU time isn’t wasted.

zfs get compression pool01

NAME    PROPERTY     VALUE           SOURCE
pool01  compression  lz4             local

Next Step

Now you’ve got a filesystem, take a look at creating and work with snapshots.

3.1.3.7.2 - Snapshot

Create a Snapshot

About to do something dangerous? Let’s create a ‘save’ point so you can reload your game, so to speak. They don’t take any space (to start with) and are nearly instant.

# Create a snapshot named `save-1`
zfs snapshot pool01/archive@save-1

The snapshot is a read-only copy of the filesystem at that time. It’s mounted by default in a hidden directory and you can examine and even copy things out of it, if you desire.

ls /srv/archive/.zfs/save-1

Delete a Snapshot

While a snapshot doesn’t take up any space to start with, it begins to as you make changes. Anything you delete stays around on the snapshot. Things you edit consume space as well for the changed bits. So when you’re done with a snapshot, it’s easy to remove.

zfs snapshot destroy pool01/archive@save-1

Rollback to a Snapshot

Mess things up in your archive folder after all? Not a problem, just roll back to the same state as your snapshot.

zfs rollback pool01/archive@save-1

Importantly, this is a one-way trip back. You can’t branch and jump around like it was a filesystem multiverse of alternate possibilities. ZFS will warn you about this and if you have more than one snapshot in between you and where you’re going, it will let you know they are about to be deleted.

Auto Snapshot

One of the most useful tools is the zfs-auto-snapshot utility. This will create periodic snapshots of your filesystem and keep them pruned for efficiency. By default, it creates a snapshot every 15 min, and then prunes them down so you have one:

Every 15 min for an hour
Every hour for a day
Every day for a week
Every week for month
Every month

Install with the command:

sudo apt install zfs-auto-snapshot

That’s it. You’ll see new folders based on time created in the hidden .zfs folder at the root of your filesystems. Each filesystem will get it’s own. Anytime you need to look for a file you’ve deleted, you’ll find it there.

# Look at snapshots
ls /srv/archive/.zfs/

Excluding datasets from auto snapshot

# Disable
zfs set com.sun:auto-snapshot=false rpool/export

Excluding frequent or other auto-snapshot

There are sub-properties you can set under the basic auto-snapshot value

zfs set com.sun:auto-snapshot=true pool02/someDataSet
zfs set com.sun:auto-snapshot:frequent=false  pool02/someDataSet

zfs get com.sun:auto-snapshot pool02/someDataSet
zfs get com.sun:auto-snapshot:frequent pool02/someDataSet

# Possibly also the number to keep if other than the default is desired
zfs set com.sun:auto-snapshot:weekly=true,keep=52

# Take only weekly
zfs set com.sun:auto-snapshot:weekly=true rpool/export

Deleting Lots of auto-snapshot files

You can’t use globbing or similar to mass-delete snapshots, but you can string together a couple commands.

# Disable auto-snap as needed
zfs set com.sun:auto-snapshot=false pool04
zfs list -H -o name -t snapshot | grep auto | xargs -n1 zfs destroy

Missing Auto Snapshots

On some centOS-based systems, like XCP-NG, you will only see frequent snapshots. This is because only the frequent cron job uses the correct path. You must add a PATH statement to the other cron jobs

https://forum.level1techs.com/t/setting-up-zfs-auto-snapshot-on-centos-7/129574/12

Next Step

Now that you have snapshots, let’s send them somewhere for backup with replication.

References

https://www.reddit.com/r/zfs/comments/829v5a/zfs_ubuntu_1604_delete_snapshots_with_wildcard/

3.1.3.7.3 - Replication

Replication is how you backup and copy ZFS. It turns a snapshot into a bit-stream that you can pipe to something else. Usually, you pipe it over the network to another system where you connect it to zfs receive.

It is also the only way. The snapshot is what allows point-in-time handling and the receive ensures consistency. And a snapshot is a filesystem, but it’s more than just the files. Two identically named filesystems with the same files you put in place by rsync are not the same filesystem and you can’t jump-start a sync this way.

Basic Examples

# On the receiving side, create a pool to hold the filesystem
zpool create -m none pool02 raidz1 sdc sdd sde sdf

# On the sending side, pipe over SSH. The -F forces the filesystem on the receiving side to be replaced
zfs snapshot pool01@snap1
zfs send pool01@snap1 | ssh some.other.server zfs receive -F pool02

Importantly, this replaces the root filesystem on the receiving side. The filesystem you just copied over is accessible when the replication is finished - assuming it’s mounted and your only using the default root. If you’re using multiple filesystems, you’ll want to recursively send things so you can pick up children like the archive filesystem.

# The -r and -R trigger recursive operations
zfs snapshot -r pool01@snap1
zfs send -R pool01@snap1 | ssh some.other.server zfs receive -F pool02

You can also pick a specific filesystem to send. You can name it whatever you like on the other side, or replace something already named.

# Sending just the archive filesystem
zfs snapshot pool01/archive@snap1
zfs send pool01/archive@snap1 | ssh some.other.server zfs receive -F pool02/archive

And of course, you may have two pools on the same system. One line in a terminal is all you need.

zfs send -R pool01@snap1 | zfs receive -F pool02

Using Mbuffer or Netcat

These are much faster than ssh if you don’t care about someone capturing the traffic. But it does require you to start both ends separately.

# On the receiving side
ssh some.other.system
mbuffer -4 -s 128k -m 1G -I 8990 | zfs receive -F pool02/archive

# On the sending side
zfs send pool01/archive@snap1 | mbuffer -s 128k -m 1G -O some.other.system:8990

You can also use good-ole netcat. It’s a little slower but still faster than SSH. Combine it with pv for some visuals.

# On the receiving end
nc -l 8989 | pv -trab -B 500M | zfs recv -F pool02/archive

# On the sending side
zfs send pool01/archive@snap1 | nc some.other.system 8989

Estimating The Size

You may want to know how big the transmit is to estimate time or capacity. You can do this with a dry-run.

zfs send -nv pool01/archive@snap1

Use a Resumable Token

Any interruptions and you have to start all over again - or do you? If you’re sending a long-running transfer, add a token on the receiving side and you can restart from where it broke, turning a tragedy into just an annoyance.

# On the receiving side, add -s
ssh some.other.system
mbuffer -4 -s 128k -m 1G -I 8990 | zfs receive -s -F pool01/archive

# Send the stream normally
zfs send pool01/archive@snap1 | mbuffer -s 128k -m 1G -O some.other.system:8990

# If you get interrupted, on the receiving side, look up the token
zfs get -H -o value receive_resume_token pool01

# Then use that on the sending side to resume where you left off
zfs send -t 'some_long_key' | mbuffer -s 128k -m 1G -O some.other.system:8990

If you decide you don’t want to resume, clean up with the -A command to release the space consumed by the pending transfer.

# On the receiving side
zfs recv -A pool01/archive

Sending an Incremental Snapshot

After you’ve sent the initial snapshot, subsequent ones are much smaller. Even very large backups can be kept current up if you ‘pre-seed’ before taking the other pool remote.

# at some point in past
zfs snapshot pool01/archive@snap1
zfs send pool01/archive@snap1 ....

# now we'll snap again and send just the changes between the two using  -i 
zfs snapshot pool01/archive@snap2
zfs send -i pool01/archive/@snap1 pool01/archive@snap2 ...

Sending Intervening Snapshots

If you are jumping more than one snapshot ahead, the intervening ones are skipped. If you want to include them for some reason, use the -I option.

# This will send all the snaps between 1 and 9
zfs send -I pool01/archive/@snap1 pool01/archive@snap9 ...

Changes Are Always Detected

You’ll often need to use -F to force changes as even though you haven’t used the remote system, it may think you have if it’s mounted and atimes are on.

You must have a snapshot in common

You need at least 1 snapshot in common at both locations. This must have been sent from one to the other, not just named the same. Say you create snap1 and send it. Later, you create snap2 and, thinking you don’t need it anymore, delete snap1. Even though you have snap1 on the destination you cannot send snap2 as a delta. You need snap1 in both locations to generate the delta. You are out of luck and must send snap2 as a full snapshot.

You can use a zfs feature called a bookmark as an alternative, but that is something you set up in advance and won’t save you from the above scenario.

A Full Example of an Incremental Send

Take a snapshot, estimate the size with a dry-run, then use a resumable token and force changes

zfs snapshot pool01/archive@12th
zfs send -nv -i pool01/archive@11th pool01/archive@12th
zfs send -i pool01/archive@11th pool01/archive@12th | pv -trab -B 500M | ssh some.other.server zfs recv -F -s pool01/archive

Here’s an example of a recursive snapshot to a file. The snapshot takes a -r for recursive, and the send a -R.

# Take the snapshot
zfs snapshot -r pool01/archive@21

# Estimate the size
zfs send -nv -R -i pool01/archive@20 pool01/archive@21

# Mount a portable drive to /mnt - a traditional ext4 or whatever
zfs send -Ri pool01/archive@20 pool01/archive@21 > /mnt/backupfile

# When you get to the other location, receive from the file
zfs receive -s -F pool02/archive < /mnt/backupfile

Sending The Whole Pool

This is a common thought when you’re starting, but it ends up being deceptive because pools aren’t things that can be sent. Only filesets. So what you’re actually doing is a recursive send of the root fileset with a implicit snapshot created on the fly. This is find, but you won’t be able to refer to it later for updates, so you’re better off not.

# Unmount -a (all available filesystems) on the given pool
zfs unmount -a pool01

# Send the unmounted filesystem with an implicit snapshot
zfs send -R pool01 ...

Auto Replication

You don’t want to do this by hand all the time. One way is with a simple script. If you’ve already installed zfs-auto-snapshot you may have something that looks like this:

# use '-o name' to get just the snapshot name without all the details 
# use '-s creation' to sort by creation time
zfs list -t snapshot -o name -s creation pool01/archive

pool01/archive@auto-2024-10-13_00-00
pool01/archive@auto-2024-10-20_00-00
pool01/archive@auto-2024-10-27_00-00
pool01/archive@auto-2024-11-01_00-00
pool01/archive@auto-2024-11-02_00-00

You can get the last two like this, then use send and receive. Adjust the grep to get just the daily as needed.

CURRENT=$(zfs list -t snapshot -o name -s creation pool01/replication | grep auto | tail -1 )
LAST=$(zfs list -t snapshot -o name -s creation pool01/replication | grep auto | tail -2 | head -1)

zfs send -i $LAST $CURRENT | pv -trab -B 500M | ssh some.other.server zfs recv -F -s pool01/archive

This is pretty basic and can fall out of sync. You can bring it up a notch by asking the other side to list it’s snapshots with a zfs list over ssh and comparing against yours to find the most recent match. And add a resume token. But by that point you may consider just using a tool like:

Zrepl

This looks to replace your auto snapshots as well, and that’s probably fine. I have’t used it myself as I scripted back in the day. I will probably start, though.

Next Step

Sometimes, good disks go bad. Learn how to catch them before they do, and replace them when needed.

Scrub and Replacement

3.1.3.7.4 - Disk Replacement

Your disks will fail, but you’ll usually get some warnings because ZFS proactively checks every occupied bit to guard against silent corruption. This is normally done every month, but you can launch one manually if you’re suspicious. They take a long time, but operate at low priority.

# Start a scrub
zpool scrub pool01

# Check the status
zpool scrub -s pool01

You can check the status of your pool at anytime with the command zpool status. When there’s a problem, you’ll see this:

zpool status

NAME            STATE     READ WRITE CKSUM
pool01          DEGRADED     0     0     0
  raidz1        DEGRADED     0     0     0
    /dev/sda    ONLINE       0     0     0
    /dev/sdb    ONLINE       0     0     0
    /dev/sdc    ONLINE       0     0     0
    /dev/sdd    FAULTED     53     0     0  too many errors

Time to replace that last drive before it goes all the way bad

# You don't need to manually offline the drive if it's faulted, but it's good practice to as there's other states it can be in
zpool offline pool01 /dev/sdd

# Physically replace that drive. If you're shutting down to do this, the replacement usually has the same device path
zpool replace pool01 /dev/sdd

There’s a lot of strange things that can happen with drives and depending on your version of ZFS it might be using UUIDs or other drive identification strings. Check the link below for some of those conditions.

zpool

3.1.3.7.5 - Large Pools

You’ve guarded against disk failure this by adding redundancy, but was it enough? There’s a very mathy calculator at https://jro.io/r2c2/ that will allow you chart different parity configs. But a reasonable rule-of-thumb is to devote 20%, or 1 in 5 drives, to parity.

RAIDZ1 - up to 5 Drives
RAIDZ2 - up to 10 Drives
RAIDZ3 - up to 15 Drives

Oracle however, recommends Virtual Devices when you go past 9 disks.

Pools and Virtual Devices

When you get past 15 drives, you can’t increase parity. You can however, create virtual devices. Best practice from Oracle says to do this even earlier as a VDev should be less than 9 disks ¹. So given 24 disks, you should have 3 VDevs of 8 each. Here’s an example with 2 parity. Slightly better than 1 in 5 and suitable for older disks.

### Build a 3-Wide RAIDZ2  across 24 disks
zpool create \
  pool01 \
  -m none \
  -f \
  raidz2 sdb sdc sdd sde sdf sdg sdh sdi \
  raidz2 sdj sdk sdl sdm sdn sdo sdp sdq \
  raidz2 sdr sds sdt sdu sdv sdw sdx sdy

Using Disk IDs

Drive letters can be hard to trace back to a physical drive. A a better² way is the /dev/disk/by-id identifiers.

ls /dev/disk/by-id | grep ata | grep -v part

zpool create -m none -o ashift=12 -O compression=lz4 \
  pool04 \
    raidz2 \
      ata-ST4000NM0035-1V4107_ZC11AHH9 \
      ata-ST4000NM0035-1V4107_ZC116F11 \
      ata-ST4000NM0035-1V4107_ZC1195V5 \
      ata-ST4000NM0035-1V4107_ZC11CDMB \
      ata-ST4000NM0035-1V4107_ZC1195PR \
      ata-ST4000NM0024-1HT178_Z4F164WG \
      ata-ST4000NM0024-1HT178_Z4F17SJK \
      ata-ST4000NM0024-1HT178_Z4F17M6B \
      ata-ST4000NM0024-1HT178_Z4F18FZE \
      ata-ST4000NM0024-1HT178_Z4F18G35

Hot and Distributed Spares

Spares vs Parity

You may not be reach a location quickly when a disk fails. In such a case, is it better to have a Z3 filesystem run in degraded performance mode (i.e. calculating parity the whole time) or a Z2 system that replaces the failed disk automatically?

It’s better to have a more parity until you go past the guidelines of 15 drives in a Z3 config. If you have 16 bays, add a hot spare.

Distributed vs Dedicated

A distributed spare is a newer feature that allows you reserve space on all of your disks, rather than just one. That allows resilvering to go much faster as you’re no longer limited by the speed of one disk. Here’s an example of such a pool that has 16 total devices.

# This pool has 3 parity, 12 data, 16 total count, with 1 spare
```bash
zpool create -f pool02 \
  draid3:12d:16c:1s \
    ata-ST4000NM000A-2HZ100_WJG04M27 \
    ata-ST4000NM000A-2HZ100_WJG09BH7 \
    ata-ST4000NM000A-2HZ100_WJG0QJ7X \
    ata-ST4000NM000A-2HZ100_WS20ECCD \
    ata-ST4000NM000A-2HZ100_WS20ECFH \
    ata-ST4000NM000A-2HZ100_WS20JXTA \
    ata-ST4000NM0024-1HT178_Z4F14K76 \
    ata-ST4000NM0024-1HT178_Z4F17SJK \
    ata-ST4000NM0024-1HT178_Z4F17YBP \
    ata-ST4000NM0024-1HT178_Z4F1BJR1 \
    ata-ST4000NM002A-2HZ101_WJG0GBXB \
    ata-ST4000NM002A-2HZ101_WJG11NGC \
    ata-ST4000NM0035-1V4107_ZC1168N3 \
    ata-ST4000NM0035-1V4107_ZC116F11 \
    ata-ST4000NM0035-1V4107_ZC116MSW \
    ata-ST4000NM0035-1V4107_ZC116NZM \

References

3.1.3.7.6 - Pool Testing

Best Practices

Best practice from Oracle says a VDev should be less than 9 disks¹. So given 24 disks, you should have 3 VDevs. They further recommend the following amount of parity vs data:

single-parity starting at 3 disks (2+1)
double-parity starting at 6 disks (4+2)
triple-parity starting at 9 disks (6+3)

It is not recommended to create a zpool with a single large vdev, say 20 disks, because write IOPS performance will be that of a single disk, which also means that resilver time will be very long (possibly weeks with future large drives).

Reasons For These Practices

I interpret this as meaning that when a single IO write operation is given to the VDev, it won’t write anything else until it’s done. But if you have multiple VDevs, you can hand out a writes to other VDevs while you’re waiting on the first. Reading is probably unaffected, but writes will be faster with more VDevs.

Also, when resilvering the array, you have to read from each of the drives in the VDev to calculate the parity bit. If you have 24 drives in a VDev, then you have to read a block of data from all 24 drives to produce the parity bit. If you have only 8, then you have only 1/3 as much data to read. Meanwhile, the rest of the VDevs are available for real work.

Rebuilding the array also introduces stress which can cause other disks to fail, so it’s best to limit that to a smaller set of drives. I’ve heard many times of resilvering causing sister drives that were already on the edge, to go over and fail the array.

Calculating Failure Rates

You can calculate the failure rates of different configurations with an on-line tool². The chart scales the X axis by 50, so the differences in failure rates are not as large as it would seem, but if they didn’t you wouldn’t be able to see the lines. But in most cases, there’s not a large difference say between a 4x9 and a 3x12.

When To Use a Hot Spare

Given 9 disks where one fails, is it better to drop from 3 parity to 2 and run in degraded mode, or 2 parity that drops to 1 and a spare that recovers without intervention. The math² says its better to have parity. But what about speed? When you loose a disk, 1 out of every 9 IOPS requires that you reconstruct it from parity. Anecdotally, observed performance penalties are minor. So the only times to use a hot spare is:

When you have unused capacity in RAIDZ3 (i.e. almost never)
When IOPS require a mirror pool

Say you have 16 bays of 4TB Drives. A 2x8 Z2 config gives you 48TB but you only want 32TB. Change that to a 2x8 Z3 and get 40TB. Still only need 32 TB? Change that to a 2x7 Z3 with 2 hot spares. Now you have 32TB with the maximum protection and the insurance of an automatic replacement.

Or maybe you have a 37 bay system. You do something that equals 36 plus a spare.

The other case is when your IOPS demands push past what RAIDZ can do an you must use a mirror pool. A failure there looses all redundancy and a hot spare is your only option.

When To Use a Distributed Spare

A distributed spare recovers in half the time³ from a disk loss, and is always better than a dedicated spare - though you should almost never use a spare anyway. The only time to use a normal hot spare is when you have a single global spare.

Testing Speed

The speed difference isn’t charted. So let’s test that some.

Given 24 disks, and deciding to live dangerously, should you should have a single, 24 disk vdev with three parity disks, or three VDevs with a single parity disk each? The reason for the 1st case is better resiliency, and the latter better write speed and recovery from disk failures.

Build a 3-Wide RAIDZ1

Create the pool across 24 disks

zpool create \
-f -m /srv srv \
raidz sdb sdc sdd sde sdf sdg sdh sdi \
raidz sdj sdk sdl sdm sdn sdo sdp sdq \
raidz sdr sds sdt sdu sdv sdw sdx sdy

Now copy a lot of random data to it

#!/bin/bash

no_of_files=1000
counter=0
while [[ $counter -le $no_of_files ]]
 do echo Creating file no $counter
   touch random-file.$counter
   shred -n 1 -s 1G random-file.$counter
   let "counter += 1"
 done

Now yank (literally) one of the physical disks and replace it

allen@server:~$ sudo zpool status  
                                                                             
  pool: srv                                                                                              
 state: DEGRADED                                                                                         
status: One or more devices could not be used because the label is missing or                            
        invalid.  Sufficient replicas exist for the pool to continue                                     
        functioning in a degraded state.                                                                 
action: Replace the device using 'zpool replace'.                                                        
   see: http://zfsonlinux.org/msg/ZFS-8000-4J                                                            
  scan: none requested                                                                                   
config:                                                                                                  
                                                                                                         
        NAME                     STATE     READ WRITE CKSUM                                              
        srv                      DEGRADED     0     0     0                                              
          raidz1-0               DEGRADED     0     0     0                                              
            sdb                  ONLINE       0     0     0                                              
            6847353731192779603  UNAVAIL      0     0     0  was /dev/sdc1                               
            sdd                  ONLINE       0     0     0                                              
            sde                  ONLINE       0     0     0                                              
            sdf                  ONLINE       0     0     0                                              
            sdg                  ONLINE       0     0     0                                              
            sdh                  ONLINE       0     0     0                                              
            sdi                  ONLINE       0     0     0                                              
          raidz1-1               ONLINE       0     0     0                                              
            sdr                  ONLINE       0     0     0                                              
            sds                  ONLINE       0     0     0                                              
            sdt                  ONLINE       0     0     0                                              
            sdu                  ONLINE       0     0     0                                              
            sdj                  ONLINE       0     0     0                                              
            sdk                  ONLINE       0     0     0                                              
            sdl                  ONLINE       0     0     0                                              
            sdm                  ONLINE       0     0     0  
          raidz1-2               ONLINE       0     0     0  
            sdv                  ONLINE       0     0     0  
            sdw                  ONLINE       0     0     0  
            sdx                  ONLINE       0     0     0  
            sdy                  ONLINE       0     0     0  
            sdn                  ONLINE       0     0     0  
            sdo                  ONLINE       0     0     0  
            sdp                  ONLINE       0     0     0  
            sdq                  ONLINE       0     0     0             
                                                                           
errors: No known data errors

allen@server:~$ lsblk                                                                                    
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT                                                              
sda      8:0    0 465.8G  0 disk                                                                         
├─sda1   8:1    0 449.9G  0 part /                                                                       
├─sda2   8:2    0     1K  0 part                                                                         
└─sda5   8:5    0  15.9G  0 part [SWAP]                                                                  
sdb      8:16   1 931.5G  0 disk                                                                         
├─sdb1   8:17   1 931.5G  0 part                                                                         
└─sdb9   8:25   1     8M  0 part                                                                         
sdc      8:32   1 931.5G  0 disk                                                                         
sdd      8:48   1 931.5G  0 disk                                                                         
├─sdd1   8:49   1 931.5G  0 part                                                                         
└─sdd9   8:57   1     8M  0 part    
...


sudo zpool replace srv 6847353731192779603 /dev/sdc -f

allen@server:~$ sudo zpool status
  pool: srv
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Mar 22 15:50:21 2019
    131G scanned out of 13.5T at 941M/s, 4h7m to go
    5.40G resilvered, 0.95% done
config:

        NAME                       STATE     READ WRITE CKSUM
        srv                        DEGRADED     0     0     0
          raidz1-0                 DEGRADED     0     0     0
            sdb                    ONLINE       0     0     0
            replacing-1            OFFLINE      0     0     0
              6847353731192779603  OFFLINE      0     0     0  was /dev/sdc1/old
              sdc                  ONLINE       0     0     0  (resilvering)
            sdd                    ONLINE       0     0     0
            sde                    ONLINE       0     0     0
            sdf                    ONLINE       0     0     0
...

A few hours later…

$ sudo zpool status


  pool: srv
 state: DEGRADED
status: One or more devices has experienced an error resulting in data corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: resilvered 571G in 5h16m with 2946 errors on Fri Mar 22 21:06:48 2019
config:

    NAME                       STATE     READ WRITE CKSUM
    srv                        DEGRADED   208     0 2.67K
        raidz1-0               DEGRADED   208     0 5.16K
        sdb                    ONLINE       0     0     0
        replacing-1            OFFLINE      0     0     0
          6847353731192779603  OFFLINE      0     0     0  was /dev/sdc1/old
        sdc                    ONLINE       0     0     0
        sdd                    ONLINE     208     0     1
        sde                    ONLINE       0     0     0
        sdf                    ONLINE       0     0     0
        sdg                    ONLINE       0     0     0
        sdh                    ONLINE       0     0     0
        sdi                    ONLINE       0     0     0
      raidz1-1                 ONLINE       0     0     0
        sdr                    ONLINE       0     0     0
        sds                    ONLINE       0     0     0
        sdt                    ONLINE       0     0     0
        sdu                    ONLINE       0     0     0
        sdj                    ONLINE       0     0     1
        sdk                    ONLINE       0     0     1
        sdl                    ONLINE       0     0     0
        sdm                    ONLINE       0     0     0
      raidz1-2                 ONLINE       0     0     0
        sdv                    ONLINE       0     0     0
        sdw                    ONLINE       0     0     0
        sdx                    ONLINE       0     0     0
        sdy                    ONLINE       0     0     0
        sdn                    ONLINE       0     0     0
        sdo                    ONLINE       0     0     0
        sdp                    ONLINE       0     0     0
        sdq                    ONLINE       0     0     0

The time was 5h16m. But notice the error - during resilvering drive sdd had 208 read errors and data was lost. This is the classic RAID situation where resilvering stresses the drives, another goes bad and you can’t restore.

It’s somewhat questionable if this is a valid test as the affect of the error on resilvering duration is unknown. But on with the test.

Let’s wipe that away and create a raidz3

sudo zpool destroy srv


zpool create \
-f -m /srv srv \
raidz3 \
 sdb sdc sdd sde sdf sdg sdh sdi \
 sdj sdk sdl sdm sdn sdo sdp sdq \
 sdr sds sdt sdu sdv sdw sdx sdy



zdb
zpool offline srv 15700807100581040709
sudo zpool replace srv 15700807100581040709 sdc




allen@server:~$ sudo zpool status
  pool: srv
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Mar 24 10:07:18 2019
    27.9G scanned out of 9.14T at 362M/s, 7h19m to go
    1.21G resilvered, 0.30% done
config:

    NAME             STATE     READ WRITE CKSUM
    srv              DEGRADED     0     0     0
      raidz3-0       DEGRADED     0     0     0
        sdb          ONLINE       0     0     0
        replacing-1  OFFLINE      0     0     0
          sdd        OFFLINE      0     0     0
          sdc        ONLINE       0     0     0  (resilvering)
        sde          ONLINE       0     0     0
        sdf          ONLINE       0     0     0
        ...


allen@server:~$ sudo zpool status

  pool: srv
 state: ONLINE
  scan: resilvered 405G in 6h58m with 0 errors on Sun Mar 24 17:05:50 2019
config:

    NAME        STATE     READ WRITE CKSUM
    srv         ONLINE       0     0     0
      raidz3-0  ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sdc     ONLINE       0     0     0
        sde     ONLINE       0     0     0
        sdf     ONLINE       0     0     0
        sdg     ONLINE       0     0     0
        ...

The time? 6h58m. Longer, but safer.

3.1.3.7.7 - ZFS Cache

Metadata Cache

There is a lot out there about ZFS cache config. I’ve found the most significant feature to be putting your metadata on a dedicated NVMe devices. This is noted as a ‘Special’ VDev. Here’s an example of a draid with such a device at the end.

Note: A 2x18 is bad practice - just more fun than a 3x12 with no spares.

zpool create -f pool02 \
    draid3:14d:18c:1s \
        ata-ST4000NM000A-2HZ100_WJG04M27 \
        ata-ST4000NM000A-2HZ100_WJG09BH7 \
        ata-ST4000NM000A-2HZ100_WJG0QJ7X \
        ata-ST4000NM000A-2HZ100_WS20ECCD \
        ata-ST4000NM000A-2HZ100_WS20ECFH \
        ata-ST4000NM000A-2HZ100_WS20JXTA \
        ata-ST4000NM0024-1HT178_Z4F14K76 \
        ata-ST4000NM0024-1HT178_Z4F17SJK \
        ata-ST4000NM0024-1HT178_Z4F17YBP \
        ata-ST4000NM0024-1HT178_Z4F1BJR1 \
        ata-ST4000NM002A-2HZ101_WJG0GBXB \
        ata-ST4000NM002A-2HZ101_WJG11NGC \
        ata-ST4000NM0035-1V4107_ZC1168N3 \
        ata-ST4000NM0035-1V4107_ZC116F11 \
        ata-ST4000NM0035-1V4107_ZC116MSW \
        ata-ST4000NM0035-1V4107_ZC116NZM \
        ata-ST4000NM0035-1V4107_ZC118WV5 \
        ata-ST4000NM0035-1V4107_ZC118WW0 \
    draid3:14d:18c:1s \
        ata-ST4000NM0035-1V4107_ZC118X74 \
        ata-ST4000NM0035-1V4107_ZC118X90 \
        ata-ST4000NM0035-1V4107_ZC118XBS \
        ata-ST4000NM0035-1V4107_ZC118Z23 \
        ata-ST4000NM0035-1V4107_ZC11907W \
        ata-ST4000NM0035-1V4107_ZC1192GG \
        ata-ST4000NM0035-1V4107_ZC1195PR \
        ata-ST4000NM0035-1V4107_ZC1195V5 \
        ata-ST4000NM0035-1V4107_ZC1195ZJ \
        ata-ST4000NM0035-1V4107_ZC11AHH9 \
        ata-ST4000NM0035-1V4107_ZC11CDD0 \
        ata-ST4000NM0035-1V4107_ZC11CE77 \
        ata-ST4000NM0035-1V4107_ZC11CV5E \
        ata-ST4000NM0035-1V4107_ZC11D2AQ \
        ata-ST4000NM0035-1V4107_ZC11HRGR \
        ata-ST4000NM0035-1V4107_ZC1B200R \
        ata-ST4000NM0035-1V4107_ZC1CBXEH \
        ata-ST4000NM0035-1V4107_ZC1DC98B \
    special mirror \
        ata-MICRON_M510DC_MTFDDAK960MBP_164614A1DBC4 \
        ata-MICRON_M510DC_MTFDDAK960MBP_170615BD4A74

zfs set special_small_blocks=64K pool02

Metadata is stores automatically on the special device but there’s a benefit in also directing the pool to use the special vdev for small files as well.

Sources

https://forum.level1techs.com/t/zfs-metadata-special-device-z/159954

3.1.3.7.8 - ZFS Encryption

You might want to store data such that it’s encrypted at rest. Or replicate data to such as system. ZFS offers this on a per-dataset option.

Create an Encrypted Fileset

Let’s assume that you’re at a remote site and want to create an encrypted fileset to receive your replications.

zfs create -o encryption=on -o keylocation=prompt -o keyformat=passphrase pool02/encrypted

Replicating to an Encrypted Fileset

This example uses mbuffer and assumes a secure VPN. Replace with SSH as needed.

# On the receiving side
sudo zfs load-key -r pool02/encrypted
mbuffer -4 -s 128k -m 1G -I 8990 | sudo zfs receive -s -F pool02/encrypted

# On the sending side
zfs send -i pool01/archive@snap1 pool01/archived@snap2 | mbuffer -s 128k -m 1G -O some.server:8990

3.1.3.7.9 - VDev Sizing

Best practice from Oracle says a VDev should be less than 9 disks¹. So given 24 disks you should have 3 VDevs. However, when using RAIDZ, the math shows they should be as large as possible with multiple parity disks². I.e. with 24 disks you should have a single, 24 disk VDev.

The reason for the best practice seems to be about the speed of writing and recovering from disk failures.

It is not recommended to create a zpool with a single large vdev, say 20 disks, because write IOPS performance will be that of a single disk, which also means that resilver time will be very long (possibly weeks with future large drives).

With a single VDev, you break up the data to send a chunk to each drive, then wait for them all to finish writing before you send the next. With several VDevs, you can move on to the next while you wait for the others to finish.

Build a 3-Wide RAIDZ1

Create the pool across 24 disks

#
# -O is the pool's root dataset. Lowercase letter -o is for pool properties
# sudo zfs get compression to check. lz4 is now prefered
#
zpool create \
-m /srv srv \
-O compression=lz4 \
raidz sdb sdc sdd sde sdf sdg sdh sdi \
raidz sdj sdk sdl sdm sdn sdo sdp sdq \
raidz sdr sds sdt sdu sdv sdw sdx sdy -f

Copy a lot of random data to it.

#!/bin/bash

no_of_files=1000
counter=0
while [[ $counter -le $no_of_files ]]
 do echo Creating file no $counter
   touch random-file.$counter
   shred -n 1 -s 1G random-file.$counter
   let "counter += 1"
 done

Yank out (literally) one of the physical disks and replace it.

sudo zpool status                                                               [433/433]

  pool: srv                                                                                              
 state: DEGRADED                                                                                         
status: One or more devices could not be used because the label is missing or                            
        invalid.  Sufficient replicas exist for the pool to continue                                     
        functioning in a degraded state.                                                                 
action: Replace the device using 'zpool replace'.                                                        
   see: http://zfsonlinux.org/msg/ZFS-8000-4J                                                            
  scan: none requested                                                                                   
config:                                                                                                  
                                                                                                         
        NAME                     STATE     READ WRITE CKSUM                                              
        srv                      DEGRADED     0     0     0                                              
          raidz1-0               DEGRADED     0     0     0                                              
            sdb                  ONLINE       0     0     0                                              
            6847353731192779603  UNAVAIL      0     0     0  was /dev/sdc1                               
            sdd                  ONLINE       0     0     0                                              
            sde                  ONLINE       0     0     0                                              
            sdf                  ONLINE       0     0     0                                              
            sdg                  ONLINE       0     0     0                                              
            sdh                  ONLINE       0     0     0                                              
            sdi                  ONLINE       0     0     0                                              
          raidz1-1               ONLINE       0     0     0                                              
            sdr                  ONLINE       0     0     0                                              
            sds                  ONLINE       0     0     0                                              
            sdt                  ONLINE       0     0     0                                              
            sdu                  ONLINE       0     0     0                                              
            sdj                  ONLINE       0     0     0                                              
            sdk                  ONLINE       0     0     0                                              
            sdl                  ONLINE       0     0     0                                              
            sdm                  ONLINE       0     0     0  
          raidz1-2               ONLINE       0     0     0  
            sdv                  ONLINE       0     0     0  
            sdw                  ONLINE       0     0     0  
            sdx                  ONLINE       0     0     0  
            sdy                  ONLINE       0     0     0  
            sdn                  ONLINE       0     0     0  
            sdo                  ONLINE       0     0     0  
            sdp                  ONLINE       0     0     0  
            sdq                  ONLINE       0     0     0             
                                                                           
errors: No known data errors

Insert a new disk and replace the missing one.

allen@server:~$ lsblk                                                                                    
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT                                                              
sda      8:0    0 465.8G  0 disk                                                                         
├─sda1   8:1    0 449.9G  0 part /                                                                       
├─sda2   8:2    0     1K  0 part                                                                         
└─sda5   8:5    0  15.9G  0 part [SWAP]                                                                  
sdb      8:16   1 931.5G  0 disk                                                                         
├─sdb1   8:17   1 931.5G  0 part                                                                         
└─sdb9   8:25   1     8M  0 part                                                                         
sdc      8:32   1 931.5G  0 disk                       # <-- new disk showed up here
sdd      8:48   1 931.5G  0 disk                                                                         
├─sdd1   8:49   1 931.5G  0 part                                                                         
└─sdd9   8:57   1     8M  0 part    
...


sudo zpool replace srv 6847353731192779603 /dev/sdc -f

sudo zpool status

  pool: srv
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Mar 22 15:50:21 2019
    131G scanned out of 13.5T at 941M/s, 4h7m to go
    5.40G resilvered, 0.95% done
config:

        NAME                       STATE     READ WRITE CKSUM
        srv                        DEGRADED     0     0     0
          raidz1-0                 DEGRADED     0     0     0
            sdb                    ONLINE       0     0     0
            replacing-1            OFFLINE      0     0     0
              6847353731192779603  OFFLINE      0     0     0  was /dev/sdc1/old
              sdc                  ONLINE       0     0     0  (resilvering)
            sdd                    ONLINE       0     0     0
            sde                    ONLINE       0     0     0
            sdf                    ONLINE       0     0     0
...

We can see it’s running at 941M/s. Not too bad.

Build a 1-Wide RAIDZ3

sudo zpool destroy srv

zpool create \
-m /srv srv \
-O compression=lz4 \
raidz3 \
 sdb sdc sdd sde sdf sdg sdh sdi \
 sdj sdk sdl sdm sdn sdo sdp sdq \
 sdr sds sdt sdu sdv sdw sdx sdy -f

Copy a lot of random data to it again (as above)

Replace a disk (as above)

allen@server:~$ sudo zpool status
  pool: srv
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
 continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Mar 24 10:07:18 2019
    27.9G scanned out of 9.14T at 362M/s, 7h19m to go
    1.21G resilvered, 0.30% done
config:

 NAME             STATE     READ WRITE CKSUM
 srv              DEGRADED     0     0     0
   raidz3-0       DEGRADED     0     0     0
     sdb          ONLINE       0     0     0
     replacing-1  OFFLINE      0     0     0
       sdd        OFFLINE      0     0     0
       sdc        ONLINE       0     0     0  (resilvering)
     sde          ONLINE       0     0     0
     sdf          ONLINE       0     0     0

So that’s running quite a bit slower. Not exactly 1/3, but closer to it than not.

Surprise Ending

That was all about speed. What about reliability?

Our first resilver was going a lot faster, but it ended badly. Other errors popped up, some on the same VDev as was being resilvered, and so it failed.

sudo zpool status

  pool: srv
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
 corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
 entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: resilvered 571G in 5h16m with 2946 errors on Fri Mar 22 21:06:48 2019
config:

 NAME                       STATE     READ WRITE CKSUM
 srv                        DEGRADED   208     0 2.67K
   raidz1-0                 DEGRADED   208     0 5.16K
     sdb                    ONLINE       0     0     0
     replacing-1            OFFLINE      0     0     0
       6847353731192779603  OFFLINE      0     0     0  was /dev/sdc1/old
       sdc                  ONLINE       0     0     0
     sdd                    ONLINE     208     0     1
     sde                    ONLINE       0     0     0
     sdf                    ONLINE       0     0     0
     sdg                    ONLINE       0     0     0
     sdh                    ONLINE       0     0     0
     sdi                    ONLINE       0     0     0
   raidz1-1                 ONLINE       0     0     0
     sdr                    ONLINE       0     0     0
     sds                    ONLINE       0     0     0
     sdt                    ONLINE       0     0     0
     sdu                    ONLINE       0     0     0
     sdj                    ONLINE       0     0     1
     sdk                    ONLINE       0     0     1
     sdl                    ONLINE       0     0     0
     sdm                    ONLINE       0     0     0
   raidz1-2                 ONLINE       0     0     0
     sdv                    ONLINE       0     0     0
     sdw                    ONLINE       0     0     0
     sdx                    ONLINE       0     0     0
     sdy                    ONLINE       0     0     0
     sdn                    ONLINE       0     0     0
     sdo                    ONLINE       0     0     0
     sdp                    ONLINE       0     0     0
     sdq                    ONLINE       0     0     0

errors: 2946 data errors, use '-v' for a list

Our second resilver was going very slowly, but did slow and sure when the race? It did, but very very slowly.

allen@server:~$ sudo zpool status
[sudo] password for allen: 
  pool: srv
 state: ONLINE
  scan: resilvered 405G in 6h58m with 0 errors on Sun Mar 24 17:05:50 2019
config:

 NAME        STATE     READ WRITE CKSUM
 srv         ONLINE       0     0     0
   raidz3-0  ONLINE       0     0     0
     sdb     ONLINE       0     0     0
     sdc     ONLINE       0     0     0
     sde     ONLINE       0     0     0
     sdf     ONLINE       0     0     0
     sdg     ONLINE       0     0     0
      ...
      ...

It slowed even further down, as 400G in 7 hours is something like a 16M/s. I didn’t see any checksum errors this time, but that time is abysmal.

Though, to paraphrase Livy, better late than never.

3.1.3.7.10 - ZFS Replication Script

#!/usr/bin/env bash
#
# zfs-pull.sh
#
# Pulls incremental ZFS snapshots from a remote (source) server to the local (destination) server.
# Uses snapshots made by zfs-auto-snapshot. Locates the latest snapshot common to both sides
# to perform an incremental replication; if none is found, it does a full send.
#
# Usage: replicate-zfs-pull.sh <SOURCE_HOST> <SOURCE_DATASET> <DEST_DATASET>
#
# Example:
#   ./replicate-zfs-pull.sh mysourcehost tank/mydata tank/backup/mydata
#
# Assumptions/Notes:
#   - The local server is the destination. The remote server is the source.
#   - We're using "zfs recv -F" locally, which can forcibly roll back the destination 
#     dataset if it has diverging snapshots. Remove or change -F as desired.
#   - This script is minimal and doesn't handle advanced errors or timeouts gracefully.
#   - Key-based SSH authentication should be set up so that `ssh <SOURCE_HOST>` doesn't require a password prompt.
#

set -euo pipefail

##############################################################################
# 1. Parse command-line arguments
##############################################################################
if [[ $# -ne 3 ]]; then
  echo "Usage: $0 <SOURCE_HOST> <SOURCE_DATASET> <DEST_DATASET>"
  exit 1
fi

SOURCE_HOST="$1"
SOURCE_DATASET="$2"
DEST_DATASET="$3"

##############################################################################
# 2. Gather snapshot lists
#
#    The command zfs list -H -t snapshot -o name -S creation -d 1
#          -H           : Output without headers for script-friendliness
#          -t snapshot  : Only list snapshots
#          -o name      : Only list the name
#          -d 1         : Only descend one level - i.e. don't tree out child datasets
##############################################################################
# - Remote (source) snapshots: via SSH to the remote host
# - Local (destination) snapshots: from the local ZFS

echo "Collecting snapshots from remote source: ${SOURCE_HOST}:${SOURCE_DATASET}..."
REMOTE_SNAPSHOTS=$(ssh "${SOURCE_HOST}" zfs list -H -t snapshot -o name -d 1 "${SOURCE_DATASET}" 2>/dev/null \
  | grep "${SOURCE_DATASET}@" \
  | awk -F'@' '{print $2}' || true)

echo "Collecting snapshots from local destination: ${DEST_DATASET}..."
LOCAL_SNAPSHOTS=$(zfs list -H -t snapshot -o name -d 1 "${DEST_DATASET}" 2>/dev/null \
  | grep "${DEST_DATASET}@" \
  | awk -F'@' '{print $2}' || true)

##############################################################################
# 3. Find the latest common snapshot
#
#   The snapshots names have prefixes like "zfs-auto-snap_daily" and "zfs-auto-snap_hourly"
#   that confuse sorting for the linux comm program, so we strip the prefix with sed before 
#   using 'comm -12' to find common elements of input 1 and 2, and tail to get the last one.
#  
COMMON_SNAPSHOT=$(comm -12 <(echo "$REMOTE_SNAPSHOTS" | sed 's/zfs-auto-snap_\w*-//' | sort) <(echo "$LOCAL_SNAPSHOTS" | sed 's/zfs-auto-snap_\w*-//' | sort) | tail -n 1)

# We need the full name back for the transfer, so grep it out of the local list. Make sure to quote the variable sent to grep or you'll loose the newlines.
COMMON_SNAPSHOT=$(echo "$LOCAL_SNAPSHOTS" | grep $COMMON_SNAPSHOT)

if [[ -n "$COMMON_SNAPSHOT" ]]; then
  echo "Found common snapshot: $COMMON_SNAPSHOT"
else
  echo "No common snapshot found—will perform a full send."
fi


##############################################################################
# 4. Identify the most recent snapshot on the remote source
#
#   This works because we zfs list'ed the snapshots originally in order
#   so we can just take the first line with 'head -n 1'
##############################################################################
LATEST_REMOTE_SNAPSHOT=$(echo "$REMOTE_SNAPSHOTS" | head -n 1)

if [[ -z "$LATEST_REMOTE_SNAPSHOT" ]]; then
  echo "No snapshots found on the remote source. Check if zfs-auto-snapshot is enabled there."
  exit 1
fi

##############################################################################
# 5. Perform replication
##############################################################################
echo "Starting pull-based replication from ${SOURCE_HOST}:${SOURCE_DATASET} to local ${DEST_DATASET}..."

if [[ -n "$COMMON_SNAPSHOT" ]]; then
  echo "Performing incremental replication from @$COMMON_SNAPSHOT up to @$LATEST_REMOTE_SNAPSHOT."
  ssh "${SOURCE_HOST}" zfs send -I "${SOURCE_DATASET}@${COMMON_SNAPSHOT}" "${SOURCE_DATASET}@${LATEST_REMOTE_SNAPSHOT}" \
    | zfs recv -F "${DEST_DATASET}"
else
  echo "Performing full replication of @$LATEST_REMOTE_SNAPSHOT."
  ssh "${SOURCE_HOST}" zfs send "${SOURCE_DATASET}@${LATEST_REMOTE_SNAPSHOT}" \
    | zfs recv -F "${DEST_DATASET}"
fi

echo "Replication completed successfully!"

3.1.4 - Search

3.1.4.1 - Command Line Full Text

For occasional searches you’d use grep. Something like:

grep --ignore-case --files-with-matches --recursive foo /some/file/path

This can take quite a while. There are some tweaks to grep you can add, but for source code, ack is traditional. Even faster is ag a.k.a. The Silver Searcher (get it, ag - is silver in the periodic table… and possibly a play on words with The Silver Surfer)

apt install  silversearcher-ag

# Almost a drop in for grep
ag --ignore-case --files-with-matches --recurse foo /some/file/path

You’d think an index would be great - but then you realize that for unstructured text (i.e. full test searching) you have to build an index of every word in every file that the index is larger than the contents.

Though lucene/elasticseach and spinx come up in conversation.

Links

https://github.com/ggreer/the_silver_searcher

3.1.5 - Sync

Not backup, it’s simply coping data between multiple locations. More like mirroring.

3.1.5.1 - rsync

This is used enough that it deserves several pages.

3.1.5.1.1 - Basic Rsync

If you regularly copy lots of files it’s best to use rsync. It’s efficient, as it only copies what you need, and secure, being able to use SSH. Many other tools such as BackupPC, Duplicity etc. use rsync under the hood, and when you are doing cross-platform data replication it may be the only tool that works, so you’re best to learn it.

Local Copies

Generally, it’s 10% slower than just using cp -a. Sometimes start with that and finish up with this.

rsync \
--archive \
--delete \
--dry-run \
--human-readable \
--inplace \
--itemize-changes \
--progress \
--verbose \
/some/source/Directory \
/some/destination/

The explanations of the more interesting options are:

--archive: Preserves all the metadata, as you'd expect
--delete : Removes extraneous files at the destination that no longer exist at the source (i.e. _not_ a merge)
--dry-run: Makes no changes. This is important for testing. Remove for the actual run
--inplace: This overwrites the file directly, rather than the default behavior that is to build a copy on the other end before moving it into place. This is slightly faster and better when space is limited (I've read)

If you don’t trust the timestamps at your destination, you can add the --checksum option, though when you’re local this may be slower than just recopying the whole thing.

A note about trailing slashes: In the source above, there is no trailing slash. But we could have added one, or even a /*. Here’s what happens when you do that.

No trailing slash - This will sync the directory as you’d expect.
Trailing slash - It will sync the contents of the directory to the location, rather than the directory itself.
Trailing /* - Try not to do this. It will sync each of the items in the source directory as if you had typed them individually. but not delete destination files that no longer exist on source, and so everything will be a merge regardless of if you issued the –delete parameter.

Across the Network

This uses SSH for encryption and authentication.

rsync \
--archive \
--delete \
--dry-run \
--human-readable \
--inplace \
--itemize-changes \
--progress \
--verbose \
/srv/Source_Directory/* \
[email protected]:/srv/Destination_Directory

Windows to Linux

One easy way to do this is to grab a bundled version of rsync and ssh for windows from the cwRsync folks

<https://www.itefix.net/content/cwrsync-free-edition>

Extract the standalone client to a folder and edit the .cmd file to add this at the end (the ^ is the windows CRNL escape)

rsync ^
--archive ^
--delete ^
--dry-run ^
--human-readable ^
--inplace ^
--itemize-changes ^
--no-group ^
--no-owner ^
--progress ^
--verbose ^
--stats ^
[email protected]:/srv/media/video/movies/* /cygdrive/D/Media/Video/Movies/

pause

Mac OS X to Linux

The version that comes with recent versions of OS X is a 2.6.9 (or so) variant. You can use that, or obtain the more recent 3.0.9 that has some slight speed improvements and features. To get the newest (you have to build it yourself) install brew, then issue the commands:

brew install https://raw.github.com/Homebrew/homebrew-dupes/master/rsync.rb
brew install rsync

One of the issues with syncing between OS X and Linux is the handling of Mac resource forks (file metadata). Lets assume that you are only interested in data files (such as mp4) and are leaving out the extended attributes that apple uses to store icons and other assorted data (replacing the old resource fork).

Since we are going between file systems, rather than use the ‘a’ option that preserves file attributes, we specify only ‘recursive’ and ’times’. We also use some excludes keep mac specific files from tagging along.

/usr/local/bin/rsync 
    --exclude .DS*
    --exclude ._*        
    --human-readable 
    --inplace 
    --progress 
    --recursive  
    --times 
    --verbose 
    --itemize-changes 
    --dry-run       
    "/Volumes/3TB/source/" 
    [email protected]:"/Volumes/3TB/"

Importantly, we are ‘itemizing’ and doing a ‘dry-run’. When you do, you will see a report like:

skipping non-regular file "Photos/Summer.2004"
skipping non-regular file "Photos/Summer.2005"
.d..t....... Documents/
.d..t....... Documents/Work/
cd++++++++++ ISOs/
<f++++++++++ ISOs/Office.ISO

The line with cd+++ indicate a directory will be created and <f+++ indicate a file is going to be copied. When it says ‘skipping’ a non regular file, that’s (in this case, at least) a symlink. You can include them, but if your paths don’t match up on both systems, these links will fail.

Spaces in File Names

Generally you quote and escape.

rsync 
  --archive ^
  --itemize-changes ^
  --progress ^
  [email protected]:"/srv/media/audio/Music/Basil\ Poledouris" ^
  /cygdrive/c/Users/Allen/Music

Though it’s rumored that you can single quote and escape with the –protect-args option

--protect-args ^
[email protected]:'/srv/media/audio/Music/Basil Poledouris' ^

List of Files

You may want to combine find and rsync to get files of a specific criteria. Use the --from-file parameter

ssh server.gattis.org find /srv/media/video -type f -mtime -360 > list

rsync --progress --files-from=list server.gattis.org:/ /mnt/media/video/

Seeding an Initial Copy

If you have no data on the destination to begin with, rsync will be somewhat slower than a straight copy. On a local system simply use ‘cp -a’ (to preserve file times). On a remote system, you can use tar to minimize the file overhead.

tar -c /path/to/dir | ssh remote_server 'tar -xvf - -C /absolute/path/to/remotedir'

It is also possible to use rsync with the option --whole-file and this will skip the things that slow rsync down though I have not tested it’s speed

Time versus size

Rsync uses time and size to determine if a file should be updated. If you have already copied files and you are trying to do a sync, you may find your modification times are off. Add the –size-only or the –modify-window=NUM. Even better, correct your times. (this requires on OS X the coreutils to get the GNU ls command and working with the idea here)

http://notemagnet.blogspot.com/2009/10/getting-started-with-rsync-for-paranoid.html http://www.chrissearle.org/blog/technical/mac_homebrew_and_homebrew_alt http://ubuntuforums.org/showthread.php?t=1806213

3.1.5.1.2 - Rsync Daemon

Some low-power devices, such as the Raspbery Pi, struggle with the encryption overheard of rsync default network transport, ssh.

If you don’t need encryption or authentication, you can significantly speed things up by using rsync in daemon mode.

Push Config

In this example, we’ll push data from our server to the low-power client.

Create a Config File

Create a config file on the sever that we’ll send over to the client later.

nano client-rsyncd.conf

log file = /var/log/rsync.log
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock

# This is the name you refer to in rsync. The path is where that maps to.
[media]
        path = /var/media
        comment = Media
        read only = false
        timeout = 300
        uid = you
        gid = you

Start and Push On-Demand

The default port is hi-level and doesn’t require root privileges.

# Send the daemon config over to the home dir
scp client-rsyncd.conf [email protected]:

# Launch rsync in daemon mode
ssh [email protected]: rsync --daemon --config ./client-rsyncd.conf

# Send the data over
rsync \
--archive \
--delete \
--human-readable \
--inplace \
--itemize-changes \
--no-group \
--no-owner \
--no-perms \
--omit-dir-times \
--progress \
--recursive \
--verbose \
--stats \
/mnt/pool01/media/movies rsync://client.some.lan:8730/media

# Terminate the remote instance
ssh [email protected] killall rsync

3.1.5.1.3 - Tunneled Rsync

One common task is to rsync through a bastion host to an internal system. Do it with the rsync shell options

rsync \
--archive \
--delete \
--delete-excluded \
--exclude "lost+found" \
--human-readable \
--inplace \
--progress \
--rsh='ssh -o "ProxyCommand ssh [email protected] -W %h:%p"' \
--verbose \
[email protected]:/srv/plex/* \
/data/

There is a -J or ProxyJUmp option on new versions of SSH as well.

https://superuser.com/questions/964244/rsyncing-directories-through-ssh-tunnel https://unix.stackexchange.com/questions/183951/what-do-the-h-and-p-do-in-this-command https://superuser.com/questions/1115715/rsync-files-via-intermediate-host

3.1.5.1.4 - Rsync Schedule

It’s usually best to wrap rsync and call it from cron. Preferably with something that doesn’t step on itself for long running syncs, like this:

vi ~/bin/schedule-rsync

#!/bin/bash

THE_USER="remote-account-1"
THE_KEY="remote-account-1" 

SCRIPT_NAME=$(basename "$0")
PIDOF=$(pidof -x $SCRIPT_NAME)

for PID in $PIDOF; do
    if [ $PID != $$ ]; then
        echo "[$(date)] : $SCRIPT_NAME : Process is already running with PID $PID"
        exit 1
    fi
done

# Change to working directory. Assume running as non-root user per cronfile config below
cd ~/bin


rsync \
--archive \
--bwlimit=5m \
--delete \
--delete-excluded \
--exclude .DS* \
--exclude ._* \
--human-readable \
--inplace \
--itemize-changes \
--no-group \
--no-owner \
--no-perms \
--progress \
--recursive \
--rsh "ssh -i ${THE_KEY}" \
--verbose \
--stats \
${THE_USER}@some.server.org\
:/mnt/pool01/folder.1 \
:/mnt/pool01/folder.2 \
:/mnt/pool01/folder.2 \
/mnt/pool02/

Then, call it from a file in the cron drop folder.

echo "0 1 * * * $USER /home/$USER/schedule-rsync  >> /home/$USER/bin/rsync-video.log 2>&1" > /etc/cron.d/schedule-rsync

3.1.5.1.5 - Rsync Without Login

You’d like to use rsync, but ensure users can only use rsync and can’t login with a shell, forward sessions, or other shenanigans. Do this with ssh keys and ForceCommand.

Limit Use With Keys and a Custom Script

# On your server, add a central location for keys
sudo mkdir /etc/ssh/authorized_keys

# Configure SSH to look for user public keys in that spot - the %u is the variable for user ID
echo "AuthorizedKeysFile /etc/ssh/authorized_keys/%u.pub" > /etc/ssh/sshd_config.d/authorized_users.conf

# Create a script that checks incoming ssh commands to make sure they are for rsync
sudo tee /etc/ssh/authorized_keys/checkssh.sh << "EOF"
#!/bin/bash
if [ -n "$SSH_ORIGINAL_COMMAND" ]; then
    if [[ "$SSH_ORIGINAL_COMMAND" =~ ^rsync\  ]]; then
        echo $SSH_ORIGINAL_COMMAND | systemd-cat -t rsync
        exec $SSH_ORIGINAL_COMMAND
    else
        echo DENIED $SSH_ORIGINAL_COMMAND | systemd-cat -t rsync
    fi
fi
EOF

chmod +x /etc/ssh/authorized_keys/checkssh.sh

systemctl restart ssh.service

Now that we have the SSH server configured, let’s create a system user (required, unfortunately) and a key. We’ll limit the account as much as possible, though you can’t use /usr/sbin/nologin shell, as rsync requires something to run in.

THE_USER="remote-account-1"
sudo adduser --no-create-home --home /nonexistent --disabled-password --gecos "" ${THE_USER}

# Its easiest to create the key yourself, but a .pub from them is also fine.
# Send out the private key from the folder (it's the one without the .pub on the end) to the remote system.
ssh-keygen -f /etc/ssh/authorized_keys/${THE_USER} -q -N "" -C "${THE_USER}"

Let’s add a ForcedCommand to the key so that it can only be used with the features we allow.

vi /etc/ssh/authorized_keys/${THE_USER}

# Paste this command="... string in front of the existing key
command="/etc/ssh/authorized_keys/checkssh.sh",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAAB3NzaC1...

This remote user can now use rsync, but can’t login or do other activities. Their command would look something like this (using the private key you created above)

rsync \
--rsh "ssh -i /where/your/private/key/is/remote-account-1" \
[email protected]:/some/folder /some/local/place/

Notes

Why not use rrsync?

The rrsync script is similar to the script we use, but is distributed and maintained as part of the rsync package. It’s arguably a better choice. I like the checkssh.sh approach as it’s more flexible, allows for things other than rsync, and doesn’t force relative paths. But if you’re only doing rsync, consider using rrsync like this;

# Paste this command="... string in front of the existing key
command="rrsync -ro /some/folder/share",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAAB3NzaC1...

In your client’s rsync command, make the paths relative to path rrsync expects above.

rsync [email protected]:folder.1 /destination/folder/

If you see the client-side error message:

rrsync error: option -L has been disabled on this server

You discovered that following symlinks has been disabled by default in rrsync. You can enable with an edit to the script.

sudo sed -i 's/KLk//' /usr/bin/rrsync

# This changes
#    short_disabled_subdir = 'KLk'
        to
#    short_disabled_subdir = ''

Script It

If you need do it more than once, it might look something like this.

#!/bin/bash

HELP_MESSAGE="Usage: $0 <user> \n\nThis script requires a username to be specified.\n"

if [ "$#" -eq 0 ]; then
    echo -e "$HELP_MESSAGE"
    exit 1
fi

if [ id username &>/dev/null ]; then
   echo "User already exists."
   exit 1
fi

if [ "$EUID" -ne 0 ]; then
    echo "This script must be run with sudo."
    exit 1
fi

THE_USER=$1

THE_COMMAND="\
command=\
\"/etc/ssh/authorized_keys/checkssh.sh\",\
no-port-forwarding,\
no-X11-forwarding,\
no-agent-forwarding,\
no-pty "

useradd --home-dir /nonexistent ${THE_USER}

mkdir -p /etc/ssh/authorized_keys
ssh-keygen -f /etc/ssh/authorized_keys/${THE_USER} -q -N "" -C "${THE_USER}"

sed -i "1s|^|$THE_COMMAND|" /etc/ssh/authorized_keys/${THE_USER}.pub

Sources

https://peterbabic.dev/blog/transfer-files-between-servers-using-rrsync/ http://gergap.de/restrict-ssh-to-rsync.html https://superuser.com/questions/641275/make-linux-server-allow-rsync-scp-sftp-but-not-a-terminal-login

3.1.5.2 - Tar Pipe

AKA - The Fastest Way to Copy Files.

When you don’t want to copy a whole file system, many admins suggest the fastest way is with a ’tar pipe'.

Locally

From one disk to another on the same system. This uses pv to buffer.

(cd /src; tar cpf - .) | pv -trab -B 500M | (cd /dst; tar xpf -)

Across the network

NetCat

You can add netcat to the mix (as long as you don’t need encryption) to get it across the network.

On the receiver:

(change to the directory you want to receive the files or directories in)

nc -l -p 8989 | tar -xpzf -

On the sender:

(change to the directory that has the file or directory - like ‘pics’ - in it)

tar -czf - pics | nc some.server 8989

mbuffer

This takes the place of pc and nc and is somewhat faster.

On the receiving side

    mbuffer -4 -I 9090 | tar -xf -

On the sending side

    sudo tar -c plexmediaserver | mbuffer -m 1G -O SOME.IP:9090

SSH

You can use ssh when netcat isn’t appropriate or you want to automate with a SSH key and limited interaction with the other side. This examples ‘pulls’ from a remote server.

 (ssh [email protected] tar -czf - /srv/http/someSite) | (tar -xzf -)

NFS

If you already have a NFS server on one of the systems though, it’s basically just as fast. At least in informal testing, it behaves more steadily as opposed to a tar pipe’s higher peaks and lower troughs. A simple cp -a will suffice though for lots of little files a tar pipe still may be faster.

rsync

rsync is generally best if you can or expect the transfer to be interrupted. In my testing, rsync achieved about 15% less throughput with about 10% more processor overhead.

http://serverfault.com/questions/43014/copying-a-large-directory-tree-locally-cp-or-rsync http://unix.stackexchange.com/questions/66647/faster-alternative-to-cp-a http://serverfault.com/questions/18125/how-to-copy-a-large-number-of-files-quickly-between-two-servers

3.1.5.3 - Unison

Unison offers several features that make it more useful than rsync;

Multi-Way File Sync
Detect Renames and Copies
Delta copies

Multi-Way File Sync

Rsync is good at one-way synchronization. i.e. one to many. But when you need to sync multiple authoritative systems, i.e. many to many, you want to use unison. It allows you to merge changes.

Detect Renames and Copies (xferbycopying)

Another problem with rsync is that when you rename a file, it re-sends it. This is because a re-named file appears ’new’ to the sync utility. Unison however, maintains a hash of every file you’ve synced and if there is already a local copy (i.e. the file before you renamed it), it will use that and do a ’local copy’ rather than sending it. So a rename effectively is a local copy and a delete. Not perfect, but better than sending it across the wire.

Delta Copies

Unison uses it’s own implementation of the rsync delta copy algorithm. However, for large files the authors recommend an option that wraps rsync itself as you can optimize it for large files. Use Unison can use config files in your ~/.unison folder. If you type ‘unison’ without any arguments, it will use the ‘default.prf’ file. Here is a sample

# Unison preferences file

# Here are the two server 'roots' i.e., the start of where we will pick out things to sync.
# The first root is local, and the other remote over ssh
root = /mnt/someFolder
root = ssh://[email protected]//mnt/someFolder

# The 'path' is simply the name of a folder or file you want to sync. Notice the spaces are preserved. No not excape them.
path = A Folder Inside someFolder

# We're 'forcing' the first root to win all conflicts. This sort of negates the multi-way
# sync feature but it's just an example 
force = /mnt/someFolder

# This instructs unison to copy the contents of sym links, rather than the link itself
follow = Regex .*


# You can also ignore files and paths explicitly or pattern. See the 'Path specification' 
ignore = Name .AppleDouble
ignore = Name .DS_Store
ignore = Name .Parent
ignore = Name ._*

# Here we are invoking an external engine (rsync) when a file is over 10M, and passing it some arguments 
copythreshold = 10000
copyprog      = rsync --inplace
copyprogrest  = rsync --partial --inplace

Hostname is important. Unison builds a hash of all the files to determine what’s changed (similar to md5sum with rsync, but faster). If you get repeated messages about ‘…first time being run…’ you may have an error in your path

http://www.cis.upenn.edu/~bcpierce/unison/download/releases/stable/unison-manual.html

3.1.6 - TrueNAS

TrueNAS is a storage “appliance” in the sense that it’s a well put together linux system with a web administration layer. If you find yourself digging into the linux internals, you’re doing it wrong.

3.1.6.1 - Disk Replacement

You may get an alert in the GUI along the lines of

Device: /dev/sda [SAT], 8 Currently unreadable (pending) sectors.

This is one of the predictive failures that Backblaze mentions. You should replace the drive. You may also get an outright failure such as:

Pool pool01 state is DEGRADED: One or more devices has experienced an unrecoverable error.

That’s a drive that has already failed and likewise you must replace it.

Using The GUI

Get The Device Number

In the GUI, Observe the Device Number from the alert. It will be similar to daXX like this

Device: /dev/sda [SAT], 24 Currently unreadable (pending) sectors.

Off-line the disk

Navigate to the device and mark it off-line.

Storage -> (pool name) -> Manage Devices -> (Select the disk) -> ZFS Info -> Offline Button

Physically Replace the Disk

You probably know what bay it is, but a mistake can take the pool down if you only have Z1. If you have a larger server it may have bay lights you can use. Check the command line section below on how to light the indicator.

Logically Replace The Drive

Navigate to the pool, find the drive and in the Disk Info menu click “replace”. The now windows should allow you to pick the ’new’ drive. There should be only unused drives listed. If nothing is listed, you may need to wipefs -a /dev/sdX on it first.

Observe The Resilvering Process

The system should automatically start rebuilding the array.

At The Command Line

It’s ‘strongly advised against’ using the CLI to replace the disk. The GUI takes several steps to prepare the disk and adds a partition to the pool, not the whole disk.

Identify and Off-Line The Disk

Use the gptid from zpool to get the device number, then off-line the disk.

sudo zpool status

 raidz3-2                                 ONLINE       0     0     0
    976e8f03-931d-4c9f-873e-048eeef08680  ONLINE       0     0     0
    f9384b4f-d94a-43b6-99c4-b8af6702ca42  ONLINE       0     0     0
    c5e4f2e5-62f2-41cc-a8de-836ff9683332  ONLINE       0     0 35.4K

sudo zpool offline pool01 c5e4f2e5-62f2-41cc-a8de-836ff9683332

Get The Serial and Blink The Indicator

find /dev/disk -name c5e4f2e5-62f2-41cc-a8de-836ff9683332 -exec ls -lah {} ;

lrwxrwxrwx 1 root root 11 Jan 10 08:41 /dev/disk/by-partuuid/c5e4f2e5-62f2-41cc-a8de-836ff9683332 -> ../../sdae1

sudo smartctl -a /dev/sdae1 | grep Serial

Serial Number:    WJG1LNP7

sas3ircu list

sas3ircu 0 display

sas3ircu 0 display | grep -B 10 WJG1LNP7

(Inspect the output for Enclosure and Slot to use in that order below)

sudo sas3ircu 0 locate 3:5 ON

Physically Replace The Drive

This is a physical swap - the indicator will be blinking red. Turn it off when you’re done

sudo sas3ircu 0 locate 3:5 OFF

Logically Replace The Removed Drive

(It’s probably the same device identifier, but you can tail the message log to make sure)

sudo dmesg | tail

sudo zpool replace pool01 c5e4f2e5-62f2-41cc-a8de-836ff9683332 sdae -f

sudo zpool status

 raidz3-2                                  DEGRADED     0     0     0
   976e8f03-931d-4c9f-873e-048eeef08680    ONLINE       0     0     0
   f9384b4f-d94a-43b6-99c4-b8af6702ca42    ONLINE       0     0     0
   replacing-2                             DEGRADED     0     0     0
     c5e4f2e5-62f2-41cc-a8de-836ff9683332  REMOVED      0     0     0
     sdae                                  ONLINE       0     0     0

Hot Spare

If you’re using a hot spare, you may need to detach it after the resilver is finished so status as a spare is returned. Check the spare’s ID at the bottom and then detach it.

zpool status zpool detach pool01 9d794dfd-2ef6-432d-8252-0c93e79509dc

Troubleshooting

When working at the command line, you may need download the sas3ircu utility from Broadcom.

wget https://docs.broadcom.com/docs-and-downloads/host-bus-adapters/host-bus-adapters-common-files/sas_sata_12g_p16_point_release/SAS3IRCU_P16.zip

If you forgot what light you turned on, you can turn off all slot lights.

for X in {0..23};do echo sas3ircu 0 locate 2:$X OFF;done for X in {0..11};do sas3ircu 0 locate 3:$X OFF;done

To recreate the GUI process at the command line, as adapted from https://www.truenas.com/community/resources/creating-a-degraded-pool.100/ use these commands. Though gpart and glable are not present on TrueNAS Scale, so you would have to adapt this to another tool.

gpart create -s gpt /dev/da18 gpart add -i 1 -b 128 -t freebsd-swap -s 2g /dev/da18 gpart add -i 2 -t freebsd-zfs /dev/da18

zpool replace pool01 65f61699-e2fc-4a36-86dd-b0fa6a77479

3.2 - Monitoring

Infrastructure monitoring is usually about metrics and alerts. You’re concerned about status and performance - is it up and how’s it doing? and when do we need to buy more?

3.2.1 - Metrics

3.2.1.1 - Prometheus

Overview

Prometheus is a time series database, meaning it’s optimized to store and work with data organized in time order. It includes in it’s single binary:

Database engine
Collector
Simple web-based user interface

This allows you to collect and manage data with fewer tools and less complexity than other solutions.

Data Collection

End-points normally expose metrics to Prometheus by making a web page available that it can poll. This is done by including a instrumentation library (provided by Prometheus) or simply adding a listener on a high-level port that spits out some text when asked.

For systems that don’t support Prometheus natively, there are a few add-on services to translate. These are called ’exporters’ and translate things such as SNMP into a web format Prometheus can ingest.

Alerting

You can also alert on the data collected. This is through the Alert Manager, a second package that works closely with Prometheus.

Visualization

You still need a dashboard tool like Grafana to handle visualizations, but you can get started quite quickly with just Prometheus.

3.2.1.1.1 - Installation

Install from the Debian Testing repo, as stable can be up to a year behind.

# Testing
echo 'deb http://deb.debian.org/debian testing main' | sudo tee -a /etc/apt/sources.list.d/testing.list

# Pin testing down to a low level so the rest of your packages don't get upgraded
sudo tee -a /etc/apt/preferences.d/not-testing << EOF
Package: *
Pin: release a=testing
Pin-Priority: 50
EOF

# Living Dangerously with test
sudo apt update
sudo apt install -t testing prometheus

Configuration

Use this for your starting config.

cat /etc/prometheus/prometheus.yml

global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

This says every 15 seconds, run down the job list. And there is one job - to check out the system at ’localhost:9090’ which happens to be itself.

For every target listed, the scraper makes a web request for /metrics/ and stores the results. It ingests all the data presented and stores them for 15 days. You can choose to ignore certain elements or retain differently by adding config, but by default it takes everything given.

You can see this yourself by just asking like Prometheus would. Hit it up directly in your browser. For example, Prometheus is making metrics available at /metrics

http://some.server:9090/metrics

Operation

User Interface

You can access the Web UI at:

http://some.server:9090

At the top, select Graph (you should be there already) and in the Console tab click the dropdown labeled “insert metric at cursor”. There you will see all the data being exposed. This is mostly about the GO language it’s written in, and not super interesting. A simple Graph tab is available as well.

Data Composition

Data can be simple, like:

go_gc_duration_seconds_sum 3

Or it can be dimensional which is accomplished by adding labels. In the example below, the value of go_gc_duration_seconds has 5 labeled sub-sets.

go_gc_duration_seconds{quantile="0"} 4.5697e-05
go_gc_duration_seconds{quantile="0.25"} 7.814e-05
go_gc_duration_seconds{quantile="0.5"} 0.000103396
go_gc_duration_seconds{quantile="0.75"} 0.000143687
go_gc_duration_seconds{quantile="1"} 0.001030941

In this example, the value of net_conntrack_dialer_conn_failed_total has several.

net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="refused"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="unknown"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="refused"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="unknown"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="refused"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="unknown"} 0

How is this useful? It allows you to do aggregations - such as looking at all the net_contrack failures, and also look at the failures that were specifically refused. All with the same data.

Removing Data

You may have a target you want to remove. Such as a typo hostname that is now causing a large red bar on a dashboard. You can remove that mistake by enabling the admin API and issuing a delete

sudo sed -i 's/^ARGS.*/ARGS="--web.enable-admin-api"/' /etc/default/prometheus

sudo systemctl reload prometheus

curl -s -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={instance="badhost.some.org:9100"}'

The default retention is 15 days. You may want less than that and you can configure --storage.tsdb.retention.time=1d similar to above. ALL data has the same retention, however. If you want historical data you must have a separate instance or use VictoriaMetrics.

Next Steps

Let’s get something interesting to see by adding some OS metrics

Troubleshooting

If you can’t start the prometheus server, it may be an issue with the init file. Test and Prod repos use different defaults. Add some values explicitly to get new versions running

sudo vi /etc/default/prometheus

ARGS="--config.file="/etc/prometheus/prometheus.yml  --storage.tsdb.path="/var/lib/prometheus/metrics2/"

3.2.1.1.2 - Node Exporter

This is a service you install on your end-points that make CPU/Memory/Etc. metrics available to Prometheus.

Installation

On each device you want to monitor, install the node exporter with this command.

sudo apt install prometheus-node-exporter

Do a quick test to make sure it’s responding to scrapes.

curl localhost:9100/metrics

Configuration

Back on your Prometheus server, add these new nodes as a job in the prometheus.yaml file. Feel free to drop the initial job where Prometheus was scraping itself.

global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'servers'
    static_configs:
    - targets:
      - some.server:9100
      - some.other.server:9100
      - and.so.on:9100

sudo systemctl reload prometheus.service

Operation

You can check the status of your new targets at:

http://some.server:9090/classic/targets

A lot of data is collected by default. On some low power systems you may want less. For just the basics, customize the the config to disable the defaults and only enable specific collectors.

In the case below we are reduce collection to just CPU, Memory, and Hardware metrics. When scraping a Pi 3B, this reduces the Scrape Duration from 500 to 50ms.

sudo sed -i 's/^ARGS.*/ARGS="--collector.disable-defaults --collector.hwmon --collector.cpu --collector.meminfo"/' /etc/default/prometheus-node-exporter
sudo systemctl restart prometheus-node-exporter

The available collectors are listed on the page:

https://github.com/prometheus/node_exporter

3.2.1.1.3 - SNMP Exporter

SNMP is one of the most prevalent (and clunky) protocols still widely used on network-attached devices. But it’s a good general-purpose way to get data from lots of different makes of products in a similar way.

But Prometheus doesn’t understand SNMP. The solution is a translation service that acts a a middle-man and ’exports’ data from those devices in a way Prometheus can.

Installation

Assuming you’ve already installed Prometheus, install some SNMP tools and the exporter. If you have an error installing the mibs-downloader, check troubleshooting at the bottom.

sudo apt install snmp snmp-mibs-downloader
sudo apt install -t testing prometheus-snmp-exporter

Change the SNMP tools config file to allow use of installed MIBs. It’s disabled by default.

# The entry 'mibs:' in the file overrides the default path. Comment it out so the defaults kick back in.
sudo sed -i 's/^mibs/# &/' /etc/snmp/snmp.conf

Preparation

We need a target, so assuming you have a switch somewhere and can enable SNMP on it, let’s query the switch for its name, AKA sysName. Here we’re using version “2c” of the protocol with the read-only password “public”. Pretty standard.

Industry Standard Query

There are some well-known SNMP objects you can query, like System Name.

# Get the first value (starting at 0) of the sysName object
snmpget -Oqv -v 2c -c public some.switch.address sysName.0

Some-Switch

# Sometimes you have to use 'getnext' if 0 isn't populated
snmpgetnext -v 2c -c public some.switch.address sysName

Vendor Specific Query

Some vendors will release their own custom MIBs. These provide additional data for things that don’t have well-known objects. Add the MIBs to the system and ‘walk’ the tree to see what’s interesting.

# Unifi, for example
sudo cp UBNT-MIB.txt UBNT-UniFi-MIB.txt  /usr/share/snmp/mibs

# snmpwalk doesn't look for enterprise sections by default, so you have to 
# look at the MIB and add the specific company's OID number.
grep enterprises UBNT-*
...
UBNT-MIB.txt:    ubnt OBJECT IDENTIFIER ::= { enterprises 41112 }
...

snmpwalk -v2c -c public 10.10.202.246 enterprises.41112

Note: If you get back an error or just the ‘iso’ prefixed value, double check the default MIB path.

Configuration

To add this switch to the Prometheus scraper, add a new job to the prometheus.yaml file. This job will include the targets as normal, but also the path (since it’s different than default) and an optional parameter called module that specific to the SNMP exporter. It also does something confusing - a relabel_config

This is because Prometheus isn’t actually taking to the switch, it’s talking to the local SNMP exporter service. So we put all the targets normally, and then at the bottom ‘oh, by the way, do a switcheroo’. This allows Prometheus to display all the data normally with no one the wiser.

...
...
scrape_configs:
  - job_name: 'snmp'
    static_configs:
      - targets:
        - some.switch.address    
    metrics_path: /snmp
    params:
      module: [if_mib]
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9116  # The SNMP exporter's real hostname:port.

Operation

No configuration on the exporter side is needed. Reload the config and check the target list. Then examine data in the graph section. Add additional targets as needed and the exporter will pull in the data.

http://some.server:9090/classic/targets

These metrics are considered well known and so will appear in the database named sysUpTime and upsBasicBatteryStatus and not be prefixed with snmp_ like you might expect.

Next Steps

If you have something non-standard, or you simply don’t want that huge amount of data in your system, look at the link below to customize the SNMP collection with the Generator.

SNMP Exporter Generator Customization

Troubleshooting

The snmp-mibs-downloader is just a handy way to download a bunch of default MIBs so when you use the tools, all the cryptic numbers, like “1.3.6.1.2.1.17.4.3.1” are translated into pleasant names.

If you can’t find the mibs-downloader its probably because it’s in the non-free repo and that’s not enabled by default. Change your apt sources file like so

sudo vi /etc/apt/sources.list

deb http://deb.debian.org/debian/ bullseye main contrib non-free
deb-src http://deb.debian.org/debian/ bullseye main contrib non-free

deb http://security.debian.org/debian-security bullseye-security main contrib non-free
deb-src http://security.debian.org/debian-security bullseye-security main contrib non-free

deb http://deb.debian.org/debian/ bullseye-updates main contrib non-free
deb-src http://deb.debian.org/debian/ bullseye-updates main contrib non-free

It may be that you only need to change one line.

3.2.1.1.4 - SNMP Generator

Installation

There is no need to install the Generator as it comes with the SNMP exporter. But if you have a device that supplies it’s own MIB (and many do), you should add that to the default location.

# Mibs are often named SOMETHING-MIB.txt
sudo cp -n *MIB.txt /usr/share/snmp/mibs/

Preparation

You must identify the values you want to capture. Using snmpwalk is a good way to see what’s available, but it helps to have a little context.

The data is arranged like a folder structure that you drill-down though. The folder names are all numeric, with ‘.’ instead of slashes. So if you wanted to get a device’s sysName you’d click down through 1.3.6.1.2.1.1.5 and look in the file 0.

When you use snmpwalk it starts wherever you tell it and then starts drilling-down, printing out everything it finds.

How do you know that’s where sysName is at? A bunch of folks got together (the ISO folks) and decided everything in advance. Then they made some handy files (MIBs) and passed them out so you didn’t have to remember all the numbers.

They allow vendors to create their own sections as well, for things that might not fit anywhere else.

A good place to start is looking at what the vendor made available. You see this by walking their section and including their MIB so you get descriptive names - only the ISO System MIB is included by default.

# The SysobjectID identifies the vendor section
# Note use of the MIB name without the .txt
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c public some.address SysobjectID

SNMPv2-MIB::sysObjectID.0 = OID: SOMEVENDOR-MIB::somevendoramerica

# Then walk the vendor section using the name from above
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c some.address somevendoramerica

SOMEVENDOR-MIB::model.0 = STRING: SOME-MODEL
SOMEVENDOR-MIB::power.0 = INTEGER: 0
...
...

# Also check out the general System section
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c public some.address system

# You can also walk the whole ISO tree. In some cases,
# there are thousands of entries and it's indecipherable
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c public some.system iso

This can be a lot of information and you’ll need to do some homework to see what data you want to collect.

Configuration

The exporter’s default configuration file is snmp.yml and contains about 57 Thousand lines of config. It’s designed to pull data from whatever you point it at. Basically, it doesn’t know what device it’s talking to, so it tries to cover all the bases.

This isn’t a file you should edit by hand. Instead, you create instructions for the generator and it look though the MIBs and create one for you. Here’s an example for a Samlex Invertor.

vim ~/generator.yml

modules:
  samlex:
    walk:
      - sysLocation
      - inverterMode
      - power
      - vin
      - tempDD
      - tempDA

prometheus-snmp-generator generate
sudo cp /etc/prometheus/snmp.yml /etc/prometheus/snmp.yml.orig
sudo cp ~/snmp.yml /etc/prometheus
sudo systemctl reload prometheus-snmp-exporter.service

Configuration in Prometheus remains the same - but since we picked a new module name we need to adjust that.

    ...
    ...
    params:
      module: [samlex]
    ...
    ...

sudo systemctl reload prometheus.service

Adding Data Prefixes

by default, the names are all over the place. The SNMP Exporter Devs leave it this way because there are a lot of pre-built dashboards on downstream systems that expect the existing names.

If you are building your own downstream systems you can prefix (as is best-practice) as you like with a post generation step. This example cases them all to be prefixed with samlex_.

prometheus-snmp-generator generate
sed -i 's/name: /name: samlex_/' snmp.yml

Combining MIBs

You can combine multiple systems in the generator file to create one snmp.yml file, and refer to them by the module name in the Prometheus file.

modules:
  samlex:
    walk:
      - sysLocation
      - inverterMode
      - power
      - vin
      - tempDD
      - tempDA
  ubiquiti:
    walk:
      - something
      - somethingElse

Operation

As before, you can get a preview directly from the exporter (using a link like below). This data should show up in the Web UI too.

http://some.server:9116/snmp?module=samlex&target=some.device

Sources

https://github.com/prometheus/snmp_exporter/tree/main/generator

3.2.1.1.5 - Sensors DHT

DHT stands for Digital Humidity and Temperature. At less than $5 they are cheap and can be hooked to a Raspberry Pi easily. Add a Prometheus exporter if you want to do at scale.

Connect the Senor
Provision and Install the Python Libraries
Test the Libraries and the Sensor
Install the Prometheus Exporter as a Service
Create a Service Account
Add to Prometheus

Connect The Sensor

These usually come as a breakout-board with three leads you can connect to the Raspberry PI GPIO pins as follows;

Positive lead to pin 1 (power)
Negative lead to pin 6 (ground)
Middle or ‘out’ lead to pin 7 (that’s GPIO 4)

Image of DHT Connection

(From https://github.com/rnieva/Playing-with-Sensors---Raspberry-Pi)

Provision and Install

Use the Raspberry Pi Imager to Provision the Pi with Raspberry PI OS Lite 64 bit. Next, install the “adafruit_blinka” library as adapted from their instructions and test.

# General updates
sudo apt update
sudo apt -y upgrade
sudo apt -y autoremove
sudo reboot

# These python components may already be installed, but making sure
sudo apt -y install python3-pip
sudo apt -y install --upgrade python3-setuptools
sudo apt -y install python3-venv

# Make a virtual environment for the python process
sudo mkdir /usr/local/bin/sensor-dht
sudo python3 -m venv /usr/local/bin/sensor-dht --system-site-packages
cd /usr/local/bin/sensor-dht
sudo chown -R ${USER}:${USER} .
source bin/activate

# Build and install the library
pip3 install --upgrade adafruit-python-shell
wget https://raw.githubusercontent.com/adafruit/Raspberry-Pi-Installer-Scripts/master/raspi-blinka.py
sudo -E env PATH=$PATH python3 raspi-blinka.py

Test the Libraries and the Sensor

After logging back in, test the blinka lib.

cd /usr/local/bin/sensor-dht
source bin/activate
wget https://learn.adafruit.com/elements/2993427/download -O blinkatest.py
python3 blinkatest.py

Then install the DHT library from CircuitPython and create a script to test the sensor.

cd /usr/local/bin/sensor-dht
source bin/activate
pip3 install adafruit-circuitpython-dht

vi sensortest.py

import board
import adafruit_dht

dhtDevice = adafruit_dht.DHT11(board.D4)
temp = dhtDevice.temperature
humidity = dhtDevice.humidity

print(
  "Temp: {:.1f} C    Humidity: {}% ".format(temp, humidity)
)

dhtDevice.exit()

You can get occasional errors like RuntimeError: Checksum did not validate. Try again. that are safe to ignore. These DHTs are not 100% solid.

Install the Prometheus Exporter as a Service

Add the Prometheus pips.

cd /usr/local/bin/sensor-dht
source bin/activate
pip3 install prometheus_client

And create a script like this.

nano sensor.py

import board
import adafruit_dht
import time
from prometheus_client import start_http_server, Gauge

dhtDevice = adafruit_dht.DHT11(board.D4)

temperature_gauge= Gauge('dht_temperature', 'Local temperature')
humidity_gauge = Gauge('dht_humidity', 'Local humidity')

start_http_server(8000)
    
while True:

 try:
  temperature = dhtDevice.temperature
  temperature_gauge.set(temperature)

  humidity = dhtDevice.humidity
  humidity_gauge.set(humidity)
 
 except:
  # Errors happen fairly often as DHT's are hard to read. Just continue on.
  continue

 finally:
  time.sleep(60)

Create a service

sudo nano /lib/systemd/system/sensor.service

[Unit]
Description=Temperature and Humidity Sensing Service
After=network.target

[Service]
Type=idle
Restart=on-failure
User=root
ExecStart=/bin/bash -c 'cd /usr/local/bin/sensor-dht && source bin/activate && python sensor.py'

[Install]
WantedBy=multi-user.target

Enable and start it

sudo systemctl enable --now sensor.service

curl http://localhost:8000/metrics

Create a Service Account

This service is running as root. You should consider creating a sensor account.

sudo useradd --home-dir /usr/local/bin/sensor-dht --system --shell /usr/sbin/nologin --comment "Sensor Service" sensor
sudo usermod -aG gpio sensor

sudo systemctl stop  sensor.service
sudo chown -R sensor:sensor /usr/local/bin/sensor-dht
sudo sed -i 's/User=root/User=sensor/' /lib/systemd/system/sensor.service

sudo systemctl daemon-reload 
sudo systemctl start sensor.service

Add to Prometheus

Adding it requires logging into your Prometheus server and adding a job like below.

sudo vi /etc/prometheus/prometheus.yml

...
...
  - job_name: 'dht' 
    static_configs: 
      - targets: 
        - 192.168.1.45:8000

You will be able to find the node in your server at http://YOUR-SERVER:9090/targets?search=#pool-dht and data will show up with a leading dht_....

Sources

https://randomnerdtutorials.com/raspberry-pi-dht11-dht22-python/

You may want to raise errors to the log as in the above source.

3.2.1.2 - Smokeping

I’ve been using Smokeping for at least 20 years. Every so often I look at the competitors, but it’s still the best for a self-contained latency monitoring system.

Installation

On Debian Stretch:

# Install smokeping - apache gets installed automatically and the config enabled
sudo apt-get install smokeping

# Install and enable SpeedyCGI if you can find it, otherwise, fastCGI
sudo apt install libapache2-mod-fcgid

Configure

Edit the General Config file

sudo vim /etc/smokeping/config.d/General

owner    = Some Org
contact  = [email protected]
mailhost = localhost

cgiurl   = http://some.server.you.just.deployed.on/cgi-bin/smokeping.cgi

# specify this to get syslog logging
syslogfacility = local0

# each probe is now run in its own process
# disable this to revert to the old behaviour
# concurrentprobes = no

@include /etc/smokeping/config.d/pathnames

Edit the pathnames file. You must put in a value for sendmail (If you don’t have it) so that smoke ping will run.

sudo vim /etc/smokeping/config.d/pathnames


sendmail = /bin/false
imgcache = /var/cache/smokeping/images
imgurl   = ../smokeping/images
...
...

Edit the Alerts

sudo vim /etc/smokeping/config.d/Alerts


to = [email protected]
from = [email protected]

Edit the Targets

# Add your own targets that you will measure by appending them to the bottom of the targets file. 
# These will show up in a menu on the left of the generated web page. You add an entry starting with a + to create a top level entry, and subsequent lines with ++ that will show up as sub entries like so:

#        + My Company

#        ++ My Company's Web Server 1

#        ++ My Company's Web Server 2


#    Actual config requires a few extra lines, as below;


sudo vim /etc/smokeping/config.d/Targets


    + My_Company

    menu =  My Company
    title = My Company


    ++ Web_Server_1
    menu = Web Server 1
    title = Web Server 1
    host = web.server.org


# Restart smokeping and apache

sudo service smokeping restart
sudo service apache2 reload

Access smokeping at:

http://some.server.name/smokeping

Notes

The default resolution - i.e. polling frequency is 20 requests over 5 min - or 1 request every 15 seconds

http://collaboration.cmc.ec.gc.ca/science/rpn/biblio/ddj/Website/articles/SA/v12/i07/a5.htm

ArchWiki suggests a workaround for sendmail

https://wiki.archlinux.org/index.php/smokeping

3.2.2 - Visualization

3.2.2.1 - Grafana

3.3 - Operating Systems

3.3.1 - NetBoot

Most computers come with ‘firmware’. This is a built-in mini OS, embedded in the chips, that’s just smart enough to start things up and hand-off to something more capable.

That more-capable thing is usually an Operating System on a disk, but it can also be something over the network. This lets you:

Run an OS installer, such as when you don’t have one installed yet.
Run an the whole OS remotely without having a local disk at all.

PXE

The original way was Intel’s PXE (Preboot eXecution Environment) Option ROM on their network cards. The IBM PC firmware (BIOS) would would turn over execution to it and PXE would use basic network drivers to get on the network.

HTTP Boot

Modern machines have newer firmware (UEFI) and it includes logic on how to use HTTP/S without the need for add-ons. This simplifies thigns and also solves potential man-in-the middle attacks. Both methods are still generally called PXE booting, though.

Building a NetBoot Environment

Start by setting up a HTTP Boot system, then add PXE Booting and netboot.xyz to it. This gets you an installation system. Then proceed to diskless stations.

3.3.1.1 - HTTP Boot

We’ll set up a PXE Proxy server that runs DHCP and HTTP. This server and can be used along side your existing DHCP/DNS servers. We use Debian in this example but anything that runs dnsmasq should work.

Installation

sudo apt install dnsmasq lighttpd

Configuration

Server

Static IPs are best practice, though we’ll use a hostname in this config, so the main thing is that the server name netboot resolves correctly.

HTTP

Lighttpd serves up from /var/www/http so just drop an ISO there. For example, take a look at the current debian ISO (the numbering changes) at https://www.debian.org/CD/netinst and copy the link in like so:

sudo wget https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-12.6.0-amd64-netinst.iso -P /var/www/html -O debian.iso

DHCP

When configured in proxy dhcp mode: “…dnsmasq simply provides the information given in –pxe-prompt and –pxe-service to allow netbooting”. So only certain settings are available. This is a bit vague, but testing reveals that you must set the boot file name with the dhcp-boot directive, rather than setting it with the more general DHCP option ID 67, for example.

sudo vi /etc/dnsmasq.d/netboot.conf

# Disable DNS
port=0

# Set for DHCP PXE Proxy mode
dhcp-range=192.168.0.0,proxy

# Respond to clients that use 'HTTPClient' to identify themselves.
dhcp-pxe-vendor=HTTPClient

# Set the boot file name to the web server URL
dhcp-boot="http://netboot/debian.iso"

# PXE-service isn't actually used, but dnsmasq seems to need at least one entry to send the boot file name when in proxy mode.
pxe-service=x86-64_EFI,"Network Boot"

Client

Simply booting the client and selecting UEFI HTTP should be enough. The debian boot loader is signed and works with secure boot.

In addition to ISOs, you can also specify .efi binaries like grubx64.efi. Some distributions support this, though Debian itself may have issues.

Next Steps

You may want to support older clients by adding PXE Boot support.

Troubleshooting

dnsmasq

A good way to see what’s going on is to enable dnsmasq logging.

# Add these to the dnsmasq config file
log-queries
log-dhcp

# Restart and follow to see what's happening
sudo systemctl restart dnsmasq.service
sudo systemctl -u dnsmasq -f

If you’ve enabled logging in dnsmasq and it’s not seeing any requests, you may need to look at your networking. Some virtual environments suppress DHCP broadcasts when they are managing the IP range.

lighttpd

You can also see what’s being requested from the web server if you enable access logs.

cd /etc/lighttpd/conf-enabled
sudo ln -s ../conf-available/10-accesslog.conf
sudo systemctl restart lighttpd.service
sudo cat /var/log/lighttpd/access.log

3.3.1.2 - PXE Boot

Many older systems can’t HTTP Boot so let’s add PXE support with some dnsmasq options.

Installation

Dnsmasq

Install as in the httpboot page.

The Debian Installer

Older clients don’t handle ISOs well, so grab and extract the Debian netboot files.

sudo wget http://ftp.debian.org/debian/dists/bookworm/main/installer-amd64/current/images/netboot/netboot.tar.gz -O - | sudo tar -xzvf - -C /var/www/html

Grub is famous for ignoring proxy dhcp settings, so let’s start off the boot with something else; iPXE. It can do a lot, but isn’t signed so you must disable secure boot on your clients.

sudo wget https://boot.ipxe.org/ipxe.efi -P /var/www/html

Configuration

iPXE

Debian is ready to go, but you’ll want to create an auto-execute file for iPXE so you don’t have to type in the commands manually.

sudo vi /var/www/html/autoexec.ipxe

#!ipxe

set base http://netboot/debian-installer/amd64

dhcp
kernel ${base}/linux
initrd ${base}/initrd.gz
boot

Dnsmasq

HTTP and PXE clients need different information to boot. We handle this by adding a filename to the PXE service option. This will override the dhcp-boot directive for PXE clients.

sudo vi /etc/dnsmasq.d/netboot.conf

# Disable DNS
port=0 
 
# Use in DHCP PXE Proxy mode
dhcp-range=192.168.0.0,proxy 
 
# Respond to both PXE and HTTP clients
dhcp-pxe-vendor=PXEClient,HTTPClient 
 
# Send the BOOTP information for the clients using HTTP
dhcp-boot="http://netboot/debian.iso" 

# Specify a boot menu option for PXE clients. If there is only one, it's booted immediately.
pxe-service=x86-64_EFI,"iPXE (UEFI)", "ipxe.efi"

# We also need to enable TFTP for the PXE clients
enable-tftp 
tftp-root=/var/www/html

Client

Both types of client should now work. The debian installer will pull the rest of what it needs from the web.

Next Steps

You can create a boot-menu by adding multiple pxe-service entries in dnsmasq, or by customizing the iPXE autoexec.ipxe files. Take a look at that in the menu page.

Troubleshooting

Text Flashes by, disappears, and client reboots

This is most often a symptom of secure boot still being enabled.

Legacy Clients

These configs are aimed at UEFI clients. If you have old BIOS clients, you can try the pxe-service tag for those.

pxe-service=x86-64_EFI,"iPXE (UEFI)", "ipxe.efi"
pxe-service=x86PC,"iPXE (UEFI)", "ipxe.kpxe"

This may not work and there’s a few client flavors so enable the dnsmasq logs to see how they identify themselves. You can also try booting pxelinux as in the Debian docs.

DHCP Options

Dnsmasq also has a whole tag system that you can set and use similar to this:

dhcp-match=set:PXE-BOOT,option:client-arch,7
dhcp-option=tag:PXE-BOOT,option:bootfile-name,"netboot.xyz.efi"

However, dnsmasq in proxy mode limits what you can send to the clients, so we’ve avoided DHCP options and focused on PXE service directives.

Debian Error

*ERROR* CPU pipe B FIFO underrun

You probably need to use the non-free firmware

No Boot option

Try entering the computers bios setup and adding a UEFI boot option for the OS you just installed. You may need to browse for the file \EFI\debian\grubx64.efi

Sources

https://documentation.suse.com/sles/15-SP2/html/SLES-all/cha-deployment-prep-uefi-httpboot.html https://github.com/ipxe/ipxe/discussions/569 https://linuxhint.com/pxe_boot_ubuntu_server/#8

It’s possible to use secure boot if you’re willing to implement a chain of trust. Here’s an example used by FOG to boot devices.

https://forums.fogproject.org/topic/13832/secureboot-issues/3

3.3.1.3 - menu

It would be useful to have some choices when you netboot. You can use the pxe-service built into dnsmasq but a more flexible option is the menu system provided by the iPXE project.

Installation

Set up a http/pxe net-boot server if you haven’t already.

Configuration

dnsmasq

Configure dnsmasq to serve up the ipxe.efi binary for both types of clients.

# Disable DNS
port=0 
 
# Use in DHCP PXE Proxy mode
dhcp-range=192.168.0.0,proxy 
 
# Tell dnsmasq to provide proxy PXE service to both PXE and HTTP clients
dhcp-pxe-vendor=PXEClient,HTTPClient 
 
# Send the BOOTP information for the clients using HTTP
dhcp-boot="http://netboot/ipxe.efi" 

# Specify a boot menu option for PXE clients. If there is only one, it's booted immediately.
pxe-service=x86-64_EFI,"iPXE (UEFI)", "ipxe.efi"
  
# We also need to enable TFTP for the PXE clients  
enable-tftp 
tftp-root=/var/www/html

Change the autoexec.ipxe to display a menu.

sudo vi /var/www/html/autoexec.ipxe

#!ipxe

echo ${cls}

:MAIN
menu Local Netboot Menu
item --gap Local Network Installation
item WINDOWS ${space} Windows 11 LTSC Installation
item DEBIAN ${space} Debian Installation
choose selection && goto ${selection} || goto ERROR

:WINDOWS
echo Some windows things here
sleep 3
goto MAIN

:DEBIAN
dhcp
imgfree
set base http://netboot/debian-installer/amd64
kernel ${base}/linux 
initrd ${base}/initrd.gz
boot || goto ERROR


:ERROR
echo There was a problem with the selection. Exiting...
sleep 3
exit

Operation

You’ll doubtless find additional options to add. You may want to add the netboot.xyz project to your local menu too.

3.3.1.4 - netboot.xyz

You can add netboot.xyz to your iPXE menu to run Live CDs, OS installers and utilities they provide. This can save a lot of time and their list is always improving.

Installation

You’re going to connect to the web for this, so there’s nothing to install. You can download their efi bootloader manually if you’d like to keep things HTTPS, but they update it regularly so you may fall behind.

Configuration

Autoexec.ipxe

Add a menu item to your autoexec.ipxe. When you select it, iPXE will chainload (in their parlance) the netboot.xyz bootloader.

#!ipxe

echo ${cls}

:MAIN
menu Local Netboot Menu
item --gap Local Network Installation
item WINDOWS ${space} Windows 11 LTSC Installation
item DEBIAN ${space} Debian Installation
item --gap Connect to Internet Sources
item NETBOOT ${space} Netboot.xyz
choose selection && goto ${selection} || goto ERROR

:WINDOWS
echo Some windows things here
sleep 3
goto MAIN

:DEBIAN
dhcp
imgfree
set base http://netboot/debian-installer/amd64
kernel ${base}/linux 
initrd ${base}/initrd.gz
boot || goto ERROR

:NETBOOT
dhcp
chain --autofree http://boot.netboot.xyz || goto ERROR

:ERROR
echo There was a problem with the selection. Exiting...
sleep 3
exit

Local-vars

Netboot.xyz detects that it’s working with a Proxy PXE server and behaves a little differently. For example, you can’t insert your own local menu.ipxe. One helpful addition is a local settings file to speed up boot.

sudo vi /var/www/html/local-vars.ipxe

#!ipxe
set use_proxydhcp_settings true

Operation

You can choose the new menu item and load netboot.xyz. It will take you out the web for more selections. Not everything will load on every client, of course. But it gives you a lot of options.

Next Steps

We glossed over how to install Windows. That’s a useful item.

Troubleshooting

Wrong TFTP Server

tftp://192.168.0.1/local-vars.ipxe....Connection timed out
Local vars file not found... attempting TFTP boot...
DHCP proxy detected, press p to boot from 192.168.0.2...

If your boot client is attempting to connect to the main DHCP server, that server is probably sending value next server: 192.168.0.1 in it’s packets. This isn’t a DNS option per say, but it affects netboot. Dnsmasq does this though Kea doesn’t.

sudo systemctl -u dnsmasq -f

...
...
next server: 192.168.0.1
...
...

The boot still works, it’s just annoying. You can usually ignore the message and don’t have to hit ‘p’.

Exec Format Error

Could not boot: Exec format error (https://ipxe.org/2e008081)

You may see this flash by. Check your menus and local variables file to make sure you’ve in included the #!pxe shebang.

No Internet

You can also host your own local instance.

3.3.1.5 - windows

To install windows, have iPXE load wimboot then WinPE. From there you can connect to a samba share and start the Windows installer. Just like back in the gold-ole administrative installation point days.

Getting a copy of WinPE the official way is a bit of a hurdle, but definitely less work than setting up a full Windows imaging solution.

Installation

Samba and Wimboot

On the netboot server, install wimboot and Samba.

sudo wget https://github.com/ipxe/wimboot/releases/latest/download/wimboot -P /var/www/html
sudo apt install samba

Window ADK

On a Windows workstation, download the ADK and PE Add-on and install as per Microsoft’s ADK Install Doc.

Configuration

Samba

Prepare the netboot server to receive the Windows files.

sudo vi /etc/samba/smb.conf

[global]
  map to guest = bad user
  log file = /var/log/samba/%m.log

[install]
  path = /var/www/html
  browseable = yes
  read only = no
  guest ok = yes
  guest only = yes

sudo mkdir /var/www/html/winpe
sudo mkdir /var/www/html/win11
sudo chmod o+w /var/www/html/win*
sudo systemctl restart smbd.service

Window ADK Config

On the Windows workstation, start the deployment environment as an admin and create the working files as below. More info is in Microsoft’s Create Working Files document.

Start -> All Apps -> Windows Kits -> Deployment and Imaging Tools Environment (Right Click, More, Run As Admin)

copype amd64 c:\winpe\amd64

Add the required additions for Windows 11 with the commands below. These are the optional components WinPE-WMI and WinPE-SecureStartup and more info is in Microsoft’s Customization Section.

mkdir c:\winpe\offline

dism /mount-Image /Imagefile:c:\winpe\amd64\media\sources\boot.wim /index:1 /mountdir:c:\winpe\offline

dism /image:c:\winpe\offline /add-package /packagepath:"..\Windows Preinstallation Environment\amd64\WinPE_OCs\WinPE-WMI.cab" /packagepath:"..\Windows Preinstallation Environment\amd64\WinPE_OCs\WinPE-SecureStartup.cab"

dism /unmount-image /mountdir:c:\winpe\offline /commit

Make the ISO in case you want to HTTP Boot from it later and keep the shell open for later.

MakeWinPEMedia /ISO C:\winpe\amd64 C:\winpe\winpe_amd64.iso

WinPE

Now that you’ve got a copy of WinPE, copy it to the netboot server.

net use q: \\netboot\install
xcopy /s c:\winpe\* q:\winpe

Also create some auto-start files for setup. The first is part to the WinPE system and tells it (generically) what to do after it starts up.

notepad q:\winpe\amd64\winpeshl.ini

[LaunchApps]
"install.bat"

This the second is more specific and associated with the thing you are installing. We’ll mix and match these in the PXE menu later so we can install different things.

notepad q:\win11\install.bat

wpeinit
net use \\netboot
\\netboot\install\win11\setup.exe
pause

Win 11

You also need to obtain the latest ISO and extract the contents.

https://massgrave.dev/windows_ltsc_links
Double-click on the ISO
Copy contents to q:\win11

Wimboot

Bck on the netboot server, customize the WINDOWS section of your autoexex.ipxe like this.

:WINDOWS
dhcp
imgfree
set winpe http://netboot/winpe/amd64
set source http://netboot/win11
kernel wimboot
initrd ${winpe}/media/sources/boot.wim boot.wim
initrd ${winpe}/media/Boot/BCD         BCD
initrd ${winpe}/media/Boot/boot.sdi    boot.sdi
initrd ${winpe}/winpeshl.ini           winpeshl.ini
initrd ${source}/install.bat           install.bat
boot || goto MAIN

You can add other installs by copying this block and changing the :WINDOWS header and source variable.

Next Steps

Add some more installation sources and take a look at the Windows zero touch install.

Troubleshooting

System error 53 has occurred. The network path was not found

A given client may be unable to connect to the SMB share, or it may fail once, but then connect on a retry a moment later. I suspect it’s because the client doesn’t have an IP yet, though I’ve not looked at it closely. You can usually just retry.

You can also comment out the winpeshl.ini line and you’ll boot to a command prompt that will let you troubleshoot. Sometimes you just don’t have an IP yet from the DHCP server and you can edit the install.bat file to add a sleep or other things. See then [zero touch deployment] page for some more ideas.

Access is denied

This may be related to the executable bit. If you’ve copied from the ISO they should be set. But if after that you’ve changed anything you could have lost the x bit from setup.exe. It’s hard to know what’s supposed to be set once it’s gone, so you may want to recopy the files.

3.3.2 - Windows

3.3.2.1 - Server Core

Installation Notes

If you’re deploying Windows servers, Server Core is best practice¹. Install from USB and it will offer that as a choice - it’s fairly painless. But these instances are designed to be remote-managed so you’ll need to perform a few post-install tasks to help with that.

Server Post-Installation Tasks

Set a Manual IP Address

The IP is DHCP by default and that’s fine if you create a reservation at the DHCP server or just use DNS. If you require a manual address, however:

# Access the PowerShell interface (you can use the server console if desired)

# Identify the desired interface's index number. You'll see multiple per adapter for IP4 and 6 but the interface index will repeat.
Get-NetIPInterface

# Set a manual address, netmask and gateway using that index (12 in this example)
New-NetIPaddress -InterfaceIndex 12 -IPAddress 192.168.0.2 -PrefixLength 24 -DefaultGateway 192.168.0.1

# Set DNS
Set-DNSClientServerAddress –InterfaceIndex 12 -ServerAddresses 192.168.0.1

Allow Pings

This is normally a useful feature, though it depends on your security needs.

Set-NetFirewallRule -Name FPS-ICMP4-ERQ-In -Enabled True

Allow Computer Management

Server core allows ‘Remote Management’ by default². That is specifically the Server Manager application that ships with Windows Server versions and is included with the Remote Server Admin Tools on Windows 10 professional³ or better. For more detailed work you’ll need to use the Computer Management feature as well. If you’re all part of AD, this is reported to Just Work(TM). If not, you’ll need to allow several ports for SMB and RPC.

# Port 445
Set-NetFirewallRule -Name FPS-SMB-In-TCP -Enabled True

# Port 135
Set-NetFirewallRule -Name WMI-RPCSS-In-TCP -Enabled True


maybe 
FPS-NB_Name-In-UDP
NETDIS-LLMNR-In-UDP

Configuration

Remote Management Client

If you’re using windows 10/11, install it on a workstation by going to System -> Optional features -> View features and enter Server Manager in the search box to select and install.

With AD

When you’re all in the same Domain then everything just works (TM). Or so I’ve read.

Without AD

If you’re not using Active Directory, you’ll have to do a few extra steps before using the app.

Trust The Server

Tell your workstation you trust the remote server you are about to manage⁴ (yes, seems backwards). Use either the hostname or IP address depending on how your planning to connect - i.e. if you didn’t set up DNS use IPs. Start an admin powershell and enter:

Set-Item wsman:\localhost\Client\TrustedHosts 192.168.5.1 -Concatenate -Force

Add The Server

Start up Server Manager and select Manage -> Add Servers -> DNS and search for the IP or DNS name. Pay attention the server’s name that it detects. If DNS happens to reslove the IP address you put in, as server-1.local for example, you’ll need to repeat the above TrustedHosts command with that specific name.

Manage As…

You may notice that after adding the server, the app tries to connect and fails. You’ll need to right-click it and select Manage As… and enter credentials in the form of server-1\Administrator and select Remember me to have this persist. Here you’ll need to use the actual server name and not the IP. If unsure, you can get this on the server with the hostname command.

Starting Performance Counters

The server you added should now say that it’s performance counters are not started. Right-click to and you can select to start them. The server should now show up as Online and you can perform some basic tasks.

server-1.local\Administrator

Server Manager is the default management tool and newer servers allow remote management by default. The client needs a few things, however.

Set DNS so you can resolve by names
Configure Trusted Hosts

On the system where you start the the Server Manager app - usually where you are sitting - ensure you can resolve the remote host via DNS. You may want to edit your hosts file if not.

notepad c:\Windows\System32\drivers\etc\hosts

You can now add the remote server.

Manage -> Add Servers -> DNS -> Search Box (enter the other servers hostname) -> Magnifying Glass -> Select the server -> Right Arrow Icon -> OK

(You man need to select Manage As on it)

Allow Computer Management

You can right-click on a remote server and select Computer Management after doing this

MISC

Set-NetFirewallProfile -Profile Domain, Public, Private -Enabled False

3.3.2.2 - Windows Zero Touch Install

The simplest way to zero-touch install Windows is with a web-generated answer file. Go to a site like schneegans and just create it. This removes the need for the complexity of MDS WDS SCCM etc. systems for normal deployments.

Create An Answer File

Visit schneegans. Start with some basic settings, leaving most at the default, and increase complexity with successive iterations. A problematic setting will just dump you out of the installer and it can be hard to determine what went wrong.

Download the file and use it one of the following ways;

USB

After creating the USB installer, copy the file (autounattend.xml) to the root of the USB drive (or one of these locations) and setup will automatically detect it.

Netboot

For a netboot install, copy the file to the sources folder of the Windows files.

scp autounattend.xml netboot:/var/www/html/win11/sources

Additionally, some scripting elements of the install don’t support UNC paths so we must map a drive. Back in the Windows netboot page, we created an install.bat to start the installation. Let’s modify that like so

vi /var/www/html/win11/install.bat

wpeinit

SET SERVER=netboot

:NET
net use q: \\%SERVER%\install

REM If there was a problem with the net use command, 
REM ping, pause and loop back to try again

IF %ERRORLEVEL% NEQ 0 (
  ping %SERVER%
  pause
  GOTO NET
) ELSE (
  q:
  cd win11
  setup.exe
)

Add Packages

The installer can also add 3rd party software packages by adding commands in the Run custom scripts section to run at initial log-in. We’ll use HTTP to get the files as some versions of windows block anonymous SMB.

Add Package Sources

On the netboot server, create an apps folder for your files and download packages there.

mkdir /var/www/html/apps; cd /var/www/html/apps
wget https://get.videolan.org/vlc/3.0.9.2/win64/vlc-3.0.9.2-win64.msi 
wget https://statics.teams.cdn.office.net/production-windows-x64/enterprise/webview2/lkg/MSTeams-x64.msix

Add to Autounattend.xml

It’s easiest to add this in the web form rather than try and edit the XML file. Go to this section and add a line like this one to the third block of custom scripts. It must run at initial user login as the network isn’t available before that.

Navigate to the block that says:

Scripts to run when the first user logs on after Windows has been installed

For MSI Files

These and handled as .cmd files as in field 1.

msiexec /package http://netboot/apps/GoogleChromeStandaloneEnterprise64.msi /quiet
msiexec /package http://netboot/apps/vlc-3.0.9.2-win64.msi /quiet

For MSIX Files

These are handled as .ps1 files as in field 2.

Add-AppPackage -path http://netboot/apps/MSTeams-x64.msix

For EXE files

These are are also handled in the .ps1 files in field 2. They require more work however, as you must download, run, then remove them.

(New-Object System.Net.WebClient).DownloadFile("http://netboot/apps/WindowsSensor.MaverickGyr.exe","$env:temp\crowd.exe")
Start-Process $env:temp\crowd.exe -ArgumentList "/install /quiet CID=239023847023984098098" -wait
Remove-Item "$env:temp\crowd.exe"

Troubleshooting

Select Image Screen

Specifying the KMS product key won’t always allow you to skip the “Select Image” screen. This may be due to an ISO being pre-licensed or have something to do with Windows releases. To fix this, add an InstallFrom stanza to the OSImage block of your unattended.xml file.


                        <ImageInstall> 
                                <OSImage> 
                                        <InstallTo> 
                                                <DiskID>0</DiskID> 
                                                <PartitionID>3</PartitionID> 
                                        </InstallTo> 
                                        <InstallFrom> 
                                                <MetaData wcm:action="add"> 
                                                        <Key>/Image/Description</Key> 
                                                        <Value>Windows 11 Enterprise</Value> 
                                                </MetaData> 
                                        </InstallFrom> 
                                </OSImage> 
                        </ImageInstall>

https://www.tenforums.com/installation-upgrade/180022-autounattend-no-product-key.html

Notes

Windows Product Keys https://gist.github.com/rvrsh3ll/0810c6ed60e44cf7932e4fbae25880df

3.4 - Virtualization

In the beginning, users time-shared CPUs and virtualization was without form and void. And IBM said “Let there be System/370”. This was in the 70’s and involved men with crew-cuts, horn-rimmed glasses and pocket protectors. And ties.

Today, you can still do full virtualization. Everything is emulated down to the hardware and every system has it’s own kernel and device drivers. Most of the public cloud started out this way at the dawn of the new millennium. It was the way. VMWare was the early player in this area and popularized it on x86 hardware where everyone was using 5% of their pizzabox servers.

The newer way is containerization. There is just one kernel and it keeps groups processes separate from each other. This is possible because Linux implemented kernel namespaces around 2008 - mostly work by IBM, suitably enough. The program used to work with this is named LXC and you’d use commands like sudo lxc-create --template download --name u1 --dist ubuntu --release jammy --arch amd64. Other systems such as LXD and Docker (originally) are layed on top to provide more management.

Twenty some years later, what used to be a hot market is now a commodity that’s essentially given away for free. VMWare was acquired by Broadcom who’s focused on the value-extraction phase of it’s lifecycle and the cloud seems decidedly headed toward containers because of it’s better efficiency and agility.

3.4.1 - Incus

Inucs is a container manager, forked from Canonical’s LXD manager. It combines all the virtues of upstream LXD (containers + vms) with the advantages of community driven additions. You have access to the containers provided by the OCI (open container initiative) as well as being able to create VMs. It is used at the command line and includes a web interface.

Installation

Simply install a base OS on your server and add a few commands. You can install from your distro’s repo, but zabbly (the sponsor) is a bit newer.

As per https://github.com/zabbly/incus

sudo mkdir -p /etc/apt/keyrings/
sudo wget -O /etc/apt/keyrings/zabbly.asc https://pkgs.zabbly.com/key.asc

sudo sh -c 'cat <<EOF > /etc/apt/sources.list.d/zabbly-incus-stable.sources
Enabled: yes
Types: deb
URIs: https://pkgs.zabbly.com/incus/stable
Suites: $(. /etc/os-release && echo ${VERSION_CODENAME})
Components: main
Architectures: $(dpkg --print-architecture)
Signed-By: /etc/apt/keyrings/zabbly.asc

EOF'

sudo apt update
sudo apt install -y incus incus-ui-canonical

Configuration

sudo adduser YOUR-USERNAME incus-admin
incus admin init

You’re fine to accept the defaults, though if you’re planning on a cluster consult

https://linuxcontainers.org/incus/docs/main/howto/cluster_form/#cluster-form

Managing Networks

Incus uses managed networks. It creates a private bridged network by default with DHCP, DNS and NAT services. You can create others and it add services similarly. You don’t plug instances in, rather you create a new profile with no network and configure the instance with that profile.

If you’re testing DHCP though, such as when working with netboot, you must create a network without those services. That must be done at the command line with the IP spaces set to none. You can then use that in a profile

incus network create test ipv4.address=none ipv6.address=none
incus profile copy default isolated

You can proceed to the GUI for the rest.

Operation

Windows 11 VM Creation

This requires access to the TPM module and an example at the command line is extracted from https://discussion.scottibyte.com/t/windows-11-incus-virtual-machine/362.

After repacking the installation ISO you can also create through the GUI and add:

incus config device add win11vm vtpm tpm path=/dev/tpm0

Agent

sudo apt install lxd-agent

Notes

LXD is widely admired, but Canonical’s decision to move it to in-house-only led the lead developer and elements of the community to fork.

3.4.2 - Proxmox PVE

Proxmox PVE is a distro from the company Proxmox that makes it easy to manage manage containers and virtual machines. It’s built on top of Debian and allows a lot of customization. This can be good and bad compared to VMWare or XCP-NG that keep you on the straight and narrow. But it puts the choice in your hands.

Installation

Initial Install

Download the ISO and make a USB installer. It’s a hybrid image so you can write it directly to a USB drive.

sudo dd if=Downloads/proxmox*.iso of=/dev/sdX bs=1M conv=fdatasync

EUFI boot works fine. It will set a static IP address durning the install so be prepared for that.

If your system has an integrated NIC, check out the troubleshooting section after installing for potential issues with RealTek hardware.

System Update

After installation has finished and the system rebooted, update it. If you skip this step you may have problems with containers.

# Remove the pop-up warning if desired
sed -Ezi.bak "s/(function\(orig_cmd\) \{)/\1\n\torig_cmd\(\);\n\treturn;/g" /usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js && systemctl restart pveproxy.service

# Remove the enterprise subscription repos
rm /etc/apt/sources.list.d/pve-enterprise.list
rm /etc/apt/sources.list.d/ceph.list

# Add the non-subscription PVE repo
. /etc/os-release 
echo "deb http://download.proxmox.com/debian/pve $VERSION_CODENAME pve-no-subscription" > /etc/apt/sources.list.d/pve-no-subscription.list

# Add the non-subscription Ceph repo - this will change so consult 
# Check in your browser --> https://pve.proxmox.com/wiki/Package_Repositories
echo "deb http://download.proxmox.com/debian/ceph-reef bookworm no-subscription" > /etc/apt/sources.list.d/ceph-no-subscription.list

# Alternately, here's a terrible way to get the latest ceph release
LATEST=$(curl https://enterprise.proxmox.com/debian/ | grep ceph | sed 's/.*>\(ceph-.*\)\/<.*\(..-...-....\) .*/\1,\2/' | sort -t- -k3,3n -k2,2M -k1,1n | tail -1 | cut -f 1 -d ",")

echo "deb http://download.proxmox.com/debian/$LATEST $VERSION_CODENAME no-subscription" > /etc/apt/sources.list.d/ceph-no-subscription.list

# Update, upgrade and reboot
apt update
apt upgrade -y
reboot

Container Template Update

The template list is updated on a schedule, but you can get a jump on it while you’re logged in. More information at:

https://pve.proxmox.com/wiki/Linux_Container#pct_container_images

pveam update
pveam available
pveam download local (something from the list)

Configuration

Network

The default config is fine for most use-cases and you can skip the Network section. If you’re in a larger environment, you may want to employ an overlap network, or use VLANs.

The Default Config

PVE creates bridge interface named vmbr0 and assigns a management IP there. As containers and VMs come up, their virtual interfaces will be connected to this bridge so they can have their own MAC addresses.

You can see this in the GUI or in the traditional Debian interfaces file.

cat /etc/network/interfaces

auto lo
iface lo inet loopback

iface enp1s0 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.11/24
    gateway 192.168.100.1
    bridge-ports enp1s0
    bridge-stp off
    bridge-fd 0

Create an Overlap Network

You may want to add some additional LANs for your guests, or to separate your management from the rest of the network. You can do this by simply adding some additional LAN addresses.

After changing IPs, take a look further down at how to restrict access.

Mixing DHCP and Static Addresses

To add additional DHCP IPs, say because you get a DHCP address from the wall but don’t want PVE management on that, use the up directive. In this example the 192 address is LAN only and the gateway comes from DHCP.

auto lo
iface lo inet loopback

iface enp1s0 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.11/24
    up dhclient vmbr0
    bridge-ports enp1s0
    bridge-stp off
    bridge-fd 0

Adding Additional Static Addresses

You should¹ use the modern debian method.

auto lo
iface lo inet loopback

iface enp1s0 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.11/24
    bridge-ports enp1s0
    bridge-stp off
    bridge-fd 0    

iface vmbr0 inet static
    address 192.168.64.11/24
    gateway 192.168.64.1

Adding VLANs

To can add VLANS in the /etc/network/interfaces file as well.

auto lo
iface lo inet loopback

iface enp1s0 inet manual

auto vmbr0
iface vmbr0 inet manual
    bridge-ports enp1s0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

auto vmbr0.1337
iface vmbr0.1337 inet static
    address 10.133.7.251/24
    gateway 10.133.7.1

auto vmbr0.1020
iface vmbr0.1020 inet static
    address 10.20.146.14/16

Restricting Access

You can use the pvefirewall or the pveproxy settings. There’s an anti-lockout rule on the firewall however, that requires an explicit block, so you may prefer to set controls on the proxy.

PVE Proxy

The management web interface listens on all addresses by default. You can change that here. Other services, such as ssh, remain the same.

vi /etc/default/pveproxy

LISTEN_IP="192.168.32.11"

You can also combine or substitute a control list. The port will still accept connections, but the application will reset them.

ALLOW_FROM="192.168.32.0/24"
DENY_FROM="all"
POLICY="allow"

pveproxy restart

Accessing Data

Container Bind Mounts for NFS

VMs and Containers work best when they’re light-weight and that means saving data somewhere else, like a NAS. Containers are the lightest of all but using NFS in a container causes a security issue.

Instead, mount on the host and bind-mount to the container with mp.

vi /etc/pve/lxc/100.conf

# Add this line.
#  mount point ID: existing location on the server, location to mount inside the guest
mp0: /mnt/media,mp=/mnt/media,shared=1
#mp1: and so on as you need more.

User ID Mapping

This next thing you’ll notice is that users inside the containers don’t match users outside. That’s because they’re shifted for security. To get them to line up you need a map.

# In the host, edit these files to allow root, starting at 1000, to map the next 11 UIDs and GIDs (in addition to what's there already)

# cat /etc/subuid
root:1000:11
root:100000:65536

# cat /etc/subgid
root:1000:11
root:100000:65536

# Also on the host, edit the container's config 
vi /etc/pve/lxc/100.conf

# At the bottom add these

# By default, the container users are shifted up by 100,000. Keep that in place for the first 1000 with this section 

## Starting with uid 0 in the container, map it to 100000 in the host and continue mapping for 1000 entries. (users 0-999)
lxc.idmap = u 0 100000 1000
lxc.idmap = g 0 100000 1000

# Map the next 10 values down low so they match the host (10 is just an arbitrary number. Map as many or as few as you need)

## Starting in the container at uid 1000, jump to 1000 in the host and map 10 values. (users 1000-1009)
lxc.idmap = u 1000 1000 10
lxc.idmap = g 1000 1000 10

# Then go back to mapping the rest up high
## Starting in the container at uid 1010, map it 101010 and continue for the next 64525 entries (65535 - 1010)
lxc.idmap = u 1010 101010 64525
lxc.idmap = g 1010 101010 64525

Fixing User ID Mapping

If you want to add mapping to an existing container, user IDs are probably already in place and you’ll have to adjust them. Attempts to do so in the container will result in a permission denied, even as root. Mount them in the PVE host and change them there.

pct mount 119
# For a user ID number of 1000
find /var/lib/lxc/119/rootfs -user 101000 -exec chown -h 1000 {} \;
find /var/lib/lxc/119/rootfs -group 101000 -exec chgrp -h 1000 {} \; 
pct unmount 119

Retro-fitting a Service

Sometimes, you have to change a service to match between different containers. Log into your container and do the following.

# find the service's user account, make note of it and stop the service
ps -ef 
service someService stop
 
# get the exiting uid and gid, change them and change the files
id someService

 > uid=112(someService) gid=117(someService) groups=117(someService),117(someService)

usermod -u 1001 someService
groupmod -g 1001 someService

# Change file ownership -xdev so it won't traverse remote volumes 
find / -xdev -group 1001 -exec chgrp -h someService {} \;
find / -xdev -user 1001 -exec chown -h someService {} \;

Clustering

Edit the /etc/hosts file to ensure that the IP address reflects any changes you’ve made (such as the addition on a specific management address). Ideally, add the hostname and IP of all of the impending cluster members and ensure they can all ping each other by that name.

The simple way to create and add members is done at the command line.

# On the first cluster member
pvecm create CLUSTERNAME

# On the other members
pvecm add FIRST-NODE-HOSTNAME

You can also refer to the notes at:

https://pve.proxmox.com/wiki/Cluster_Manager

Operation

Web Gui

You can access the Web GUI at:

https://192.168.32.10:8006

Logging Into Containers

https://forum.proxmox.com/threads/cannot-log-into-containers.39064/

pct enter 100

Troubleshooting

When The Container Doesn’t start

You may want to start it in foreground mode to see the error up close

lxc-start -n ID -F -l DEBUG -o /tmp/lxc-ID.log

Repairing Container Disks

Containers use LVM by default. If it fails to start and you suspect a disk error, you can fsck it. You can access the content as well. There are also direct ways² if these fail.

pct list
pct fsck 108
pct mount 108

Network Drops

If your PVE server periodically drops the network with an error message about the realtek firmware, consider updating the driver.

# Add the non free and firmware to the apt source main line
sed -i '/bookworm main contrib/s/$/ non-free non-free-firmware/' /etc/apt/sources.list
apt update

# Install the kernel headers and the dkms driver.
apt -y install linux-headers-$(uname -r)
apt install r8168-dkms

Combining DHCP and Static The Normal Way Fails

You can’t do this the normal Debian way it seems. In testing, the bridge doesn’t accept mixing types directly. You must use the ip command.

Cluster Addition Failure

PVE local node address: cannot use IP not found on local node! 500 Can’t connect to XXXX:8006 (hostname verification failed)

Make sure the hosts files on all the nodes match and they can ping each other by hostname. Use hostnames to add cluster members, not IPs.

Sources

https://forum.proxmox.com/threads/proxmox-host-is-getting-unavailable.125416/ https://www.reddit.com/r/Proxmox/comments/10o58uq/how_to_install_r8168dkms_package_on_proxmox_ve_73/ https://wiki.archlinux.org/title/Dynamic_Kernel_Module_Support https://pve.proxmox.com/wiki/Unprivileged_LXC_containers https://pve.proxmox.com/wiki/Unprivileged_LXC_containers#Using_local_directory_bind_mount_points https://www.reddit.com/r/homelab/comments/6p3xdw/proxmoxlxc_mount_host_folder_in_an_unprivileged/

4 - Security

4.1 - CrowdSec

4.1.1 - Installation

Overview

CrowdSec has two main parts; detection and interdiction.

Detection is handled by the main CrowdSec binary. You tell it what files to keep an eye on, how to parse those files, and what something ‘bad’ looks like. It then keeps a list of IPs that have done bad things.

Interdiction is handled by any number of plugins called ‘bouncers’, so named because they block access or kick out bad IPs. They run independently and keep an eye on the list, to do things like edit the firewall to block access for a bad IP.

There is also the ‘crowd’ part. The CrowdSec binary downloads IPs of known bad-actors from the cloud for your bouncers to keep out and submits alerts from your systems.

Installation

With Debian, you can simply add the repo via their script and install with a couple lines.

curl -s https://packagecloud.io/install/repositories/crowdsec/crowdsec/script.deb.sh | sudo bash
sudo apt install crowdsec
sudo apt install crowdsec-firewall-bouncer-nftables

This installs both the detection (crowdsec) and the interdiction (crowdsec-firewall-bouncer) parts. Assuming eveything went well, crowdsec will check in with the cloud, download a baseline list of known bad-actors, the firewall-bouncer will set up a basic drop list in the firewall, and crowdsec will start watching your syslog for intrusion attempts.

# Check out the very long drop list
sudo nft list ruleset | less

Note - if there are no rules, you may need to sudo systemctl restart nftables.service or possibly reboot (as I’ve found in testing)

Configuration

CrowdSec comes pre-configured to watch for ssh brute-force attacks. If you have specific services to watch you can add those as described below.

Add a Service

You probably want to watch a specific service, like web server. Take a look at [https://hub.crowdsec.net/] to see all the available components. For example, browse the collections and search for caddy. The more info link will show you how to install the collection;

sudo cscli collections list -a
sudo cscli collections install crowdsecurity/caddy

Tell CrowdSec where Caddy’s log files are.

sudo tee -a /etc/crowdsec/acquis.yaml << EOF

---
filenames:
 - /var/log/caddy/*.log
labels:
  type: caddy
---
EOF

Restart crowdsec for these changes to take effect

sudo systemctl reload crowdsec

Operation

DataFlow

CrowdSec works by pulling in data from the Acquisition files, Parsing the events, comparing to Scenarios, and then Deciding if action should be taken.

Acquisition of data from log files is based on entries in the acquis.yaml file, and the events given a label as defined in that file.

Those events feed the Parsers. There are a handful by default, but only the ones specifically interested in a given label will see it. They look for keywords like ‘FAILED LOGIN’ and then extract the IP.

Successfully parsed lines are feed to the Scenarios to if what happened matters. The scenarios look for things like 10 FAILED LOGINs in 1 min. This separates the accidental bad password entry from a brute force attempt.

Matching a scenario gets the IP added to the Decision List, i.e the list of bad IPs. These have a configurable expiration, so that if you really guess wrong 10 times in a row, you’re not banned forever.

The bouncers use this list to take action, like a firewall block, and will unblock you after the expiration.

Collections

Parsers and Scenarios work best when they work together so they are usually distributed together as a Collection. You can have collections of collections as well. For example, the base installation comes with the linux collection that includes a few parsers and the sshd collection.

To see what Collections, Parsers and Scenarios are running, use the cscli command line interface.

sudo cscli collections list
sudo cscli collections inspect crowdsecurity/linux
sudo cscli collections inspect crowdsecurity/sshd

Inspecting the collection will tell you what parsers and scenarios it contains. As well as some metrics. To learn more a collection and it’s components, you can check out their page:

https://hub.crowdsec.net/author/crowdsecurity/collections/linux

The metrics are a bit confusing until you learn that the ‘Unparsed’ column doesn’t mean unparsed so much as it means a non-event. These are just normal logfile lines that don’t have one of the keywords the parser was looking for, like ‘LOGIN FAIL’.

Status

Is anyone currently attacking you? The decisions list shows you any current bad actors and the alerts list shows you a summary of past decisions. If you are just getting started this is probably none, but if you’re open to the internet this will grow quickly.

sudo cscli decisions list
sudo cscli alerts list

But you are getting events from the cloud and you can check those with the -a option. You’ll notice that every 2 hours the community-blocklist is updated.

sudo cscli alerts list -a

After a while of this collection running, you’ll start to see these kinds of alerts

sudo cscli alerts list
╭────┬───────────────────┬───────────────────────────────────────────┬─────────┬────────────────────────┬───────────┬─────────────────────────────────────────╮
│ ID │       value       │                  reason                   │ country │           as           │ decisions │               created_at                │
├────┼───────────────────┼───────────────────────────────────────────┼─────────┼────────────────────────┼───────────┼─────────────────────────────────────────┤
│ 27 │ Ip:18.220.128.229 │ crowdsecurity/http-bad-user-agent         │ US      │ 16509 AMAZON-02        │ ban:1     │ 2023-03-02 13:12:27.948429492 +0000 UTC │
│ 26 │ Ip:18.220.128.229 │ crowdsecurity/http-path-traversal-probing │ US      │ 16509 AMAZON-02        │ ban:1     │ 2023-03-02 13:12:27.979479713 +0000 UTC │
│ 25 │ Ip:18.220.128.229 │ crowdsecurity/http-probing                │ US      │ 16509 AMAZON-02        │ ban:1     │ 2023-03-02 13:12:27.9460075 +0000 UTC   │
│ 24 │ Ip:18.220.128.229 │ crowdsecurity/http-sensitive-files        │ US      │ 16509 AMAZON-02        │ ban:1     │ 2023-03-02 13:12:27.945759433 +0000 UTC │
│ 16 │ Ip:159.223.78.147 │ crowdsecurity/http-probing                │ SG      │ 14061 DIGITALOCEAN-ASN │ ban:1     │ 2023-03-01 23:03:06.818512212 +0000 UTC │
│ 15 │ Ip:159.223.78.147 │ crowdsecurity/http-sensitive-files        │ SG      │ 14061 DIGITALOCEAN-ASN │ ban:1     │ 2023-03-01 23:03:05.814690037 +0000 UTC │
╰────┴───────────────────┴───────────────────────────────────────────┴─────────┴────────────────────────┴───────────┴─────────────────────────────────────────╯

You may even need to unblock yourself

sudo cscli decisions list
sudo cscli decision delete --id XXXXXXX

Next Steps

You’re now taking advantage of the crowd-part of the crowdsec and added your own service. If you don’t have any alerts though, you may be wondering how well it’s actually working.

Take a look at the detailed activity if you want to look more closely at what’s going on.

4.1.2 - Detailed Activity

Inspecting Metrics

Data comes in through the parsers. To see what they are doing, let’s take a look at the Acquisition and Parser metrics.

sudo cscli metrics

Most of the ‘Acquisition Metrics’ lines will be read and unparsed. This is because normal events are dropped. It only considers lines parsed if they were passed on to a scenario. The ‘bucket’ column refers to event scenarios and is also blank as there were no parsed lines to hand off.

Acquisition Metrics:
╭────────────────────────┬────────────┬──────────────┬────────────────┬────────────────────────╮
│         Source         │ Lines read │ Lines parsed │ Lines unparsed │ Lines poured to bucket │
├────────────────────────┼────────────┼──────────────┼────────────────┼────────────────────────┤
│ file:/var/log/auth.log │ 216        │ -            │ 216            │ -                      │
│ file:/var/log/syslog   │ 143        │ -            │ 143            │ -                      │
╰────────────────────────┴────────────┴──────────────┴────────────────┴────────────────────────╯

The ‘Parser Metrics’ will show the individual parsers - but not all of them. Only parsers that have at least one ‘hit’ are shown. In this example, only the syslog parser shows up. It’s a low-level parser that doesn’t look for matches, so every line is a hit.

Parser Metrics:
╭─────────────────────────────────┬──────┬────────┬──────────╮
│             Parsers             │ Hits │ Parsed │ Unparsed │
├─────────────────────────────────┼──────┼────────┼──────────┤
│ child-crowdsecurity/syslog-logs │ 359  │ 359    │ -        │
│ crowdsecurity/syslog-logs       │ 359  │ 359    │ -        │
╰─────────────────────────────────┴──────┴────────┴──────────╯

However, try a couple failed SSH login attempts and you’ll see them and how they feed up the the Acquisition Metrics.


Acquisition Metrics:
╭────────────────────────┬────────────┬──────────────┬────────────────┬────────────────────────╮
│         Source         │ Lines read │ Lines parsed │ Lines unparsed │ Lines poured to bucket │
├────────────────────────┼────────────┼──────────────┼────────────────┼────────────────────────┤
│ file:/var/log/auth.log │ 242        │ 3            │ 239            │ -                      │
│ file:/var/log/syslog   │ 195        │ -            │ 195            │ -                      │
╰────────────────────────┴────────────┴──────────────┴────────────────┴────────────────────────╯

Parser Metrics:
╭─────────────────────────────────┬──────┬────────┬──────────╮
│             Parsers             │ Hits │ Parsed │ Unparsed │
├─────────────────────────────────┼──────┼────────┼──────────┤
│ child-crowdsecurity/sshd-logs   │ 61   │ 3      │ 58       │
│ child-crowdsecurity/syslog-logs │ 442  │ 442    │ -        │
│ crowdsecurity/dateparse-enrich  │ 3    │ 3      │ -        │
│ crowdsecurity/geoip-enrich      │ 3    │ 3      │ -        │
│ crowdsecurity/sshd-logs         │ 8    │ 3      │ 5        │
│ crowdsecurity/syslog-logs       │ 442  │ 442    │ -        │
│ crowdsecurity/whitelists        │ 3    │ 3      │ -        │
╰─────────────────────────────────┴──────┴────────┴──────────╯

Lines poured to bucket however, is still empty. That means the the action didn’t match a scenario defining a hack attempt. In fact - you may notice the ‘whitelist` was triggered. Let’s ask crowdsec to explain what’s going on.

Detailed Parsing

To see which parsers got involved and what they did, you can ask.

sudo cscli explain --file /var/log/auth.log --type syslog

Here’s a ssh example of a failed login. The numbers, such as (+9 ~1), mean that the parser added 9 elements it parsed from the raw event, and updated 1. Notice the whitelists parser at the end. It’s catching this event and dropping it, hence the ‘parser failure’. The failure message is a red herring, as this is how it’s supposed to work. It short-circuits as soon as it thinks something should be white-listed.

line: Mar  1 14:08:11 www sshd[199701]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=192.168.1.16  user=allen
        ├ s00-raw
        |       └ 🟢 crowdsecurity/syslog-logs (first_parser)
        ├ s01-parse
        |       └ 🟢 crowdsecurity/sshd-logs (+9 ~1)
        ├ s02-enrich
        |       ├ 🟢 crowdsecurity/dateparse-enrich (+2 ~1)
        |       ├ 🟢 crowdsecurity/geoip-enrich (+9)
        |       └ 🟢 crowdsecurity/whitelists (~2 [whitelisted])
        └-------- parser failure 🔴

But why exactly did it get whitelisted? Let’s ask for a verbose report.

sudo cscli explain -v --file /var/log/auth.log --type syslog

line: Mar  1 14:08:11 www sshd[199701]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=192.168.1.16  user=someGuy
        ├ s00-raw
        |       └ 🟢 crowdsecurity/syslog-logs (first_parser)
        ├ s01-parse
        |       └ 🟢 crowdsecurity/sshd-logs (+9 ~1)
        |               └ update evt.Stage : s01-parse -> s02-enrich
        |               └ create evt.Parsed.sshd_client_ip : 192.168.1.16
        |               └ create evt.Parsed.uid : 0
        |               └ create evt.Parsed.euid : 0
        |               └ create evt.Parsed.pam_type : unix
        |               └ create evt.Parsed.sshd_invalid_user : someGuy
        |               └ create evt.Meta.service : ssh
        |               └ create evt.Meta.source_ip : 192.168.1.16
        |               └ create evt.Meta.target_user : someGuy
        |               └ create evt.Meta.log_type : ssh_failed-auth
        ├ s02-enrich
        |       ├ 🟢 crowdsecurity/dateparse-enrich (+2 ~1)
        |               ├ create evt.Enriched.MarshaledTime : 2023-03-01T14:08:11Z
        |               ├ update evt.MarshaledTime :  -> 2023-03-01T14:08:11Z
        |               ├ create evt.Meta.timestamp : 2023-03-01T14:08:11Z
        |       ├ 🟢 crowdsecurity/geoip-enrich (+9)
        |               ├ create evt.Enriched.Longitude : 0.000000
        |               ├ create evt.Enriched.ASNNumber : 0
        |               ├ create evt.Enriched.ASNOrg : 
        |               ├ create evt.Enriched.ASNumber : 0
        |               ├ create evt.Enriched.IsInEU : false
        |               ├ create evt.Enriched.IsoCode : 
        |               ├ create evt.Enriched.Latitude : 0.000000
        |               ├ create evt.Meta.IsInEU : false
        |               ├ create evt.Meta.ASNNumber : 0
        |       └ 🟢 crowdsecurity/whitelists (~2 [whitelisted])
        |               └ update evt.Whitelisted : %!s(bool=false) -> true
        |               └ update evt.WhitelistReason :  -> private ipv4/ipv6 ip/ranges
        └-------- parser failure 🔴

Turns out that private IP ranges are whitelisted by default so you can’t lock yourself out from inside. The parser crowdsecurity/whitelists has updated the property ’evt.Whitelisted’ to true and gave it a reason. That property appears to be a built-in that flags events to be dropped.

If you want to change the ranges, you can edit the logic by editing the yaml file. A sudo cscli hub list will show you what file that is. Add or remove entries from the list it’s checking the ‘ip’ value and ‘cidr’ value against. Any match cases whitelist to become true.

4.1.3 - Whitelisting

In the previous examples we’ve looked at the metrics and details of internal facing service like failed SSH logins. Those types aren’t prone to a lot of false positives. But other sources, like web access logs, can be.

False Positives

You’ll recall that when looking at metrics that a high number of ‘Lines unparsed’ is normal. They were simply entries that didn’t match any specific events the parser was looking for. Parsed lines however, are ‘poured’ to a bucket. A bucket being a potential attack type.

sudo cscli metrics

Acquisition Metrics:
╭────────────────────────────────┬────────────┬──────────────┬────────────────┬────────────────────────╮
│             Source             │ Lines read │ Lines parsed │ Lines unparsed │ Lines poured to bucket │
├────────────────────────────────┼────────────┼──────────────┼────────────────┼────────────────────────┤
│ file:/var/log/auth.log         │ 69         │ -            │ 69             │ -                      │
│ file:/var/log/caddy/access.log │ 21         │ 21           │ -              │ 32                     │ <--Notice the high number in the 'poured' column
│ file:/var/log/syslog           │ 2          │ -            │ 2              │ -                      │
╰────────────────────────────────┴────────────┴──────────────┴────────────────┴────────────────────────╯

In the above example, “lines poured” is bigger than the number parsed. This is because some lines can match more than one scenario and end up in multiple buckets, like a malformed user agent asking for a page that doesn’t exist. Sometimes, that’s OK. Action isn’t taken until a given bucket meets a threshold. That’s in scenarios so let’s take a look there.

Scenario Metrics:
╭──────────────────────────────────────┬───────────────┬───────────┬──────────────┬────────┬─────────╮
│                Scenario              │ Current Count │ Overflows │ Instantiated │ Poured │ Expired │
├──────────────────────────────────────┼───────────────┼───────────┼──────────────┼────────┼─────────┤
│ crowdsecurity/http-crawl-non_statics │ -             │ -         │ 2            │ 17     │ 2       │
│ crowdsecurity/http-probing           │ -             │ 1         │ 2            │ 15     │ 1       │
╰──────────────────────────────────────┴───────────────┴───────────┴──────────────┴────────┴─────────╯

It appears the scenario ‘http-crawl-non_statics’ is designed to allow some light web-crawling. Of the 32 events ‘poured’ above, 17 of them went into it’s bucket and it ‘Instantiated’ tracking against 2 IPs, but neither ‘Overflowed’, which would cause an action to be taken.

However, ‘http-probing’ did. Assuming this is related to a web application you’re trying to use, you just got blocked. So let’s see what that scenario is looking for and what we can do about it.

sudo cscli hub list | grep http-probing

  crowdsecurity/http-probing                        ✔️  enabled  0.4      /etc/crowdsec/scenarios/http-probing.yaml

sudo cat  /etc/crowdsec/scenarios/http-probing.yaml
...
...
filter: "evt.Meta.service == 'http' && evt.Meta.http_status in ['404', '403', '400']
capacity: 10
reprocess: true
leakspeed: "10s"
blackhole: 5m
...
...

You’ll notice that it’s simply looking for a few status codes, notably ‘404’. If you get more than 10 in 10 seconds, you get black-holed for 5 min. The next thing is to find out what web requests are triggering it. We could just look for 404s in the web access log, but we can also ask CrowdSec itself to tell is. This will be more important when the triggers are more subtle, so let’s give it a try now.

# Grep some 404 events from the main log to a test file
sudo grep 404 /var/log/caddy/access.log | tail  >  ~/test.log

# cscli explain with -v for more detail
sudo cscli explain -v --file ./test.log --type caddy

  ├ s00-raw
  | ├ 🟢 crowdsecurity/non-syslog (first_parser)
  | └ 🔴 crowdsecurity/syslog-logs
  ├ s01-parse
  | └ 🟢 crowdsecurity/caddy-logs (+19 ~2)
  |   └ update evt.Stage : s01-parse -> s02-enrich
  |   └ create evt.Parsed.request : /0/icon/Smith
  |   ...
  |   └ create evt.Meta.http_status : 404
  |   ...
  ├-------- parser success 🟢
  ├ Scenarios
    ├ 🟢 crowdsecurity/http-crawl-non_statics
    └ 🟢 crowdsecurity/http-probing

In this case, the client is asking for the file /0/icon/Smith and it doesn’t exist. Turns out, the web client is asking just in case and accepting the 404 without complaint in the background. That’s fine for the app, but matches two things under the Scenarios section; that of someone crawling the server, and or someone probing it. To fix this, we’ll need to create a whitelist definition for the app.

You can also work it from the alerts side and inspect what happened (assuming you’ve caused an alert).

sudo cscli alert list

# This is an actual attack, and not something to be whitelisted, but it's a good example of how the inspection works.

╭─────┬──────────────────────────┬────────────────────────────────────────────┬─────────┬──────────────────────────────────────┬───────────┬─────────────────────────────────────────╮
│  ID │           value          │                   reason                   │ country │                  as                  │ decisions │                created_at               │
├─────┼──────────────────────────┼────────────────────────────────────────────┼─────────┼──────────────────────────────────────┼───────────┼─────────────────────────────────────────┤
│ 951 │ Ip:165.22.253.118        │ crowdsecurity/http-probing                 │ SG      │ 14061 DIGITALOCEAN-ASN               │ ban:1     │ 2025-02-26 13:53:08.589118208 +0000 UTC │


sudo cscli alerts inspect 951 -d

################################################################################################

 - ID           : 951
 - Date         : 2025-02-26T13:53:14Z
 - Machine      : 0e4a17d2f5d44270b7d543ac29c1dd4eWv2ozxHsRqoJWmRL
 - Simulation   : false
 - Remediation  : true
 - Reason       : crowdsecurity/http-probing
 - Events Count : 11
 - Scope:Value  : Ip:165.22.253.118
 - Country      : SG
 - AS           : DIGITALOCEAN-ASN
 - Begin        : 2025-02-26 13:53:08.589118208 +0000 UTC
 - End          : 2025-02-26 13:53:13.990699814 +0000 UTC
 - UUID         : eb454114-bc1e-455d-bfcc-f4772803e8bf


 - Context  :
╭────────────┬──────────────────────────────────────────────────────────────╮
│     Key    │                             Value                            │
├────────────┼──────────────────────────────────────────────────────────────┤
│ method     │ GET                                                          │
│ status     │ 403                                                          │
│ target_uri │ /                                                            │
│ target_uri │ /wp-includes/wlwmanifest.xml                                 │
│ target_uri │ /xmlrpc.php?rsd                                              │
│ target_uri │ /blog/wp-includes/wlwmanifest.xml                            │
│ target_uri │ /web/wp-includes/wlwmanifest.xml                             │
│ target_uri │ /wordpress/wp-includes/wlwmanifest.xml                       │
│ target_uri │ /website/wp-includes/wlwmanifest.xml                         │
│ target_uri │ /wp/wp-includes/wlwmanifest.xml                              │
│ target_uri │ /news/wp-includes/wlwmanifest.xml                            │
│ target_uri │ /2018/wp-includes/wlwmanifest.xml                            │
│ target_uri │ /2019/wp-includes/wlwmanifest.xml                            │
│ user_agent │ Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 │
│            │ (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36       │
╰────────────┴──────────────────────────────────────────────────────────────╯

Whitelist

To whitelist an app, we create a file with an expression that matches the behavior we see above, such as the apps attempts to load a file that doesn’t exist, and exempts it. You can only add these to the s02 stage folder and the name element but be unique for each.

sudo vi /etc/crowdsec/parsers/s02-enrich/some-app-whitelist.yaml

This example uses the startsWith expression and assumes that all requests start the same

name: you/some-app
description: "Whitelist 404s for icon requests" 
whitelist: 
  reason: "icon request" 
  expression:   
    - evt.Parsed.request startsWith '/0/icon/'

If it’s less predictable, you can use a regular expression instead and combine with other expressions like a site match. In general, the more specific the better.

name: you/some-app-whitelist
description: "Whitelist 404s for icon requests" 
whitelist: 
  reason: "icon request" 
  expression:   
    - evt.Parsed.request matches '^/[0-9]/icon/.*' && evt.Meta.target_fqdn == "some-app.you.org"

Now you can reload crowdsec and test

sudo systemctl restart crowdsec.service

sudo cscli explain -v --file ./test.log --type caddy

 ├ s00-raw
 | ├ 🔴 crowdsecurity/syslog-logs
 | └ 🟢 crowdsecurity/non-syslog (+5 ~8)
 |  └ update evt.ExpectMode : %!s(int=0) -> 1
 |  └ update evt.Stage :  -> s01-parse
...
 ├ s02-enrich
 | ├ 🟢 you/some-app-whitelist (~2 [whitelisted])
 |  ├ update evt.Whitelisted : %!s(bool=false) -> true
 |  ├ update evt.WhitelistReason :  -> some icon request
 | ├ 🟢 crowdsecurity/dateparse-enrich (+2 ~2)
...
...
 | ├ 🟢 crowdsecurity/http-logs (+7)
 | └ 🟢 crowdsecurity/whitelists (unchanged)
 └-------- parser success, ignored by whitelist (audioserve icon request) 🟢

You’ll see in the above example, we successfully parsed the entry, but it was ‘ignored’ and didn’t go on to the Scenario setion.

Regular Checking

You’ll find yourself doing this fairly regularly at first.

# Look for an IP on the ban list
sudo cscli alerts list

# Pull out the last several log entries for that IP
sudo grep SOME.IP.FROM.ALERTS /var/log/caddy/access.log | tail -10 > test.log 

# See what it was asking for
cat test.log | jq '.request'
cat test.log | jq '.request.uri'

# Ask caddy why it had a problem
sudo cscli explain -v --file ./test.log --type caddy

Troubleshooting

New Whitelist Has No Effect

If you have more than one whitelist, check the name you gave it on the first line. If that’s not unique, the whole thing will be silently ignore.

Regular Expression Isn’t Matching

CrowdSec uses the go-centric expr-lang. You may be used to unix regex where you’d escape slashes, for example. A tool like https://www.akto.io/tools/regex-tester is helpful.

4.1.4 - Custom Parser

When checking out the detailed metrics you may find that log entries aren’t being parsed. Maybe the log format has changed or you’re logging additional data the author didn’t anticipate. The best thing is to add your own parser.

Types of Parsers

There are several type of parsers and they are used in stages. Some are designed to work with the raw log entries while others are designed to take pre-parsed data and add or enrich it. This way you can do branching and not every parser needs to now how to read a syslog message.

Their Local Path will tell you what stage they kick in at. Use sudo cscli parsers list to display the details. s00-raw works with the ‘raw’ files while s01 and s02 work further down the pipeline. Currently, you can only create s00 and s01 level parsers.

Integrating with Scenarios

Useful parsers supply data that Scenarios are interested in. You can create a parser that watches the system logs for ‘FOOBAR’ entries, extracts the ‘FOOBAR-LEVEL`, and passes it on. But if nothing is looking for ‘FOOBARs’ then nothing will happen.

Let’s say you’ve added the Caddy collection. It’s pulled in a bunch of Scenarios you can view with sudo cscli scenarios list. If you look at one of the assicated files you’ll see a filter section where they look for ’evt.Meta.http_path’ and ’evt.Parsed.verb’. They are all different though, so how do you know what data to supply?

Your best bet is to take an existing parser and modify it.

Examples

Note - CrowdSec is pretty awesome and after talking in the discord they’ve already accomodated both these scenarios within a relase cycle or two. So these two examples are solved. I’m sure you’ll find new ones, though ;-)

A Web Example

Let’s say that you’ve installed the Caddy collection, but you’ve noticed basic auth login failures don’t trigger the parser. So let’s add a new file and edit it.

sudo cp /etc/crowdsec/parsers/s01-parse/caddy-logs.yaml /etc/crowdsec/parsers/s01-parse/caddy-logs-custom.yaml

You’ll notice two top level sections where the parsing happens; nodes and statics and some grok pattern matching going on.

Nodes allow you try multiple patterns and if any match, the whole section is considered successful. I.e. if the log could have either the standard HTTPDATE or a CUSTOMDATE, as long as it has one it’s good and the matching can move on. Statics just goes down the list extracting data. If any fail the whole event is considered a fail and dropped as unparseable.

All the pasrsed data gets attached to event as ’evt.Parsed.something’ and some of the statics are moving it to evt values the Senarios will be looking for Caddy logs are JSON formatted and so basically already parsed and this example makes use of the JsonExtract method quite a bit.

# We added the caddy logs in the acquis.yaml file with the label 'caddy' and so we use that as our filter here
filter: "evt.Parsed.program startsWith 'caddy'"
onsuccess: next_stage
# debug: true
name: caddy-logs-custom
description: "Parse custom caddy logs"
pattern_syntax:
 CUSTOMDATE: '%{DAY:day}, %{MONTHDAY:monthday} %{MONTH:month} %{YEAR:year} %{TIME:time} %{WORD:tz}'
nodes:
  - nodes:
    - grok:
        pattern: '%{NOTSPACE} %{NOTSPACE} %{NOTSPACE} \[%{HTTPDATE:timestamp}\]%{DATA}'
        expression: JsonExtract(evt.Line.Raw, "common_log")
        statics:
          - target: evt.StrTime
            expression: evt.Parsed.timestamp
    - grok:
        pattern: "%{CUSTOMDATE:timestamp}"
        expression: JsonExtract(evt.Line.Raw, "resp_headers.Date[0]")
        statics:
          - target: evt.StrTime
            expression: evt.Parsed.day + " " + evt.Parsed.month + " " + evt.Parsed.monthday + " " + evt.Parsed.time + ".000000" + " " + evt.Parsed.year
    - grok:
        pattern: '%{IPORHOST:remote_addr}:%{NUMBER}'
        expression: JsonExtract(evt.Line.Raw, "request.remote_addr")
    - grok:
        pattern: '%{IPORHOST:remote_ip}'
        expression: JsonExtract(evt.Line.Raw, "request.remote_ip")
    - grok:
        pattern: '\["%{NOTDQUOTE:http_user_agent}\"]'
        expression: JsonExtract(evt.Line.Raw, "request.headers.User-Agent")
statics:
  - meta: log_type
    value: http_access-log
  - meta: service
    value: http
  - meta: source_ip
    expression: evt.Parsed.remote_addr
  - meta: source_ip
    expression: evt.Parsed.remote_ip
  - meta: http_status
    expression: JsonExtract(evt.Line.Raw, "status")
  - meta: http_path
    expression: JsonExtract(evt.Line.Raw, "request.uri")
  - target: evt.Parsed.request #Add for http-logs enricher
    expression: JsonExtract(evt.Line.Raw, "request.uri")
  - parsed: verb
    expression: JsonExtract(evt.Line.Raw, "request.method")
  - meta: http_verb
    expression: JsonExtract(evt.Line.Raw, "request.method")
  - meta: http_user_agent
    expression: evt.Parsed.http_user_agent
  - meta: target_fqdn
    expression: JsonExtract(evt.Line.Raw, "request.host")
  - meta: sub_type
    expression: "JsonExtract(evt.Line.Raw, 'status') == '401' && JsonExtract(evt.Line.Raw, 'request.headers.Authorization[0]') startsWith 'Basic ' ? 'auth_fail' : ''"

The very last line is where a status 401 is checked. It looks for a 401 and a request for Basic auth. However, this misses events where someone asks for a resource that is protected and the serer responds telling you Basic is needed. I.e. when a bot is poking at URLs on your server ignoring the prompts to login. You can look at the log entries more easily with this command to follow the log and decode it while you recreate failed attempts.

sudo tail -f /var/log/caddy/access.log | jq

To change this, update the expression to also check the response header with an additional ? (or) condition.

    expression: "JsonExtract(evt.Line.Raw, 'status') == '401' && JsonExtract(evt.Line.Raw, 'request.headers.Authorization[0]') startsWith 'Basic ' ? 'auth_fail' : ''"xtract(evt.Line.Raw, 'status') == '401' && JsonExtract(evt.Line.Raw, 'resp_headers.Www-Authenticate[0]') startsWith 'Basic ' ? 'auth_fail' : ''"

Syslog Example

Let’s say you’re using dropbear and failed logins are not being picked up by the ssh parser

To see what’s going on, you use the crowdsec command line interface. The shell command is cscli and you can ask it about it’s metrics to see how many lines it’s parsed and if any of them are suspicious. Since we just restarted, you may not have any syslog lines yet, so let’s add some and check.

ssh [email protected]
logger "This is an innocuous message"

cscli metrics
INFO[28-06-2022 02:41:33 PM] Acquisition Metrics:
+------------------------+------------+--------------+----------------+------------------------+
|         SOURCE         | LINES READ | LINES PARSED | LINES UNPARSED | LINES POURED TO BUCKET |
+------------------------+------------+--------------+----------------+------------------------+
| file:/var/log/messages | 1          | -            | 1              | -                      |
+------------------------+------------+--------------+----------------+------------------------+

Notice that the line we just read is unparsed and that’s OK. That just means it wasn’t an entry the parser cared about. Let’s see if it responds to an actual failed login.

dbclient some.remote.host

# Enter some bad passwords and then exit with a Ctrl-C. Remember, localhost attempts are whitelisted so you must be remote.
[email protected]'s password:
[email protected]'s password:

cscli metrics
INFO[28-06-2022 02:49:51 PM] Acquisition Metrics:
+------------------------+------------+--------------+----------------+------------------------+
|         SOURCE         | LINES READ | LINES PARSED | LINES UNPARSED | LINES POURED TO BUCKET |
+------------------------+------------+--------------+----------------+------------------------+
| file:/var/log/messages | 7          | -            | 7              | -                      |
+------------------------+------------+--------------+----------------+------------------------+

Well, no luck. We will need to adjust the parser

sudo cp /etc/crowdsec/parsers/s01-parse/sshd-logs.yaml /etc/crowdsec/parsers/s01-parse/sshd-logs-custom.yaml

Take a look at the logfile and copy an example line over to https://grokdebugger.com/. Use a pattern like

Bad PAM password attempt for '%{DATA:user}' from %{IP:source_ip}:%{INT:port}

Assuming you get the pattern worked out, you can then add a section to the bottom of the custom log file you created.

  - grok:
      name: "SSHD_AUTH_FAIL"
      pattern: "Login attempt for nonexistent user from %{IP:source_ip}:%{INT:port}"
      apply_on: message

4.1.5 - On Alpine

Install

There are some packages available, but (as of 2022) they are a bit behind and don’t include the config and service files. So let’s download the latest binaries from Crowsec and create our own.

Download the current release

Note: Download the static versions. Alpine uses a differnt libc than other distros.

cd /tmp
wget https://github.com/crowdsecurity/crowdsec/releases/latest/download/crowdsec-release-static.tgz
wget https://github.com/crowdsecurity/cs-firewall-bouncer/releases/latest/download/crowdsec-firewall-bouncer.tgz

tar xzf crowdsec-firewall*
tar xzf crowdsec-release*
rm *.tgz

Install Crowdsec and Register with The Central API

You cannot use the wizard as it expects systemd and doesn’t support OpenRC. Follow the Binary Install steps from CrowdSec’s binary instrcutions.

sudo apk add bash newt envsubst
cd /tmp/crowdsec-v*

# Docker mode skips configuring systemd
sudo ./wizard.sh --docker-mode

sudo cscli hub update
sudo cscli machines add -a
sudo cscli capi register

# A collection is just a bunch of parsers and scenarios bundled together for convienence
sudo cscli collections install crowdsecurity/linux

Install The Firewall Bouncer

We need a netfilter tool so install nftables. If you already have iptables installed you can skip this step and set FW_BACKEND to that below when generating the API keys.

sudo apk add nftables

Now we install the firewall bouncer. There is no static build of the firewall bouncer yet from CrowdSec, but you can get one from Alpine testing (if you don’t want to compile it yourself)

# Change from 'edge' to other versions a needed
echo "http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories
apk update
apk add cs-firewall-bouncer

Now configure the bouncer. We will once again do this manually becase there is not support for non-systemd linuxes with the install script. But cribbing from their install script, we see we can:

cd /tmp/crowdsec-firewall*

BIN_PATH_INSTALLED="/usr/local/bin/crowdsec-firewall-bouncer"
BIN_PATH="./crowdsec-firewall-bouncer"
sudo install -v -m 755 -D "${BIN_PATH}" "${BIN_PATH_INSTALLED}"

CONFIG_DIR="/etc/crowdsec/bouncers/"
sudo mkdir -p "${CONFIG_DIR}"
sudo install -m 0600 "./config/crowdsec-firewall-bouncer.yaml" "${CONFIG_DIR}crowdsec-firewall-bouncer.yaml"

Generate The API Keys

Note: If you used the APK, just do the first two lines to get the API_KEY (echo $API_KEY) and manually edit the file (vim /etc/crowdsec/bouncers/crowdsec-firewall-bouncer.yaml)

cd /tmp/crowdsec-firewall*
CONFIG_DIR="/etc/crowdsec/bouncers/"

SUFFIX=`tr -dc A-Za-z0-9 </dev/urandom | head -c 8`
API_KEY=`sudo cscli bouncers add cs-firewall-bouncer-${SUFFIX} -o raw`
FW_BACKEND="nftables"
API_KEY=${API_KEY} BACKEND=${FW_BACKEND} envsubst < ./config/crowdsec-firewall-bouncer.yaml | sudo install -m 0600 /dev/stdin "${CONFIG_DIR}crowdsec-firewall-bouncer.yaml"

Create RC Service Files

sudo touch /etc/init.d/crowdsec
sudo chmod +x /etc/init.d/crowdsec
sudo rc-update add crowdsec

sudo vim /etc/init.d/crowdsec

#!/sbin/openrc-run

command=/usr/local/bin/crowdsec
command_background=true

pidfile="/run/${RC_SVCNAME}.pid"

depend() {
   need localmount
   need net
}

Note: If you used the package from Alpine testing above it came with a service file. Just rc-update add cs-firewall-bouncer and skip this next step.

sudo touch /etc/init.d/cs-firewall-bouncer
sudo chmod +x /etc/init.d/cs-firewall-bouncer
sudo rc-update add cs-firewall-bouncer

sudo vim /etc/init.d/cs-firewall-bouncer

#!/sbin/openrc-run

command=/usr/local/bin/crowdsec-firewall-bouncer
command_args="-c /etc/crowdsec/bouncers/crowdsec-firewall-bouncer.yaml"
pidfile="/run/${RC_SVCNAME}.pid"
command_background=true

depend() {
  after firewall
}

Start The Services and Observe The Results

Start up the services and view the logs to see that everything started properly

sudo service start crowdsec
sudo service cs-firewall-bouncer status

sudo tail /var/log/crowdsec.log
sudo tail /var/log/crowdsec-firewall-bouncer.log

# The firewall bouncer should tell you about how it's inserting decisions it got from the hub

sudo cat /var/log/crowdsec-firewall-bouncer.log

time="28-06-2022 13:10:05" level=info msg="backend type : nftables"
time="28-06-2022 13:10:05" level=info msg="nftables initiated"
time="28-06-2022 13:10:05" level=info msg="Processing new and deleted decisions . . ."
time="28-06-2022 14:35:35" level=info msg="100 decisions added"
time="28-06-2022 14:35:45" level=info msg="1150 decisions added"
...
...

# If you are curious about what it's blocking
sudo nft list table crowdsec
...

4.1.6 - Cloudflare Proxy

Cloudflare offers an excellent reverse proxy and they filter most bad actors for you. But not all. Here’s a sample of what makes it through;

allen@www:~/$ sudo cscli alert list    
╭─────┬────────────────────┬───────────────────────────────────┬─────────┬────────────────────────┬───────────┬─────────────────────────────────────────╮
│  ID │        value       │               reason              │ country │           as           │ decisions │                created_at               │
├─────┼────────────────────┼───────────────────────────────────┼─────────┼────────────────────────┼───────────┼─────────────────────────────────────────┤
│ 221 │ Ip:162.158.49.136  │ crowdsecurity/jira_cve-2021-26086 │ IE      │ 13335 CLOUDFLARENET    │ ban:1     │ 2025-01-22 15:14:34.554328601 +0000 UTC │
│ 187 │ Ip:128.199.182.152 │ crowdsecurity/jira_cve-2021-26086 │ SG      │ 14061 DIGITALOCEAN-ASN │ ban:1     │ 2025-01-19 20:50:45.822199509 +0000 UTC │
│ 186 │ Ip:46.101.1.225    │ crowdsecurity/jira_cve-2021-26086 │ GB      │ 14061 DIGITALOCEAN-ASN │ ban:1     │ 2025-01-19 20:50:41.699518104 +0000 UTC │
│ 181 │ Ip:162.158.108.104 │ crowdsecurity/http-bad-user-agent │ SG      │ 13335 CLOUDFLARENET    │ ban:1     │ 2025-01-19 12:39:20.468268327 +0000 UTC │
│ 180 │ Ip:172.70.208.61   │ crowdsecurity/http-bad-user-agent │ SG      │ 13335 CLOUDFLARENET    │ ban:1     │ 2025-01-19 12:38:36.664997131 +0000 UTC │
╰─────┴────────────────────┴───────────────────────────────────┴─────────┴────────────────────────┴───────────┴─────────────────────────────────────────╯

You can see that CrowdSec took action, but it was the wrong one. It’s blocking the Cloudflare exit node and removed everyone’s access.

What we want is:

Identify the actual attacker
Block that somewhere effective (the firewall-bouncer can’t selectively block proxied traffic)

Identifying The Attacker

We could replace the CrowdSec Caddy log parser and use a different header, but there’s a hint in the CrowdSec parser that suggests using the trusted_proxies module.

##Caddy now sets client_ip to the value of X-Forwarded-For if users sets trusted proxies

Additionally, we can choose the CF-Connecting-IP header like francislavoie suggests, as X-Forwarded-For is easily spoofed.

Add a Trusted Proxy

To set Cloudflare as a trusted proxy we must identify all the Cloudflare exit node IPs to trust them. That would be hard to manage, but happily, there’s a handy caddy-cloudflare-ip module for that. Many thanks to WeidiDeng!

sudo caddy add-package github.com/WeidiDeng/caddy-cloudflare-ip

sudo vi /etc/caddy/Caddyfile

#
# Global Options Block
#
{
        servers {             
                trusted_proxies cloudflare  
                client_ip_headers CF-Connecting-IP  
        }    
}

After restarting Caddy, we can see the header change

sudo head /var/log/caddy/access.log  | jq '.request'
sudo tail /var/log/caddy/access.log  | jq '.request'

Before

  "remote_ip": "172.68.15.223",
  "client_ip": "172.68.15.223",

After

  "remote_ip": "172.71.98.114",
  "client_ip": "109.206.128.45",

And when consulting crowdsec, we can see it’s using the client_ip information.

sudo tail /var/log/caddy/access.log > test.log
sudo cscli explain -v --file ./test.log --type caddy

 ├ s01-parse
 | └ 🟢 crowdsecurity/caddy-logs (+14 ~2)
 |  └ update evt.Stage : s01-parse -> s02-enrich
 |  └ create evt.Parsed.remote_ip : 109.206.128.45 <-- Your Actual IP

And when launching a probe we can see it show up with the correct IP.

# Ask for lots of pages that don't exist to simulate a HTTP probe
for X in {1..100}; do curl -D - https://www.some.org/$X;done


sudo cscli decisions list   
╭─────────┬──────────┬───────────────────┬────────────────────────────┬────────┬─────────┬───────────────┬────────┬────────────┬──────────╮
│    ID   │  Source  │    Scope:Value    │           Reason           │ Action │ Country │       AS      │ Events │ expiration │ Alert ID │
├─────────┼──────────┼───────────────────┼────────────────────────────┼────────┼─────────┼───────────────┼────────┼────────────┼──────────┤
│ 2040067 │ crowdsec │ Ip:109.206.128.45 │ crowdsecurity/http-probing │ ban    │ US      │ 600 BADNET-AS │ 11     │ 3h32m5s    │ 235      │
╰─────────┴──────────┴───────────────────┴────────────────────────────┴────────┴─────────┴───────────────┴────────┴────────────┴──────────╯

This doesn’t do anything on its own (because traffic is proxied) but we can make it work if we change bouncers.

Changing Bouncers

The ideal approach would to tell Cloudflare to stop forwarding traffic from the bad actors. There is a cloudflare-bouncer to do just that. It’s rate limited however, and only suitable for premium clients. There is also the CrowdSec Cloudflare Worker. It’s better, but still suffers from limits for non-premium clients.

Caddy Bouncer

Instead, we’ll use the caddy-crowdsec-bouncer. This is a layer 4 (protocol level) bouncer. It works inside Caddy and will block IPs based on the client_ip from the HTTP request.

Generate an API key for the bouncer with the bouncer add command - this doesn’t actually install anything, just generates a key.

sudo cscli bouncers add caddy-bouncer

Add the module to Caddy (which is the actual install).

sudo caddy add-package github.com/hslatman/caddy-crowdsec-bouncer

Configure Caddy

#
# Global Options Block
#
{
        
        crowdsec {
                api_key ABIGLONGSTRING
        }
        # Make sure to add the order statement
        order crowdsec first
}
www.some.org {

    crowdsec 

    root * /var/www/www.some.org
    file_server
}

And restart.

sudo systemctl restart caddy.service

Testing Remediation

Let’s test that probe again. Initially, you’ll get a 404 (not found) but after while of that, it should switch to 403 (access denied)

for X in {1..100}; do curl -D - --silent https://www.some.org/$X | grep HTTP;done

HTTP/2 404 
HTTP/2 404 
...
...
HTTP/2 403 
HTTP/2 403

Conclusion

Congrats! after much work you’ve traded 404s for 403s. Was it worth it? Probably. If an adversary’s probe had a chance to find something, it has less of a chance now.

Bonus Section

I mentioned earlier that the X-Forwarded-For header could be spoofed. Let’s take a look at that. Here’s an example.

# Comment out 'client_ip_headers CF-Connecting-IP' from your Caddy config, and restart.

for X in {1..100}; do curl -D - --silent "X-Forwarded-For: 192.168.0.2" https://www.some.org/$X | grep HTTP;done

HTTP/2 404 
HTTP/2 404 
...
...
HTTP/2 404 
HTTP/2 404

No remediation happens. Turns out Cloudflare appends by default, giving you:

sudo tail -f /var/log/caddy/www.some.org.log | jq

    "client_ip": "192.168.0.2",

      "X-Forwarded-For": [
        "192.168.0.2,109.206.128.45"
      ],

Caddy takes the first value, which is rather trusting but canonically correct, puts it as the client_ip and CrowdSec uses that.

Adjusting Cloudflare

You don’t need to, but you can configure Cloudflare to “Remove visitor IP headers”. This is counterintuitive, but the notes say “…Cloudflare will only keep the IP address of the last proxy”. In testing, it keeps the last value in the X-Forwarded-For string, and that’s what we’re after. It works for normal and forged headers.

Log in to the Cloudflare dashboard and select your website
Go to Rules > Overview
Select “Manage Request Header Transform Rules”
Select “Managed Transforms”
Enable Remove visitor IP headers

The Overview page may look different depending on your plan, so you may have to hunt around for this setting.

Now when you test, you’ll get access denied regardless of your header

for X in {1..100}; do curl -D - --silent "X-Forwarded-For: 192.168.0.2" https://www.some.org/$X | grep HTTP;done

HTTP/2 404 
HTTP/2 404 
...
...
HTTP/2 403 
HTTP/2 403

Bonus Ending

You’ve added an extra layer of protection - but it’s not clear if it’s worth it. It may add to the proxy time, so use at your own discretion.

4.2 - Encryption

4.2.1 - GPG

GPG is an implementation of the OpenPGP standard (the term ‘PGP’ is trademarked by Symantec).

The best practice, that GPG implements by default, is to create a signing-only primary key with an encryption subkey¹. These subkeys expire² and must be extended or replaced from time to time.

The Basics

The basics of gpg can be broken down into:

managing your keys
encrypting and decrypting your files
integrating gpg keys with mail and other utilities

Let’s skip the details of asymmetric key encryption, public private keys, and just know that there are two keys; your private key, and your public key. You encrypt with the public key, and you decrypt with the private key.

The private key is the one that matters. That’s the one you use to decrypt things. Your public key you can recreate, should you lose it, as long as you have your private key.

The public key is the one you pass out to your friends and even put on your web site when you want someone to sen you something that only you can read. It sounds crazy, but through wonders of mathematics, it can only be used to encrypt a file, never to decrypt one. So it doesn’t matter who you give it to. They can encrypt something, send it to you, and you can decrypt it - all without anyone sending a password.

You can also sign things. This is when you want to send something that anyone can read, but just want to be sure it came from you. More on that later. Let’s focus on secrecy.

Note - In my opinion, we can probably skip all the old command line stuff, not that it’s not good to know, it’s just slower to use as a novice.

http://ubuntuforums.org/showthread.php?t=680292

Key Management

To list keys

# If you don't use this list-option arguement, you won't see all the subkeys
gpg --list-options show-unusable-subkeys --list-keys

gpg --edit-key C621C2A8040C51F5C4AD9D2990A1676C9CB79C5D addkey

Encrypt and Decrypt

This will encrypt the file and apply the default option of appending .gpg on the end of the file

gpg -e -r '[email protected]' /path/to/some/file.txt

This will do the reverse - note you have to specify the output file or you will get to view the decrypted file via stdout, probably not what you wanted

gpg -o /path/to/some/file.txt -d /path/to/some/file.txt.gpg

4.3 - Event Management

Before it was SIEM

Back in the dawn of time, we called it ‘Central Logging’ and it looked kind of like this:

# The classical way you'd implement this is via a tiered system.

Log Shipper --\                   /--> Log Parser --\
Log Shipper ---+--> Log Broker --+---> Log Parser ---+--> Log Storage --> Log Visualizer 
Log Shipper --/                   \--> Log Parser --/

# The modern way is more distributed. The clients are more powerful so you spread the load out and they can connect to distributed storage directly.

Log Parser Shipper --\ /-- Log Storage <-\
Log Parser Shipper ---+--- Log Storage <--+-  Visualizer 
Log Parser Shipper --/ \-- Log Storage <-/

# ELK (Elasticsearch Logstash and Kibana) is a good example.

Logstash --\ /-- Elasticsearch <-\
Logstash ---+--- Elasticsearch <--+--> Kibana 
Logstash --/ \-- Elasticsearch <-/

More recently, there’s a move toward shippers like NXLog and Elasticsearch’s beats client. A native client saves you from deploying Java and is better suited for thin or micro instances.

# NXLog has an output module for Elasticsearch now. Beats is an Elasticsearch product.
nxlog --\   
nxlog ---+--> Elasticsearch <-- Kibana
beats --/

Windows has it’s own log forwarding technology. You can put it to work without installing anything on the clients. This makes Windows admins a lot happier.

# It's built-in and fine for windows events - just doesn't do text files. Beats can read the events and push to elasticsearch.
Windows Event Forwarding --\   
Windows Event Forwarding ---+--> Central Windows Event Manager -> Beats/Elasticsearch --> Kibana
Windows Event Forwarding --/

Unix has several ways to do it, but the most modern/least-overhead way is to use the native journald system.

# Built-in to systemd
journald send --> central journald receive --> Beats/Elasticsearch --> Kibana

But Why?

The original answer used to be ‘reporting’. It was easier to get all the data together and do an analysis in one place.

Now the answer is ‘correlation’. If someone is probing your systems, they’ll do it very slowly and from multiple IPs to evade thresholds if they can, trying to break up patterns of attack. These patterns can become clear however, when you have a complete picture in one place.

4.3.1 - Elastic Stack

This is also referred to ELK, and is an acronym that stands for Elasticsearch, Logstash and Kibana

This is a trio of tools that <www.elasticsearch.org> has packaged up into a simple and flexible way to handle, store and visualize data. Logstash collects the logs, parses them and stores them in Elasticsearch. Kibana is a web application that knows how to to talk to Elasticsearch and visualizes the data.

Quite simple and powerful

To make use of this tio, start by deploying in this order:

Elasticseach (first, you have have some place to put things)
Kibana (so you can see what’s going on in elasticsearch easily)
Logstash (to start collecting data)

More recently, you can use the Elasticsearch Beats client in place of Logstash. These are natively compiled clients that have less capability, but are easier on the infrastructure than Logstash, a Java application.

4.3.1.1 - Elasticsearch

4.3.1.1.1 - Installation (Linux)

This is circa 2014 - use with a grain of salt.

This is generally the first step, as you need a place to collect your logs. Elasticsearch itself is a NoSQL database and well suited for pure-web style integrations.

Java is required, and you may wish to deploy Oracle’s java per the Elasticsearch team’s recommendation. You may also want to dedicate a data partition. By default, data is stored in /var/lib/elasticsearch and that can fill up. We will also install the ‘kopf’ plugin that makes it easier to manage your data.

Install Java and Elasticsearch

# (add a java repo)
sudo yum install java

# (add the elasticsearch repo)
sudo yum install elasticsearch

# Change the storage location
sudo mkdir /opt/elasticsearch
sudo chown elasticsearch:elasticsearch /opt/elasticsearch

sudo vim /etc/elasticsearch/elasticsearch.yml

    ...
    path.data: /opt/elasticsearch/data
    ...

# Allow connections on ports 9200, 9300-9400 and set the cluster IP

# By design, Elasticsearch is open so control access with care
sudo iptables --insert INPUT --protocol tcp --source 10.18.0.0/16 --dport 9200 --jump ACCEPT

sudo iptables --insert INPUT --protocol tcp --source 10.18.0.0/16 --dport 9300:9300 --jump ACCEPT

sudo vim /etc/elasticsearch/elasticsearch.yml
    ...
    # Failing to set the 'publish_host can result in the cluster auto-detecting an interface clients or other
    # nodes can't reach. If you only have one interface you can leave commented out. 
    network.publish_host: 10.18.3.1
    ...


# Increase the heap size
sudo vim  /etc/sysconfig/elasticsearch

    # Heap size defaults to 256m min, 1g max
    # Set ES_HEAP_SIZE to 50% of available RAM, but no more than 31g
ES_HEAP_SIZE=2g

# Install the kopf plugin and access it via your browser

sudo /usr/share/elasticsearch/bin/plugin -install lmenezes/elasticsearch-kopf
sudo service elasticsearch restart

In your browser, navigate to

http://10.18.3.1:9200/_plugin/kopf/

If everything is working correctly you should see a web page with KOPF at the top.

4.3.1.1.2 - Installation (Windows)

You may need to install on windows to ensure the ‘maximum amount of service ability with existing support staff’. I’ve used it on both Windows and Linux and it’s fine either way. Windows just requires a few more steps.

Requirements and Versions

The current version of Elasticsearch at time of writing these notes is 7.6. It requires an OS and Java. The latest of those supported are:

Windows Server 2016
OpenJDK 13

Installation

The installation instructions are at https://www.elastic.co/guide/en/elastic-stack-get-started/current/get-started-elastic-stack.html

Note: Elasicsearch has both an zip and a MSI. The former comes with a java distro but the MSI includes a service installer.

Java

The OpenJDK 13 GA Releases at https://jdk.java.net/13/ no longer include installers or the JRE. But you can install via a MSI from https://github.com/ojdkbuild/ojdkbuild

Download the latest java-13-openjdk-jre-13.X and execute. Use the advanced settings to include the configuration of the JAVA_HOME and other useful variables.

To test the install, open a command prompt and check the version

C:\Users\allen>java --version
openjdk 13.0.2 2020-01-14
OpenJDK Runtime Environment 19.9 (build 13.0.2+8)
OpenJDK 64-Bit Server VM 19.9 (build 13.0.2+8, mixed mode, sharing)

Elasticsearch

Download the MSI installer from https://www.elastic.co/downloads/elasticsearch. It may be tagged as beta, but it installs the GA product well. Importantly, it also installs a windows service for Elasticsearch.

Verify the installation by checking your services for ‘Elasticsearch’, which should be running.

Troubleshooting

Elasticsearch only listing on localhhost

By default, this is the case. You must edit the config file.

# In an elevated command prompt
notepad C:\ProgramDaata\Elastic\Elasticsearach\config\elasticsearch.yml

# add
discovery.type: single-node
network.host: 0.0.0.0

https://stackoverflow.com/questions/59350069/elasticsearch-start-up-error-the-default-discovery-settings-are-unsuitable-for

failure while checking if template exists: 405 Method Not Allowed

You can’t run newer versions of the filebeat with older versions of elasticsearch. Download the old deb and sudo apt install ./some.deb

https://discuss.elastic.co/t/filebeat-receives-http-405-from-elasticsearch-after-7-x-8-1-upgrade/303821 https://discuss.elastic.co/t/cant-start-filebeat/181050

4.3.1.1.3 - Common Tasks

This is circa 2014 - use with a grain of salt.

Configuration of elasticsearch itself is seldom needed. You will have to maintain the data in your indexes however. This is done by either using the kopf tool, or at the command line.

After you have some data in elasticsearch, you’ll see that your ‘documents’ are organized into ‘indexes’. This is a simply a container for your data that was specified when logstash originally sent it, and the naming is arbitrarily defined by the client.

Deleting Data

The first thing you’re likely to need is to delete some badly-parsed data from your testing.

Delete all indexes with the name test*

curl -XDELETE http://localhost:9200/test*

Delete from all indexes documents of type ‘WindowsEvent’

curl -XDELETE http://localhost:9200/_all/WindowsEvent

Delete from all indexes documents have the attribute ‘path’ equal to ‘/var/log/httpd/ssl_request.log’

curl -XDELETE 'http://localhost:9200/_all/_query?q=path:/var/log/https/ssl_request.log'

Delete from the index ’logstash-2014.10.29’ documents of type ‘shib-access’

curl -XDELETE http://localhost:9200/logstash-2014.10.29/shib-access

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete-by-query.html

Curator

All the maintenance by hand has to stop at some point and Curator is a good tool to automate some of it. This is a script that will do some curls for you, so to speak.

Install

wget https://bootstrap.pypa.io/get-pip.py
sudo python get-pip.py
sudo pip install elasticsearch-curator
sudo pip install argparse

Use

curator --help
curator delete --help

And in your crontab

# Note: you must escape % characters with a \ in crontabs
20 0 * * * curator delete indices --time-unit days --older-than 14 --timestring '\%Y.\%m.\%d' --regex '^logstash-bb-.*'
20 0 * * * curator delete indices --time-unit days --older-than 14 --timestring '\%Y.\%m.\%d' --regex '^logstash-adfsv2-.*'
20 0 * * * curator delete indices --time-unit days --older-than 14 --timestring '\%Y.\%m.\%d' --regex '^logstash-20.*'

Sometimes you’ll need to do an inverse match.

0 20 * * * curator delete indices --regex '^((?!logstash).)*$'

A good way to test your regex is by using the show indices method

curator show indices --regex '^((?!logstash).)*$'

Here’s some OLD posts and links, but be aware the syntax had changed and it’s been several versions since these

http://www.ragingcomputer.com/2014/02/removing-old-records-for-logstash-elasticsearch-kibana http://www.elasticsearch.org/blog/curator-tending-your-time-series-indices/ http://stackoverflow.com/questions/406230/regular-expression-to-match-line-that-doesnt-contain-a-word

Replication and Yellow Cluster Status

By default, elasticsearch assumes you want to have two nodes and replicate your data and the default for new indexes is to have 1 replica. You may not want to do that to start with however, so you change the default and change the replica settings on your existing data in-bulk with:

http://stackoverflow.com/questions/24553718/updating-the-default-index-number-of-replicas-setting-for-new-indices

Set all existing replica requirements to just one copy

curl -XPUT 'localhost:9200/_settings' -d '
{ 
  "index" : { "number_of_replicas" : 0 } 
}'

Change the default settings for new indexes to have just one copy

curl -XPUT 'localhost:9200/_template/logstash_template' -d ' 
{ 
  "template" : "*", 
  "settings" : {"number_of_replicas" : 0 }
} '

http://stackoverflow.com/questions/24553718/updating-the-default-index-number-of-replicas-setting-for-new-indices

Unassigned Shards

You will occasionally have a hiccup where you run out of disk space or something similar and be left with indexes that have no data in them or have shards unassigned. Generally, you will have to delete them but you can also manually reassign them.

http://stackoverflow.com/questions/19967472/elasticsearch-unassigned-shards-how-to-fix

Listing Index Info

You can get a decent human readable list of your indexes using the cat api

curl localhost:9200/_cat/indices

If you wanted to list by size, they use the example

curl localhost:9200/_cat/indices?bytes=b | sort -rnk8

4.3.1.2 - Kibana

4.3.1.2.1 - Installation (Windows)

Kibana is a Node.js app using the Express Web framework - meaning to us it looks like a web server running on port 5601. If you’re running elasticsearch on the same box, it will connect with the defaults.

https://www.elastic.co/guide/en/kibana/current/windows.html

Download and Extract

No MSI or installer is available for windows so you must download the .zip from https://www.elastic.co/downloads/kibana. Uncompress (this will take a while), rename it to ‘Kibana’ and move it to Program Files.

So that you may access it later, edit the config file at {location}/config/kibana.yaml with wordpad and set the server.host entry to:

server.host: "0.0.0.0"

Create a Service

Download the service manager NSSM from https://nssm.cc/download and extract. Start an admin powershell, navigate to the extracted location and run the installation command like so:

C:\Users\alleng\Downloads\nssm-2.24\nssm-2.24\win64> .\nssm.exe install Kibana

In the Pop-Up, set the application path to the below. The start up path will auto populate.

C:\Program Files\Kibana\kibana-7.6.2-windows-x86_64\bin\kibana.bat

Click ‘Install service’ and it should indicate success. Go to the service manager to find and start it. After a minute (Check process manager for the CPU to drop) You should be able to access it at:

http://localhost:5601/app/kibana#/home

4.3.1.2.2 - Troubleshooting

Rounding Errors

Kibana rounds to 16 significant digits

Turns out, if you have a value of type integer, that’s just the limit. While elasticsearch shows you this:

    curl http://localhost:9200/logstash-db-2016/isim-process/8163783564660983218?pretty
    {
      "_index" : "logstash-db-2016",
      "_type" : "isim-process",
      "_id" : "8163783564660983218",
      "_version" : 1,
      "found" : true,
      "_source":{"requester_name":"8163783564660983218","request_num":8163783618037078861,"started":"2016-04-07 15:16:16:139 GMT","completed":"2016-04-07 15:16:16:282 GMT","subject_service":"Service","request_type":"EP","result_summary":"AA","requestee_name":"Mr. Requester","subject":"mrRequest","@version":"1","@timestamp":"2016-04-07T15:16:16.282Z"}
    }

Kibana shows you this

View: Table / JSON / Raw
Field Action Value
request_num    8163783618037079000

Looking at the JSON will give you the clue - it’s being treated as an integer and not a string.

 "_source": {
    "requester_name": "8163783564660983218",
    "request_num": 8163783618037079000,
    "started": "2016-04-07 15:16:16:139 GMT",
    "completed": "2016-04-07 15:16:16:282 GMT",

Mutate it to string in logstash to get your precision back.

https://github.com/elastic/kibana/issues/4356

4.3.1.3 - Logstash

Logstash is a parser and shipper. It reads from (usually) a file, parses the data into JSON, then connects to something else and send the data. That something else can be Elasticsearch, a systlog server, and others.

Logstash v/s Beats

But for most things these days, Beats is a better choice. Give that a look fist.

4.3.1.3.1 - Installation

Note: Before you install logstash, take a look at Elasticsearch’s Beats. It’s lighter-weight for most tasks.

Quick Install

This is a summary of the current install page. Visit and adjust versions as needed.

# Install java
apt install default-jre-headless
apt-get install apt-transport-https
apt install gnupg2
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | apt-key add -

# Check for the current version - 7 is no longer the current version by now
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | tee -a /etc/apt/sources.list.d/elastic-7.x.list
apt update
apt-get install logstash

Logstash has a NetFlow module, but it has been deprecated2. One should instead use the Filebeat Netflow Module.3

The rest of this page is circa 2014 - use with a grain of salt.

Installation - Linux Clients

Install Java

If you don’t already have it, install it. You’ll need at least 1.7 and Oracle is recommended. However, with older systems do yourself a favor and use the OpenJDK as older versions of Sun and IBM do things with cryptography leading to strange bugs in recent releases of logstash.

# On RedHat flavors, install the OpenJDK and select it for use (in case there are others) with the system alternatives utility
sudo yum install java-1.7.0-openjdk

sudo /usr/sbin/alternatives --config java

Install Logstash

This is essentially:

( Look at https://www.elastic.co/downloads/logstash to get the lastest version or add the repo)
wget (some link from the above page)
sudo yum --nogpgcheck localinstall logstash*

# You may want to grab a plugin, like the syslog output, though elasticsearch installs by default
cd /opt/logstash/
sudo bin/plugin install logstash-output-syslog

# If you're ready to configure the service
sudo vim /etc/logstash/conf.d/logstash.conf

sudo service logstash start

https://www.elastic.co/guide/en/logstash/current/index.html

Operating

Input

The most common use of logstash is to tail and parse log files. You do this by specifying a file and filter like so

[gattis@someHost ~]$ vim /etc/logstash/conf.d/logstash.conf


input {
  file {
    path => "/var/log/httpd/request.log"
  }
}
filter {
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}"]
  }
}
output {
  stdout {
    codec => rubydebug
  }
}

Filter

There are many different types of filters, but the main one you’ll be using is grok. It’s all about parsing the message into fields. Without this, you just have a bunch of un-indexed text in your database. It ships with some handy macros such as %{COMBINEDAPACHELOG} that takes this:

10.138.120.138 - schmoej [01/Apr/2016:09:39:04 -0400] "GET /some/url.do?action=start HTTP/1.1" 200 10680 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36"

And turns it into

agent        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36"
auth         schmoej
bytes                   10680
clientip   10.138.120.138
httpversion 1.1
path           /var/pdweb/www-default/log/request.log
referrer   "-"
request   /some/url.do?action=start
response   200
timestamp   01/Apr/2016:09:39:04 -0400
verb        GET

See the grok’ing for more details

Output

We’re outputting to the console so we can see what’s going on with our config. If you get some output, but it’s not parsed fully because of an error in the parsing, you’ll see something like the below with a “_grokparsefailure” tag. That means you have to dig into a custom pattern as in described in grok’ing.

Note: by default, logstash is ’tailing’ your logs, so you’ll only see new entries. If you’ve got no traffic you’ll have to generate some

{
       "message" => "test message",
      "@version" => "1",
    "@timestamp" => "2014-10-31T17:39:28.925Z",
          "host" => "some.app.private",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}

If it looks good, you’ll want to send it on to your database. Change your output to look like so which will put your data in a default index that kibana (the visualizer) can show by default.

output {

  elasticsearch {
    hosts => ["10.17.153.1:9200"]
  }
}

Troubleshooting

If you don’t get any output at all, check that the logstash user can actually read the file in question. Check your log files and try running logstash as yourself with the output going to the console.

cat /var/log/logstash/*

/opt/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf

4.3.1.3.2 - Operation

Basic Operation

Generally, you create a config with 3 sections;

input
filter
output

This example uses the grok filter to parse the message.

sudo vi /etc/logstash/conf.d/logstash.conf

input {
  file {
        path => "/var/pdweb/www-default/log/request.log"        
      }
}
filter {
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}"]
  }
}
output {
  stdout { }
}

Then you test it at the command line

# Test the config file itself
/opt/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf --configtest

# Test the parsing of data
/opt/logstash/bin/logstash -e -f /etc/logstash/conf.d/logstash.conf

You should get some nicely parsed lines. If that’s the case, you can edit your config to add a sincedb and an actual destination.

input {
  file {
        path => "/var/pdweb/www-default/log/request.log"
        sincedb_path => "/opt/logstash/sincedb"
  }
}
filter {
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}"]
  }
}
output {
  elasticsearch {
    host => "some.server.private"
    protocol => "http"
  }
}

If instead you see output with a _grokparsefailure like below, you need to change the filter. Take a look at the common gotchas, then the parse failure section below it.

{
       "message" => "test message",
      "@version" => "1",
    "@timestamp" => "2014-10-31T17:39:28.925Z",
          "host" => "some.app.private",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}

Common Gotchas

No New Data

Logstash reads new lines by default. If you don’t have anyone actually hitting your webserver, but you do have some log entries in the file itself, you can tell logstash to process the exiting entries and not save it’s place in the file.

file {
  path => "/var/log/httpd/request.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
}

Multiple Conf files

Logstash uses all the files in the conf.d directory - even if they don’t end in .conf. Make sure to remove any you don’t want as they can conflict.

Default Index

Logstash creates Elasticsearch indexes that look like:

logstash-%{+YYYY.MM.dd}

The logstash folks have some great material on how to get started. Really top notch.

http://logstash.net/docs/1.4.2/configuration#fieldreferences

Parse Failures

The Greedy Method

The best way to start is to change your match to a simple pattern and work out from there. Try the ‘GREEDYDATA’ pattern and assign it to a field named ‘Test’. This takes the form of:

%{GREEDYDATA:Test}

And it looks like:

filter {
  grok {
    match => [ "message" => "%{GREEDYDATA:Test}" ]
  }
}


       "message" => "test message",
      "@version" => "1",
    "@timestamp" => "2014-10-31T17:39:28.925Z",
          "host" => "some.app.private",
          "Test" => "The rest of your message

That should give you some output. You can then start cutting it up with the patterns (also called macros) found here;

https://github.com/elasticsearch/logstash/blob/v1.4.2/patterns/grok-patterns (adjust for version)

You can also use the online grok debugger and the list of default patterns.

http://grokdebug.herokuapp.com

Combining Patterns

There may not be a standard pattern for what you want, but it’s easy to pull together several existing ones. Here’s an example that pulls in a custom timestamp.

Example:
Sun Oct 26 22:20:55 2014 File does not exist: /var/www/html/favicon.ico

Pattern:
match => { "message" => "(?<timestamp>%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR})"}

Notice the ‘?’ at the beginning of the parenthetical enclosure. That tells the pattern matching engine not to bother capturing that for later use. Like opting out of a ( ) and \1 in sed.

Optional Fields

Some log formats simply skip columns when they don’t have data. This will cause your parse to fail unless you make some fields optional with a ‘?’, like this:

match => [ "message", "%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?"]

Date Formats

http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html

Dropping Events

Oftentimes, you’ll have messages that you don’t care about and you’ll want to drop those. Best practice is to do coarse actions first, so you’ll want to compare and drop with a general conditional like:

filter {
  if [message] =~ /File does not exist/ {
    drop { }
  }
  grok {
    ...
    ...

You can also directly reference fields once you have grok’d the message

filter {
  grok {
    match => { "message" => "%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?"}
  }  
  if [request] == "/status" {
        drop { }
  }
}

http://logstash.net/docs/1.4.2/configuration#conditionals

Dating Messages

By default, logstash date stamps the message when it sees them. However, there can be a delay between when an action happens and when it gets logged to a file. To remedy this - and allow you to suck in old files without the date on every event being the same - you add a date filter.

Note - you actually have to grok out the date into it’s own variable, you can’t just attempt to match on the whole message. The combined apache macro below does this for us.

filter { grok { match => { “message” => “%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?”} } date { match => [ “timestamp” , “dd/MMM/yyyy:HH:mm:ss Z” ] } }

In the above case, ’timestamp’ is a parsed field and you’re using the date language to tell it what the component parts are

http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html

Sending to Multiple Servers

In addition to an elasticsearch server, you may want to send it to a syslog server at the same time.

    input {
      file {
        path => "/var/pdweb/www-default/log/request.log"
        sincedb_path => "/opt/logstash/sincedb"
      }
    }

    filter {
      grok {
        match => [ "message", "%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?"]
      }
      date {
        match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
      }

    }

    output {
      elasticsearch {
        host => "some.server.private"
        protocol => "http"
      }
      syslog {
        host => "some.syslog.server"
        port => "514"
        severity => "notice"
        facility => "daemon"
      }
    }

Deleting Sent Events

Sometimes you’ll accidentally send a bunch of event to the server and need to delete and resend corrected versions.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-delete-mapping.html

curl -XDELETE <http://localhost:9200/_all/SOMEINDEX>
curl -XDELETE <http://localhost:9200/_all/SOMEINDEX?q=path:"/var/log/httpd/ssl_request_log>"

4.3.1.3.3 - Index Routing

When using logstash as a broker, you will want to route events to different indexes according to their type. You have two basic ways to do this;

Using Mutates with a single output
Using multiple Outputs

The latter is significantly better for performance. The less you touch the event, the better it seems. When testing these two different configs in the lab, the multiple output method was about 40% faster when under CPU constraint. (i.e. you can always add more CPU if you want to mutate the events.)

Multiple Outputs

    input {
      ...
      ...
    }
    filter {
      ...
      ...
    }
    output {

      if [type] == "RADIUS" {
        elasticsearch {
          hosts => ["localhost:9200"]
          index => "logstash-radius-%{+YYYY.MM.dd}"
        }
      }

      else if [type] == "RADIUSAccounting" {
        elasticsearch {
          hosts => ["localhost:9200"]
          index => "logstash-radius-accounting-%{+YYYY.MM.dd}"
        }
      }

      else {
        elasticsearch {
          hosts => ["localhost:9200"]
          index => "logstash-test-%{+YYYY.MM.dd}"
        }
      }

    }

Mutates

If your source system includes a field to tell you want index to place it in, you might be able to skip mutating altogether, but often you must look at the contents to make that determination. Doing so does reduce performance.

input {
  ...
  ...
}
filter {
  ...
  ... 

  # Add a metadata field with the destination index based on the type of event this was
  if [type] == "RADIUS" {
    mutate { add_field => { "[@metadata][index-name]" => "logstash-radius" } } 
  }
  else  if [type] == "RADIUSAccounting" {
    mutate { add_field => { "[@metadata][index-name]" => "logstash-radius-accounting" } } 
  }
  else {
    mutate { add_field => { "[@metadata][index-name]" => "logstash-test" } } 
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{[@metadata][index-name]}-%{+YYYY.MM.dd}"
  }
}

https://www.elastic.co/guide/en/logstash/current/event-dependent-configuration.html#metadata

4.3.1.3.4 - Database Connections

You can connect Logstash to a database to poll events almost as easily as tailing a log file.

Installation

The JDBC plug-in ships with logstash so no installation of that is needed. However, you do need the JDBC driver for the DB in question.

Here’s an example for DB2, for which you can get the jar from either the server itself or the DB2 fix-pack associated with the DB Version you’re running. The elasticsearch docs say to just put it in your path. I’ve put it in the logstash folder (based on some old examples) and we’ll see if it survives upgrades.

sudo mkdir /opt/logstash/vendor/jars
sudo cp /home/gattis/db2jcc4.jar /opt/logstash/vendor/jars
sudo chown -R logstash:logstash /opt/logstash/vendor/jars

Configuration

Configuring the input

Edit the config file like so

sudo vim /etc/logstash/conf.d/logstash.conf

    input {
      jdbc {
        jdbc_driver_library => "/opt/logstash/vendor/jars/db2jcc4.jar"
        jdbc_driver_class => "com.ibm.db2.jcc.DB2Driver"
        jdbc_connection_string => "jdbc:db2://db1.tim.private:50000/itimdb"
        jdbc_user => "itimuser"
        jdbc_password => "somePassword"
        statement => "select * from someTable"
      }
    }

Filtering

You don’t need to do any pattern matching, as the input emits the event pre-parsed based on the DB columns. You may however, want to match a timestamp in the database.

    # A sample value in the 'completed' column is 2016-04-07 00:41:03:291 GMT

    filter {
      date {
        match => [ "completed" , "yyyy-MM-dd HH:mm:ss:SSS zzz" ]
      }
    }

Output

One recommended trick is to link the primary keys between the database and kibana. That way, if you run the query again you update the existing elasticsearch records rather than create duplicates ones. Simply tell the output plugin to use the existing primary key from the database for the document_id when it sends it to elasticsearch.

    # Database key is the column 'id'

    output {
      elasticsearch {
        hosts => ["10.17.153.1:9200"]
        index => "logstash-db-%{+YYYY}"

        document_id => "${id}"

        type => "isim-process"

      }
    }

Other Notes

If any of your columns are non-string type, logstash and elasticsearch will happily store them as such. But be warned that kibana will round them to 16 digits due to a limitation of javascript.

https://github.com/elastic/kibana/issues/4356

Sources

https://www.elastic.co/blog/logstash-jdbc-input-plugin https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html

4.3.1.3.5 - Multiline Matching

Here’s an example that uses the multiline codec (preferred over the multiline filter, as it’s more appropriate when you might have more than one input)

input {
  file {
    path => "/opt/IBM/tivoli/common/CTGIM/logs/access.log"
    type => "itim-access"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "^<Message Id"
      negate => true
      what => previous
    }
  }
}

Getting a match can be difficult, as grok by default does not match against multiple lines. You can mutate to remove all the new lines, or use a seemingly secret preface, the ‘(?m)’ directive as shown below

filter {
  grok {
    match => { "message" => "(?m)(?<timestamp>%{YEAR}.%{MONTHNUM}.%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND}%{ISO8601_TIMEZONE})%{DATA}com.ibm.itim.security.%{WORD:catagory}%{DATA}CDATA\[%{DATA:auth}\]%{DATA}CDATA\[%{DATA:clientip}\]"}
  }

https://logstash.jira.com/browse/LOGSTASH-509

4.3.1.4 - Beats

Beats are a family of lightweight shippers that you should consider as a first-solution for sending data to Elasticsearch. The two most common ones to use are:

Filebeat
Winlogbeat

Filebeat is used both for files, and for other general types, like syslog and NetFlow data.

Winlogbeat is used to load Windows events into Elasticsearch and works well with Windows Event Forwarding.

4.3.1.4.1 - Linux Installation

On Linux

A summary from the general docs. View and adjust versions as needed.

If you haven’t already added the repo:

apt-get install apt-transport-https
apt install gnupg2
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | apt-key add -
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | tee -a /etc/apt/sources.list.d/elastic-7.x.list
apt update

apt install filebeat
systemctl enable filebeat

Filebeat uses a default config file at /etc/filebeat/filebeat.yml. If you don’t want to edit that, you can use the ‘modules’ to configure it for you. That command will also load dashboard elements into Kibana, so you must have that already installed Kibana to make use of it.

Here’s a simple test

mv /etc/filebeat/filebeat.yml /etc/filebeat/filebeat.yml.orig
vi /etc/filebeat/filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log
output.file:
  path: "/tmp/filebeat"
  filename: filebeat
  #rotate_every_kb: 10000
  #number_of_files: 7
  #permissions: 0600

4.3.1.4.2 - Windows Installation

Installation

Download the .zip version (the msi doesn’t include the server install script) from the URL below. Extract, rename to Filebeat and move it the to the c:\Program Files directory.

https://www.elastic.co/downloads/beats/filebeat

Start an admin powershell, change to that directory and run the service install command. (Keep the shell up for later when done)

PowerShell.exe -ExecutionPolicy UnRestricted -File .\install-service-filebeat.ps1

Basic Configuration

Edit the filebeat config file.

write.exe filebeat.yml

You need to configure the input and output sections. The output is already set to elasticsearch localhost so you only have to change the input from the unix to the windows style.

  paths:
    #- /var/log/*.log
    - c:\programdata\elasticsearch\logs\*

Test as per normal

  ./filebeat test config -e

Filebeat specific dashboards must be added to Kibana. Do that with the setup argument:

  .\filebeat.exe setup --dashboards

To start Filebeat in the forrgound (to see any interesting messages)

  .\filebeat.exe -e

If you’re happy with the results, you can stop the application then start the service

  Ctrl-C
  Start-Service filebeat

Adapted from the guide at

https://www.elastic.co/guide/en/beats/filebeat/7.6/filebeat-getting-started.html

4.3.1.4.3 - NetFlow Forwarding

The NetFlow protocol is now implemented in Filebeat¹. Assuming you’ve installed Filebeat and configured Elasticsearch and Kibana, you can use this input module to auto configure the inputs, indexes and dashboards.

./filebeat modules enable netflow
filebeat setup -e

If you are just testing and don’t want to add the full stack, you can set up the netflow input² which the module is a wrapper for.

filebeat.inputs:
- type: netflow
  max_message_size: 10KiB
  host: "0.0.0.0:2055"
  protocols: [ v5, v9, ipfix ]
  expiration_timeout: 30m
  queue_size: 8192
output.file:
  path: "/tmp/filebeat"
  filename: filebeat

filebeat test config -e

Consider dropping all the fields you don’t care about as there are a lot of them. Use the include_fields processor to limit what you take in

  - include_fields:
      fields: ["destination.port", "destination.ip", "source.port", "source.mac", "source.ip"]

4.3.1.4.4 - Palo Example

# This filebeat config accepts TRAFFIC and SYSTEM syslog messages from a Palo Alto, 
# tags and parses them 

# This is an arbitrary port. The normal port for syslog is UDP 512
filebeat.inputs:
  - type: syslog
    protocol.udp:
      host: ":9000"

processors:
    # The message field will have "TRAFFIC" for  netflow logs and we can 
    # extract the details with a CSV decoder and array extractor
  - if:
      contains:
        message: ",TRAFFIC,"
    then:
      - add_tags:
          tags: "netflow"
      - decode_csv_fields:
          fields:
            message: csv
      - extract_array:
          field: csv
          overwrite_keys: true
          omit_empty: true
          fail_on_error: false
          mappings:
            source.ip: 7
            destination.ip: 8
            source.nat.ip: 9
            network.application: 14
            source.port: 24
            destination.port: 25
            source.nat.port: 26
      - drop_fields:
          fields: ["csv", "message"] 
    else:
        # The message field will have "SYSTEM,dhcp" for dhcp logs and we can 
        # do a similar process to above
      - if:
          contains:
            message: ",SYSTEM,dhcp"
        then:
        - add_tags:
            tags: "dhcp"
        - decode_csv_fields:
            fields:
              message: csv
        - extract_array:
            field: csv
            overwrite_keys: true
            omit_empty: true
            fail_on_error: false
            mappings:
              message: 14
        # The DHCP info can be further pulled apart using space as a delimiter
        - decode_csv_fields:
            fields:
              message: csv2
            separator: " "
        - extract_array:
            field: csv2
            overwrite_keys: true
            omit_empty: true
            fail_on_error: false
            mappings:
              source.ip: 4
              source.mac: 7
              hostname: 10
        - drop_fields:
            fields: ["csv","csv2"] # Can drop message too like above when we have watched a few        
  - drop_fields:
      fields: ["agent.ephemeral_id", "agent.hostname", "agent.id", "agent.type", "agent.version", "ecs.version","host.name","event.severity","input.type","hostname","log.source.address","syslog.facility", "syslog.facility_label", "syslog.priority", "syslog.priority_label","syslog.severity_label"]
      ignore_missing: true
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
setup.template.settings:
  index.number_of_shards: 1
output.elasticsearch:
  hosts: ["localhost:9200"]

4.3.1.4.5 - RADIUS Forwarding

Here’s an example of sending FreeRADIUS logs to Elasticsearch.

cat /etc/filebeat/filebeat.yml

filebeat.inputs:
  - type: log
    paths:
      - /var/log/freeradius/radius.log
    include_lines: ['\) Login OK','incorrect']
    tags: ["radius"]
processors:
  - drop_event:
      when:
        contains:
          message: "previously"
  - if:
      contains:
        message: "Login OK"
    then: 
      - dissect:
          tokenizer: "%{key1} [%{source.user.id}/%{key3}cli %{source.mac})"
          target_prefix: ""
      - drop_fields:
          fields: ["key1","key3"]
      - script:
          lang: javascript
          source: >
            function process(event) {
                var mac = event.Get("source.mac");
                if(mac != null) {
                        mac = mac.toLowerCase();
                         mac = mac.replace(/-/g,":");
                         event.Put("source.mac", mac);
                }
              }            
    else:
      - dissect:
          tokenizer: "%{key1} [%{source.user.id}/<via %{key3}"
          target_prefix: ""
      - drop_fields: 
          fields: ["key1","key3"]
output.elasticsearch:
  hosts: ["http://logcollector.yourorg.local:9200"]        
  allow_older_versions: true
  setup.ilm.enabled: false

4.3.1.4.6 - Syslog Forwarding

You may have an older system or appliance that can transmit syslog data. You can use filebeat to accept that data and store it in Elasticsearch.

Add Syslog Input

Install filebeat and test the reception the /tmp.

vi  /etc/filebeat/filebeat.yml

filebeat.inputs:
- type: syslog
  protocol.udp:
    host: ":9000"
output.file:
  path: "/tmp"
  filename: filebeat


sudo systemctl filebeat restart

pfSense Example

The instructions are NetGate’s remote logging example.

Status -> System Logs -> Settings

Enable and configure. Internet rumor has it that it’s UDP only so the config above reflects that. Interpreting the output requires parsing the message section detailed in the filter log format docs.

'5,,,1000000103,bge1.1099,match,block,in,4,0x0,,64,0,0,DF,17,udp,338,10.99.147.15,255.255.255.255,2048,30003,318'

'5,,,1000000103,bge2,match,block,in,4,0x0,,84,1,0,DF,17,udp,77,157.240.18.15,205.133.125.165,443,61343,57'

'222,,,1000029965,bge2,match,pass,out,4,0x0,,128,27169,0,DF,6,tcp,52,205.133.125.142,205.133.125.106,5225,445,0,S,1248570004,,8192,,mss;nop;wscale;nop;nop;sackOK'

'222,,,1000029965,bge2,match,pass,out,4,0x0,,128,11613,0,DF,6,tcp,52,205.133.125.142,211.24.111.75,15305,445,0,S,2205942835,,8192,,mss;nop;wscale;nop;nop;sackOK'

4.3.2 - Loki

Loki is a system for handling logs (unstructured data) but is lighter-weight than Elasticsearch. It also has fewer add-ons. But if you’re already using Prometheus and Grafana and you want to do it yourself, it can be a better solution.

Installation

Install Loki and Promtail together. These are available in the debian stable repos at current version. No need to go to backports or testing

sudo apt install loki promtail
curl localhost:3100/metrics

Configuration

Default config files are create in /etc/loki and /etc/promtail. Promtail is tailing /var/log/*log file, pushing them to localhost loki on the default port (3100) and loki is saving data in the /tmp directory. This is fine for testing.

Promtail runs as the promtail user (not root) and can’t read anything useful, so add them to the adm group.

sudo usermod -a -G adm promtail
sudo systemctl restart promtail

Grafana Integration

In grafana, add a datasource.

Configuration –> Add new data source –> Loki

Set the URL to http://localhost:3100

Then view the logs

Explore –> Select label (filename) –> Select value (daemon)

Troubleshooting

error notifying frontend about finished query

Edit the timeout setting in your loki datasource. The default may be too short so set it to 30s or some such

Failed to load log volume for this query

If you added a logfmt parser like the gui suggested, you may find not all your entries can be parsed, leading to this error.:w

4.3.3 - Network Traffic

Recoding traffic on the network is critical for troubleshooting and compliance. For the latter, the most common strategy is to record the “flows”. These are the connections each host makes or accepts, and how much data is involved.

You can collect this information at the LAN on individual switches, but the WAN (at the router) is usually more important. And if the router is performing NAT, it’s the only place to record the mappings of internal to external IPs and ports.

Network Log System

A network flow log system usually has three main parts.

Exporter --> Collector --> Analyzer

The Exporter, which records the data, the Collector, which is where the data is stored, and the Analyzer which makes the data more human-readable.

Example

We’ll use a Palo Alto NG Firewall as our exporter, and an Elasticsearch back-end. The data we are collecting is essentially log data, and Elasticsearch is probably the best at handling unstructured information.

At small scale, you can combine all of the the collection and analysis parts on a single system. We’ll use windows servers in our example as well.

graph LR

A(Palo)
B(Beats)
C(ElasticSearch)
D(Kibana) 

subgraph Exporter
A
end

subgraph Collector and Analyzer
B --> C --> D
end

A --> B

Installation

Start with Elasticsearch and Kibana, then install Beats.

Elastic Stack

Configuration

Beats and Palo have a couple of protocols in common. NetFlow is the traditional protocol, but when you’re using NAT the best choice is the syslog protocol as the Palo will directly tell you the NAT info all in one record and you don’t have to correlate multiple interface flows to see who did what.

Beats

On the Beats server, start an admin powershell session, change to the Filebeat directory, edit the config file and restart the server.

There is a bunch of example text in the config so tread carefully and keep in mind that indentation matters. Stick this block right under the filebeat.inputs: line and you should be OK.

This config stanza has a processor block that decodes the CVS content sent over in the message field, extracts a few select fields, then discards the rest. There’s quite a bit left over though, so see tuning below if you’d like to reduce the data load even more.

cd "C:\Program Files\Filebeat"
write.exe filebeat.yml

filebeat.inputs:
- type: syslog
  protocol.udp:
    host: ":9000"
  processors:
  - decode_csv_fields:
      fields:
        message: csv
  - extract_array:
      field: csv
      overwrite_keys: true
      omit_empty: true
      fail_on_error: false
      mappings:
        source.ip: 7
        destination.ip: 8
        source.nat.ip: 9
        network.application: 14
        source.port: 24
        destination.port: 25
        source.nat.port: 26
  - drop_fields:
        fields: ["csv", "message"]

A larger is example is under the beats documentation.

Palo Alto Setup

Perform steps 1 and 2 of the Palo setup guide with the notes below.

https://docs.paloaltonetworks.com/pan-os/10-0/pan-os-admin/monitoring/use-syslog-for-monitoring/configure-syslog-monitoring

In step 1 - The panw module defaults to 9001
In step 2 - Make sure to choose Traffic as the type of log

Tuning

You can reduce the amount of data even more by adding a few more Beats directives.

# At the very top level of the file, you can add this processor to affect global fields
processors:
- drop_fields:
   fields: ["agent.ephemeral_id","agent.id","agent.hostname","agent.type","agent.version","ecs.version","host.name"]

# You can also drop syslog fields that aren't that useful (you may need to put this under the syslog input)
- drop_fields:
    fields: ["event.severity","input.type","hostname","syslog.facility", "syslog.facility_label", "syslog.priority", "syslog.priority_label","syslog.severity_label"]

You may want even more data. See the Full Palo Syslog data to see what’s available. An example

Conclusion

At this point you can navigate to the Kibana web console and explore the logs. There is no dashboard as this is just for log retention and covers the minimum required. If you’re interested in more, check out the SIEM and Netflow dashboards Elasticsearch offers.

Sources

Palo Shipping

https://docs.logz.io/shipping/security-sources/palo-alto-networks.html

4.3.4 - NXLog

This info on NXLog is circa 2014 - use with caution.

NXLog is best used when Windows Event Forwarding can’t be and filebeats isn’t sufficient.

Background

There are several solutions for capturing logs in Windows, but NXLog has some advantages;

Cross-platform and Open Source
Captures windows events pre-parsed
Native windows installer and service

You could just run logstash everywhere. But in practice, Logstash’s memory requirements are several times NXLog and not everyone likes to install java everywhere.

Deploy on Windows

Download from http://nxlog.org/download. This will take you to the sourceforge site and the MSI you can install from. This installation is clean and the service installs automatically.

Configure on Windows

NXLog uses a config file with blocks in the basic pattern of:

Input Block
Output Block
Routing Block

The latter being what ties together your inputs and outputs. You start out with one variable, called the $raw_event with everything in it. As you call modules, that variable gets parsed out to more useful individual variables.

Event Viewer Example

Here’s an example of invoking the module that pulls in data from the windows event log entries associated.

Navigate to C:\Program Files (x86)\nxlog\conf
Edit the security settings on the file nxlog.conf. Change the ‘Users’ to have modify rights. This allows you to actually edit the config file.
Open that file in notepad and simply change it to look like so

    # Set the ROOT to the folder your nxlog was installed into
    define ROOT C:\Program Files (x86)\nxlog

    ## Default required locations based on the above
    Moduledir %ROOT%\modules
    CacheDir %ROOT%\data
    Pidfile %ROOT%\data\nxlog.pid
    SpoolDir %ROOT%\data
    LogFile %ROOT%\data\nxlog.log

    # Increase to DEBUG if needed for diagnosis
    LogLevel INFO

    # Input the windows event logs
    <Input in>
      Module      im_msvistalog
    </Input>


    # Output the logs to a file for testing
    <Output out>
        Module      om_file
        File        "C:/Program Files (x86)/nxlog/data/log-test-output.txt"
    </Output>

    # Define the route by mapping the input to an output
    <Route 1>
        Path        in => out
    </Route>

With any luck, you’ve now got some lines in your output file.

File Input Example

    # Set the ROOT to the folder your nxlog was installed into
    define ROOT C:\Program Files (x86)\nxlog

    ## Default required locations based on the above
    Moduledir %ROOT%\modules
    CacheDir %ROOT%\data
    Pidfile %ROOT%\data\nxlog.pid
    SpoolDir %ROOT%\data
    LogFile %ROOT%\data\nxlog.log

    # Increase to DEBUG if needed for diagnosis
    LogLevel INFO

    # Input a test file 
    <Input in>
        Module      im_file
        File ""C:/Program Files (x86)/nxlog/data/test-in.txt"
        SavePos     FALSE   
        ReadFromLast FALSE
    </Input>

    # Output the logs to a file for testing
    <Output out>
        Module      om_file
        File        "C:/Program Files (x86)/nxlog/data/log-test-output.txt"
    </Output>

    # Define the route by mapping the input to an output
    <Route 1>
        Path        in => out
    </Route>

Sending Events to a Remote Logstash Receiver

To be useful, you need to send your logs somewhere. Here’s an example of sending them to a Logstash receiver.

    # Set the ROOT to the folder your nxlog was installed into
    define ROOT C:\Program Files (x86)\nxlog

    ## Default required locations based on the above
    Moduledir %ROOT%\modules
    CacheDir %ROOT%\data
    Pidfile %ROOT%\data\nxlog.pid
    SpoolDir %ROOT%\data
    LogFile %ROOT%\data\nxlog.log

    # Increase to DEBUG if needed for diagnosis
    LogLevel INFO

    # Load the JSON module needed by the output module
    <Extension json>
        Module      xm_json
    </Extension>

    # Input the windows event logs
    <Input in>
      Module      im_msvistalog
    </Input>


    # Output the logs out using the TCP module, convert to JSON format (important)
    <Output out>
        Module      om_tcp
        Host        some.server
        Port        6379
        Exec to_json();
    </Output>

    # Define the route by mapping the input to an output
    <Route 1>
        Path        in => out
    </Route>

    Restart the service in the windows services, and you are in business.

Note about JSON

You’re probably shipping logs to a logstash broker (or similar json based tcp receiver). In that case, make sure to specify JSON on the way out, as in the example above or you’ll spend hours trying to figure out why you’re getting a glob of plain txt and loose all the pre-parsed windows event messages which are nearly impossible to parse back from plain text.

Using that to_json() will replace the contents. The variable we mentioned earlier, $raw_event, with all of the already parsed fields. If you hand’t invoked a module to parse that data out, you’d just get a bunch of empty events as the data was replaced with a bunch of nothing.

4.3.4.1 - Drop Events

Exec

You can use the ‘Exec’ statement in any block and some pattern matching to drop events you don’t care about.

<Input in>
  Module im_file
  File "E:/Imports/get_accessplans/log-test.txt"
  Exec if $raw_event =~ /someThing/ drop();
</Input>

Or the inverse, with the operator !~

Dropping Events with pm_pattern

The alternative is the patternDB approach as it has some parallelization advantages you’ll read about in the docs should you dig into it further. This matters when you have lots of patterns to check against.

# Set the ROOT to the folder your nxlog was installed into
define ROOT C:\Program Files (x86)\nxlog

## Default required locations based on the above
Moduledir %ROOT%\modules
CacheDir %ROOT%\data
Pidfile %ROOT%\data\nxlog.pid
SpoolDir %ROOT%\data
LogFile %ROOT%\data\nxlog.log

# Increase to DEBUG if needed for diagnosis
LogLevel INFO

# Load the JSON module needed by the output module
<Extension json>
    Module      xm_json
</Extension>

# Input the windows event logs
<Input in>
  Module      im_msvistalog
</Input>

# Process log events 
<Processor pattern>
  Module  pm_pattern
  PatternFile %ROOT%/conf/patterndb.xml
</Processor>

# Output the logs out using the TCP module, convert to JSON format (important)
<Output out>
    Module      om_tcp
    Host        some.server
    Port        6379
    Exec to_json();
</Output>

# Define the route by mapping the input to an output
<Route 1>
    Path        in => pattern => out
</Route>

And create an XML file like so:

<?xml version="1.0" encoding="UTF-8"?>
<patterndb>

  <group>
    <name>eventlog</name>
    <id>1</id>

   <pattern>
      <id>2</id>
      <name>500s not needed</name> 
      <matchfield>
        <name>EventID</name>
        <type>exact</type>
        <value>500</value>
      </matchfield>
      <exec>drop();</exec>
    </pattern>

    
  </group>

</patterndb>

4.3.4.2 - Event Log

Limiting Log Messages

You may not want ALL the event logs. You can add a query to that module however, and limit logs to the security logs, like so

<Input in>
  Module im_msvistalog
  Query <QueryList><Query Id="0" Path="Security"><Select Path="Security">*</Select></Query></QueryList>
</Input>

You can break that into multiple lines for easier reading by escaping the returns. Here’s an example that ships the ADFS Admin logs.

<Input in>
  Module im_msvistalog
  Query <QueryList>\
            <Query Id="0">\
                <Select Path='AD FS 2.0/Admin'>*</Select>\
            </Query>\
        </QueryList>
</Input>

Pulling out Custom Logs

If you’re interested in very specific logs, you can create a custom view in the windows event viewer, and after selecting the criteria in with the graphical tool, click on the XML tab to see what the query is. For example, to ship all the ADFS 2 logs (assuming you’ve turned on auditing) Take the output of the XML tab (shown below) and modify to be compliant with the nxlog format.

<QueryList>
  <Query Id="0" Path="AD FS 2.0 Tracing/Debug">
    <Select Path="AD FS 2.0 Tracing/Debug">*[System[Provider[@Name='AD FS 2.0' or @Name='AD FS 2.0 Auditing' or @Name='AD FS 2.0 Tracing']]]</Select>
    <Select Path="AD FS 2.0/Admin">*[System[Provider[@Name='AD FS 2.0' or @Name='AD FS 2.0 Auditing' or @Name='AD FS 2.0 Tracing']]]</Select>
    <Select Path="Security">*[System[Provider[@Name='AD FS 2.0' or @Name='AD FS 2.0 Auditing' or @Name='AD FS 2.0 Tracing']]]</Select>
  </Query>
</QueryList

Here’s the query from a MS NPS

<QueryList>
  <Query Id="0" Path="System">
    <Select Path="System">*[System[Provider[@Name='NPS']]]</Select>
    <Select Path="System">*[System[Provider[@Name='HRA']]]</Select>
    <Select Path="System">*[System[Provider[@Name='Microsoft-Windows-HCAP']]]</Select>
    <Select Path="System">*[System[Provider[@Name='RemoteAccess']]]</Select>
    <Select Path="Security">*[System[Provider[@Name='Microsoft-Windows-Security-Auditing'] and Task = 12552]]</Select>
  </Query>
</QueryList>

4.3.4.3 - Input File Rotation

NXLog has decent ability to rotate it’s own output files, but it’s doesn’t come with a lot of methods to rotate input files - i.e. your reading in Accounting logs from a windows RADIUS and it would be nice to archive those with NXLog, because Windows won’t do it. You could bust out some perl (if you’re on unix) and use the xm_perl module, but there’s a simpler way.

On windows, the solution is to use an exec block with a scheduled command. The forfiles executable is already present in windows and does the trick. The only gotcha is that ALL the parameters must be delimited like below.

So the command

forfiles /P "E:\IAS_Logs" /D -1 /C "cmd /c move @file \\server\share"

Becomes

<Extension exec>
  Module xm_exec
  <Schedule>
    When @daily
 Exec exec('C:\Windows\System32\forfiles.exe','/P','"E:\IAS_Logs"','/D','-1','/C','"cmd','/c','move','@file','\\server\share"');
  </Schedule>
</Extension>

A slightly more complex example with added compression and removal of old files (there isn’t a great command line zip utility for windows in advance of powershell 5)

# Add log rotation for the windows input files
<Extension exec>
    Module xm_exec
    <Schedule>
     When @daily
         # Make a compressed copy of .log files older than 1 day
         Exec exec('C:\Windows\System32\forfiles.exe','/P','"E:\IAS_Logs"','/M','*.log','/D','-1','/C','"cmd','/c','makecab','@file"')
         # Delete original files after 2 days, leaving the compressed copies 
         Exec exec('C:\Windows\System32\forfiles.exe','/P','"E:\IAS_Logs"','/M','*.log','/D','-2','/C','"cmd','/c','del','@file"')
         # Move compressed files to the depot after 2 days
         Exec exec('C:\Windows\System32\forfiles.exe','/P','"E:\IAS_Logs"','/M','*.lo_','/D','-2','/C','"cmd','/c','move','@file','\\shared.ohio.edu\appshare\radius\logs\radius1.oit.ohio.edu"');
    </Schedule>
</Extension>

The @daily runs right at 0 0 0 0 0 (midnight every night). Check the manual for more precise cron controls

4.3.4.4 - Inverse Matching

You can use the ‘Exec’ statement to match inverse like so

<Input in>
  Module im_file
  File "E:/Imports/get_accessplans/log-test.txt"
  Exec if $raw_event !~ /someThing/ drop();
</Input>

However, when you’re using a pattern db this is harder as the REGEXP doesn’t seem to honor inverses like you’d expect. Instead, you must look for matches in your pattern db like normal;

<?xml version="1.0" encoding="UTF-8"?>
<patterndb>

  <group>
    <name>eventlog</name>
    <id>1</id>

    <pattern>
      <id>2</id>
      <name>Identify user login success usernames</name>

      <matchfield>
        <name>EventID</name>
        <type>exact</type>
        <value>501</value>
      </matchfield>

      <matchfield>
        <name>Message</name>
        <type>REGEXP</type>
        <value>windowsaccountname \r\n(\S+)</value>
        <capturedfield>
          <name>ADFSLoginSuccessID</name>
          <type>STRING</type>
        </capturedfield>
      </matchfield

   </pattern>
  </group>
</patterndb>

Then, add a section to your nxlog.conf to take action when the above capture field doesn’t existing (meaning there wasn’t a regexp match).

...

# Process log events 
<Processor pattern>
  Module  pm_pattern
  PatternFile %ROOT%/conf/patterndb.xml
</Processor>

# Using a null processor just to have a place to put the exec statement
<Processor filter>
 Module pm_null
 Exec if (($EventID == 501) and ($ADFSLoginSucccessID == undef)) drop();
</Processor>

# Output the logs out using the TCP module, convert to JSON format (important)
<Output out>
    Module      om_tcp
    Host        some.server
    Port        6379
    Exec to_json();
</Output>

# Define the route by mapping the input to an output
<Route 1>
    Path        in => pattern => filter => out
</Route>

4.3.4.5 - Logstash Broker

When using logstash as a Broker/Parser to receive events from nxlog, you’ll need to explicitly tell it that the message is in json format with a filter, like so:

input {
  tcp {
    port => 6379
    type => "WindowsEventLog"
  }
}
filter {
  json {
    source => message
  }
}
output {
  stdout { codec => rubydebug }
}

4.3.4.6 - Manipulating Data

Core Fields

NXLog makes and handful of attributes about the event available to you. Some of these are from the ‘core’ module

$raw_event
$EventReceivedTime
$SourceModuleName
$SourceModuleType

Additional Fields

These are always present and added to by the input module or processing module you use. For example, the mseventlog module adds all the attributes from the windows event logs as attributes to the nxlog event. So your event contains:

$raw_event
$EventReceivedTime
$SourceModuleName
$SourceModuleType
$Message
$EventTime
$Hostname
$SourceName
$EventID
...

You can also create new attributes by using a processing module, such as parsing an input file’s XML. This will translate all the tags (within limites) into attributes.

<Extension xml>
    Module xm_xml
</Extension>
<Input IAS_Accounting_Logs>
    Module im_file
    File "E:\IAS_Logs\IN*.log"
    Exec parse_xml();
</Input>

And you can also add an Exec at any point to create or replace new attribute as desired

<Input IAS_Accounting_Logs>
    Module im_file
    File "E:\IAS_Logs\IN*.log"
    Exec $type = "RADIUSAccounting";
</Input>

Rewriting Data

Rather than manipulate everything in the input and output modules, use the pm_null module to group a block together.

<Processor rewrite>
    Module pm_null
    Exec parse_syslog_bsd();\
 if $Message =~ /error/ \
                {\
                  $SeverityValue = syslog_severity_value("error");\
                  to_syslog_bsd(); \
                }
</Processor>


<Route 1>
    Path in => rewrite => fileout
</Route>

4.3.4.7 - NPS Example

define ROOT C:\Program Files (x86)\nxlog

Moduledir %ROOT%\modules
CacheDir %ROOT%\data
Pidfile %ROOT%\data\nxlog.pid
SpoolDir %ROOT%\data
LogFile %ROOT%\data\nxlog.log


# Load the modules needed by the outputs
<Extension json>
    Module      xm_json
</Extension>

<Extension xml>
    Module xm_xml
</Extension>


# Inputs. Add the field '$type' so the receiver can easily tell what type they are.
<Input IAS_Event_Logs>
    Module      im_msvistalog
    Query \
 <QueryList>\
 <Query Id="0" Path="System">\
 <Select Path="System">*[System[Provider[@Name='NPS']]]</Select>\
 <Select Path="System">*[System[Provider[@Name='HRA']]]</Select>\
  <Select Path="System">*[System[Provider[@Name='Microsoft-Windows-HCAP']]]</Select>\
 <Select Path="System">*[System[Provider[@Name='RemoteAccess']]]</Select>\
 <Select Path="Security">*[System[Provider[@Name='Microsoft-Windows-Security-Auditing'] and Task = 12552]]</Select>\
 </Query>\
 </QueryList>
    Exec $type = "RADIUS";
</Input>

<Input IAS_Accounting_Logs>
    Module      im_file
    File  "E:\IAS_Logs\IN*.log"
    Exec parse_xml();
    Exec $type = "RADIUSAccounting";
</Input>


# Output the logs out using the TCP module, convert to JSON format (important)
<Output broker>
    Module      om_tcp
    Host        192.168.1.1
    Port        8899
    Exec to_json();
</Output>


# Routes
<Route 1>
    Path        IAS_Event_Logs,IAS_Accounting_Logs => broker
</Route>


# Rotate the input logs while we're at it, so we don't need a separate tool
<Extension exec>
    Module xm_exec
    <Schedule>
     When @daily
      #Note -  the Exec statement is one line but may appear wrapped
                Exec exec('C:\Windows\System32\forfiles.exe','/P','"E:\IAS_Logs"','/D','-1','/C','"cmd','/c','move','@file','\\some.windows.server\share\logs\radius1"');
    </Schedule>
</Extension>

4.3.4.8 - Parsing

You can also extract and set values with a pattern_db, like this; (Note, nxlog uses perl pattern matching syntax if you need to look things up)

<?xml version="1.0" encoding="UTF-8"?>
<patterndb>

  <group>
    <name>ADFS Logs</name>
    <id>1</id>

   <pattern>
      <id>2</id>
      <name>Identify user login fails</name>
      <matchfield>
        <name>EventID</name>
        <type>exact</type>
        <value>111</value>
      </matchfield>

      <matchfield>
        <name>Message</name>
        <type>REGEXP</type>
        <value>LogonUser failed for the '(\S+)'</value>
        <capturedfield>
          <name>ADFSLoginFailUsername</name>
          <type>STRING</type>
        </capturedfield>
      </matchfield>

      <set>
        <field>
          <name>ADFSLoginFail</name>
          <value>failure</value>
          <type>string</type>
        </field>
      </set>
    </pattern>

And a more complex example, where we’re matching against a sting like:

2015-03-03T19:45:03 get_records 58  DailyAddAcct completed (Success) with: 15 Records Processed 0 adds 0 removes 0 modified 15 unchanged 


<?xml version="1.0" encoding="UTF-8"?>
<patterndb>

  <group>
    <name>Bbts Logs</name>
    <id>1</id>
  
    <pattern>
      <id>2</id>
      <name>Get TS Records</name> 
 
      <matchfield>
        <name>raw_event</name>
        <type>REGEXP</type>
        <value>^(\S+) get_record (\S+)\s+(\S+) completed \((\S+)\) with: (\S+) Records Processed (\S+) adds (\S+) removes (\S+) modified (\S+) unchanged</value>
 
        <capturedfield>
          <name>timestamp</name>
          <type>STRING</type>
        </capturedfield>

        <capturedfield>
          <name>Transaction_ID</name>
          <type>STRING</type>
        </capturedfield>

         <capturedfield>
          <name>Job_Subtype</name>
          <type>STRING</type>
        </capturedfield>

        <capturedfield>
          <name>Job_Status</name>
          <type>STRING</type>
        </capturedfield>

        <capturedfield>
          <name>Record_Total</name>
          <type>STRING</type>
        </capturedfield>

        <capturedfield>
          <name>Record_Add</name>
          <type>STRING</type>
        </capturedfield>

        <capturedfield>
          <name>Record_Remove</name>
          <type>STRING</type>
        </capturedfield>

        <capturedfield>
          <name>Record_Mod</name>
          <type>STRING</type>
        </capturedfield>

        <capturedfield>
          <name>Record_NoChange</name>
          <type>STRING</type>
        </capturedfield>

      </matchfield>

      <set>
        <field>
          <name>Job_Type</name>
          <value>Get_Records</value>
          <type>string</type>
        </field>
      </set>
    </pattern>


  </group>
</patterndb>

4.3.4.9 - Reprocessing

Sometimes you have a parse error when you’re testing and you need to feed all your source files back in. Problem is you’re usually saving position and reading only new entries by default.

Defeat this by adding a line to the nxlog config so it starts reading files at the beginning and deleting the ConfigCache file (so there’s no last position to start from).

    <Input IAS_Accounting_Logs>
        Module      im_file
        ReadFromLast FALSE
        File  "E:\IAS_Logs\IN*.log"
        Exec parse_xml();
        Exec $type = "RADIUSAccounting";
    </Input>

del C:\Program Files (x86)\nxlog\data\configcache.dat

Restart and it will begin reprocessing all the data. When you’re done, remove the ReadFromLast line and restart.

Note: If you had just deleted the cache file, nxlog would have resumed at the tail of the file. You could have told it not to save position, but you actually do want that for when you’re ready to resume normal operation.

https://www.mail-archive.com/[email protected]/msg00158.html

4.3.4.10 - Syslog

There are two components; adding the syslog module and adding the export path.

    <Extension syslog>
        Module xm_syslog
    </Extension>

    <Input IAS_Accounting_Logs>
        Module      im_file
        File  "E:\IAS_Logs\IN*.log"
        Exec $type = "RADIUSAccounting";
    </Input>

    <Output siem>
    Module om_udp
    Host 192.168.1.1
    Port 514
    Exec to_syslog_ietf();

    </Output>

    <Route 1>
        Path        IAS_Accounting_Logs => siem
    </Route>

4.3.4.11 - Troubleshooting

NXLOG Couldn’t read next event

If you see this error message from nxlog:

ERROR Couldn't read next event, corrupted eventlog?; The data is invalid.

Congrats - you’ve hit a bug.

https://nxlog.org/support-tickets/immsvistalog-maximum-event-log-count-support

The work-around is to limit your log event subscriptions on the input side by using a query. Example:

<Input in>
  Module im_msvistalog
  Query <QueryList><Query Id="0" Path="Microsoft-Windows-PrintService/Operational"><Select Path="Microsoft-Windows-PrintService/Operational">*</Select></Query></QueryList>
  Exec if $EventID != 307 drop();
  Exec $type = "IDWorks";
</Input>

Parse failure on windows to logstash

We found that nxlog made for the best windows log-shipper. But it didn’t seem to parse the events in the event log. Output to logstash seemed not to be in json format, and we confirmed this by writing directly to disk. This happens even though the event log input module explicitly emits the log attributes atomically.

Turns out you have to explicitly tell the output module to use json. This isn’t well documented.

4.3.4.12 - UNC Paths

When using Windows UNC paths, don’t forget that the backslash is also used for escaping characters, so the path

    \\server\radius

looks like

    \\server;adius

in your error log message. You’ll want to escape your back slashes like this;

\\\\server\\radius\\file.log

4.3.4.13 - Unicode Normalization

Files you’re reading may be any character set and this can cause strange things when you modify or pass the data on, as an example at stack exchange shows. This isn’t a problem with windows event logs, but windows applications use several different types of charsets.

Best practice is to convert everything to UTF-8. This is especially true when invoking modules such as json, that don’t handle other codes well.

NXLog has the ability to convert and can even to this automatically. However, there is some room for error. If you can, identity what the encoding is by looking at it in a hex editor and comparing to MS’s identification chart.

Here’s an snippet of a manual conversion of a powershell generated log. Having looked at the first part and identified it as UTF-16LE

...
<Extension charconv>
    Module xm_charconv
    AutodetectCharsets utf-8, utf-16, utf-32, iso8859-2, ucs-2le
</Extension>

<Input in1>
    Module      im_file
    File "E:/Imports/log.txt"
    Exec  $raw_event = convert($raw_event,"UTF-16LE","UTF-8");
</Input>
...

Notice however that the charconv module has an automatic directive. You can use that as long as what you have is included as marked in bold here.

<Extension charconv>
    Module xm_charconv
    AutodetectCharsets utf-8, utf-16, utf-16le, utf-32, iso8859-2
</Extension>


<Input sql-ERlogs>
    Module      im_file
    File 'C:\Program Files\Microsoft SQL Server\MSSQL11.SQL\MSSQL\Log\ER*'
    ReadFromLast TRUE
    Exec        convert_fields("AUTO", "utf-8");
</Input>

If you’re curious what charsets are supported, you can type this command in any unix system to see the names.

iconv -i

4.3.4.14 - Windows Files

Windows uses UTF-16 by default. Other services may use derivations thereof. In any event, it’s recommended to normalize things to UTF-8. Here’s a good example of what will happen if you don’t;

<http://stackoverflow.com/questions/27596676/nxlog-logs-are-in-unicode-charecters>

The answer to that question is to use the specific code field, as “AUTO” doesn’t seem to detect properly.

<Input in>
    Module      im_file
    File "E:/Imports/get_accessplans/log-test.txt"
    Exec if $raw_event == '' drop(); 
    Exec $Event = convert($raw_event,"UCS-2LE","UTF-8"); to_json();
    SavePos     FALSE   
    ReadFromLast FALSE
</Input>

From the manual on SQL Server

Microsoft SQL Server

Microsoft SQL Server stores its logs in UTF-16 encoding using a line-based format.
It is recommended to normalize the encoding to UTF-8. The following config snipped
will do that.

<Extension _charconv>
    Module xm_charconv
</Extension>

<Input in>
    Module im_file
    File "C:\\MSSQL\\ERRORLOG"
    Exec convert_fields('UCS-2LE','UTF-8'); if $raw_event == '' drop();
</Input>

As of this writing, the LineBased parser, the default InputType for im_file is not able to properly read the double-byte UTF-16 encoded files and will read an additional empty line (because of the double-byte CRLF). The above drop() call is intended to fix this.

convert_fields('UTF-16','UTF-8'); might also work instead of UCS-2LE.

4.3.5 - Windows Event Forwarding

If you’re in a Windows shop, this is the best way to keep the Windows admins happy. No installation of extra tools. ‘Keeps it in the MS family’ so to speak.

Configure your servers to push¹ logs to a cental location and use a client there, to send it on. Beats works well for this.

The key seems to be

Create a domain service account or add the machine account
add that to the group on the client

check the runtime status on the collector

For printing, in Event Viewer navigate to Microsoft-Windows-PrintService/Operational and enable it as its not on by default.

Make sure to enable for latency or you’ll spend a long time wondering why there is no data.

Sources

https://hackernoon.com/the-windows-event-forwarding-survival-guide-2010db7a68c4 https://www.ibm.com/docs/en/netcoolomnibus/8?topic=acquisition-forwarded-event-log https://www.youtube.com/watch?v=oyPuRE51k3o&t=158s

https://learn.microsoft.com/en-us/windows/security/operating-system-security/device-management/use-windows-event-forwarding-to-assist-in-intrusion-detection#is-wef-push-or-pull ↩︎