This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Documentation

This is the documentation root. Use the left-hand nav bar to descend taxonomically, or use the search to find what you are after.

1 - Email

Email is a commodity service, but critical for many things - so you can get it anywhere, but you better not mess it up.

Your options, in increasing order of complexity, are:

Forwarding

Email sent to [email protected] is simply forwarded to someplace like gmail. It’s free and easy, and you don’t need any infrastructure. Most registrars like GoDaddy, NameCheap, CloudFlare, etc, will handle it.

You can even reply from [email protected] by integrating with SendGrid or a similar provider.

Remote-Hosting

If you want more, Google and Microsoft have full productivity suites. Just edit your DNS records, import your users, and pay them $5 a head per month. You still have to ‘do email’ but it’s a little less work than if you ran the whole stack. In most cases, companies that specialize in email do it better than you can.

Self-Hosting

If you are considering local email, let me paraphrase Kenji López-Alt. The first step is, don’t. The big guys can do it cheaper and better. But if it’s a philosophical, control, or you just don’t have the funding, press on.

A Note About Cost

Most of the cost is user support. Hosting means someone else gets purchase and patch a server farm, but you still have to talk to users. My (anecdotal) observation is that fully hosting saves 10% in overall costs and it soothes out expenses. The more users you have, the more that 10% starts to matter.

1.1 - Forwarding

This is the best solution for a small number of users. You configure it at your registrar and rely on google (or someone similar) to do all the work for free.

If you want your out-bound emails to come from your domain name (and you do), add an out-bound relay. This is also free for minimal use.

Registrar Configuration

This is different per registrar, but normally involves creating an address and it’s destination

Cloudflare

  • (Login - assumes you use cloudflare as your registrar)
  • Login and select the domain in question.
  • Select Email, then Email Routing.
  • Under Routes, select Create address.

Once validated, email will begin arriving at the destination.

Configure Relaying

The registrars is only forwarding email, not sending it. To get your sent mail to from from your domain, you must integrate with a mail service such as SendGrid

SendGrid

  • Create a free account and login
  • Authenticate your domain name (via DNS)
  • Create an API key (Settings -> API Keys -> Restricted Access, Defaults)

Gmail

  • Settings -> Accounts -> Send Mail as
  • Add your domain email
  • Configure the SMTP server with:
    • SMTP server: “smtp.sendgrid.net”
    • username: “apikey”
    • password: (the key you created above)

After validating the code Gmail sends you, there will be a drop down in the From field of new emails.

1.2 - Remote Hosting

This is more in the software-as-a-service category. You get an admin dashboard and are responsible for managing users and mail flow. The hosting service provide will help you with basic things, but you’re doing most of the work yourself.

Having manged 100K+ user mail systems and migrated from on-prem sendmail to exchange and then O365 and Google, I can confidently say the infrastructure and even platform amounts to less than 10% of the cost of providing the service.

The main advantage to hosting is that you’re not managing the platform, installing patches and replacing hardware. The main disadvantage is is that you have little control and sometimes things are broken and you can’t do anything about it.

Medium sized organizations benefit most from hosting. You probably need a productivity suite anyways, and email is usually wrapped up in that. It saves you from having to specialize someone in email and the infrastructure associated with it.

But if controlling access to your data is paramount, then be aware that you have lost that and treat email as a public conversation.

1.3 - Self Hosting

When you self-host, you develop expertise in email itself, arguably a commodity service where such expertise has small return. But, you have full control and your data is your own.

The generally accepted best practice is install Postfix and Dovecot. This is the simplest path and what I cover here. But there are some pretty decent all-in-one packages such as Mailu, Modoboa, etc. These usually wrap Postfix and Dovecot to spare you the details and improve your quality of life, at the cost of not really knowing how they really work.

You’ll also need to configure a relay. Many ISPs block basic mail protocol and many recipient servers are rightly suspicious of random emails from unknown IPs in cable modem land.

  1. Postfix
  2. Dovecot
  3. Relay

1.3.1 - Postfix

This is the first step - a server that accepts and sends mail. After installing, you’ll be able to check messages at the console. Remote client access (such as with Thunderbird) comes later.

Preparation

You need:

  • Linux Server
  • Firewall Port-Forward
  • Public DNS

Server

We use Debian Bookworm (12) in this example but any derivative will be similar. At large scale you’d setup virtual users, but we’ll stick with the default setup and use your system account. Budget about 10M per 100 emails stored.

Port Forwarding

Mail protocol uses port 25. Simply forward that to your internal mail server and you’re done.

DNS

You need an entry for your server so mail sent directly to it ([email protected]) will arrive. You should also create a special entry for your domain root. That way, mail sent to [email protected] will get routed to it as well. Take advantage of dynamic DNS when possible.

Name Type Value
mail A 20.236.44.162
@ MX mail

Installation

Some configuration is done at install time by the package so you must make sure your hostname is correct. We use the hostname ‘mail’ in this example.

# Correct internal hostnames as needed. 'mail' and 'mail.home.lan' are good suggestions.
cat /etc/hostname /etc/hosts

# Set the external host name and run the package installer. If postfix is already installed, apt remove it first
EXTERNAL="mail.your.org"
sudo debconf-set-selections <<< "postfix postfix/mailname string $EXTERNAL"
sudo debconf-set-selections <<< "postfix postfix/main_mailer_type string 'Internet Site'"
sudo apt install --assume-yes postfix

# Add the main domain to the destinations as well
DOMAIN="your.org"
sudo sed -i "s/^mydestination = \(.*\)/mydestination = $DOMAIN, \1/"  /etc/postfix/main.cf
sudo systemctl reload postfix.service

Test with telnet - use your unix system ID for the rcpt address below.

telnet localhost 25
ehlo localhost
mail from: <[email protected]>
rcpt to: <[email protected]>
data
Subject: Wish List

Red Ryder BB Gun
.
quit

Assuming that ‘you’ matches your shell account, Postfix will have accepted the message and used it’s Local Delivery Agent to store it in the local message store. That’s in /var/mail.

cat /var/mail/YOU 

Configuration

Encryption

Postfix will use the untrusted “snakeoil” that comes with debian to opportunistically encrypt communication between it and other mail servers. Surprisingly, most other servers will accept this cert (or fall back to non-encrypted), so lets proceed for now. We’ll generate a trusted one later.

Spam Protection

The default config is secured so that it won’t relay messages, but it will accept message from Santa, and is subject to backscatter and a few other things. Let’s tighten it up.

sudo tee -a /etc/postfix/main.cf << EOF

# Tighten up formatting
smtpd_helo_required = yes
disable_vrfy_command = yes
strict_rfc821_envelopes = yes

# Error codes instead of bounces
invalid_hostname_reject_code = 554
multi_recipient_bounce_reject_code = 554
non_fqdn_reject_code = 554
relay_domains_reject_code = 554
unknown_address_reject_code = 554
unknown_client_reject_code = 554
unknown_hostname_reject_code = 554
unknown_local_recipient_reject_code = 554
unknown_relay_recipient_reject_code = 554
unknown_virtual_alias_reject_code = 554
unknown_virtual_mailbox_reject_code = 554
unverified_recipient_reject_code = 554
unverified_sender_reject_code = 554
EOF

sudo systemctl reload postfix.service

PostFix has some recommendations as well.

sudo tee -a /etc/postfix/main.cf << EOF

# PostFix Suggestions
smtpd_helo_restrictions = reject_unknown_helo_hostname
smtpd_sender_restrictions = reject_unknown_sender_domain
smtpd_recipient_restrictions =
    permit_mynetworks, 
    permit_sasl_authenticated,
    reject_unauth_destination,
    reject_rbl_client zen.spamhaus.org,
    reject_rhsbl_reverse_client dbl.spamhaus.org,
    reject_rhsbl_helo dbl.spamhaus.org,
    reject_rhsbl_sender dbl.spamhaus.org
smtpd_relay_restrictions = 
    permit_mynetworks, 
    permit_sasl_authenticated,
    reject_unauth_destination
smtpd_data_restrictions = reject_unauth_pipelining
EOF

sudo systemctl reload postfix.service

If you test a message from Santa now, Postfix will do some checks and realize it’s bogus.

550 5.7.27 [email protected]: Sender address rejected: Domain northpole.org does not accept mail (nullMX)

Header Cleanup

Postfix will attach a Received: header to outgoing emails that has details of your internal network and mail client. That’s information you don’t need to broadcast. You can remove that with a “cleanup” step as the message is sent.

# Insert a header check after the 'cleanup' line in the smtp section of the master file and create a header_checks file
sudo sed -i '/^cleanup.*/a\\t-o header_checks=regexp:/etc/postfix/header_checks' /etc/postfix/master.cf
echo "/^Received:/ IGNORE" | sudo tee -a /etc/postfix/header_checks

Note - there is some debate on if this triggers a higher spam score. You may want to replace instead.

Testing

Incoming

You can now receive mail to [email protected] and [email protected]. Try this to make sure you’re getting messages. Feel free to install mutt if you’d like a better client at the console.

Outgoing

You usually can’t send mail. Many ISPs block outgoing port 25 to keep a lid on spam bots. This prevents you from sending any messages. You can test that by trying to connect to gmail on port 25 from your server.

nc -zv gmail-smtp-in.l.google.com 25

Also, many mail servers will reverse-lookup your IP to see who it belongs to. That request will go to your ISP (who owns the IPs) and show their DNS name instead of yours. You’re often blocked at this step, though some providers will work with you. Put in a ticket to find out.

Even if you’re not blocked and your ISP has given you a static IP with a matching reverse-lookup, you will suffer from a lower reputation score as you’re not a well-known email provider. This can cause your sent messages to be delayed while being considered for spam.

To solve these issues, relay your email though a email provider. This will improve your reputation score (used to judge spam), ease the additional security layers such as SPF, DKIM, DMARC, and is usually free at small volume.

Postfix even calls this using a ‘Smarthost’

Next Steps

Now that you can get email, let’s make it so you can also send it.

Troubleshooting

When adding Postfix’s anti-spam suggestions, we left off the smtpd_client_restrictions and smtpd_end_of_data_restrictions as they created problems during testing.

You may get a warning from Postfix that one of the settings you’ve added is overriding one of the earlier settings. Simply delete the first instance. These are usually default settings that we’re overriding.

Misc

Mail Addresses

Postfix only accepts messages for users in the “local recipient table” which is built from the unix password file and the aliases file1. You can add aliases for other addresses that will deliver to your shell account, but only shell users can receive mail right now. See virtual mailboxes to add users without shell accounts.

In the alias file, you’ll see “Postmaster” (and possibly others) are aliased to root. Add root as an alias to you at the bottom so that mail gets to your mailbox.

echo "root:   $USER" | sudo tee -a /etc/aliases
sudo newaliases

1.3.2 - Relay

A relay is simply another mail server that you give your outgoing mail to, rather than try to deliver it yourself.

There are many companies that specialize in this. Sign up for a free account and they give you the block of text to add to your postfix config. Some popular ones are:

  • SendGrid
  • MailGun
  • Sendinblue

They allow anywhere between 50 and 300 a day for free.

SendGrid

Relay Setup

SendGrid’s free plan gives you 50 emails a day. Create an account, verify your email address ([email protected]), and follow the instructions. Make sure to sudo apt install libsasl2-modules

https://docs.sendgrid.com/for-developers/sending-email/postfix

Restart Postfix and use mutt to send an email. It works! the only thing you’ll notice is that your message has a “On Behalf Of” notice in the message letting you know it came from SendGrid. Follow the section below to change that.

Domain Integration

To integrate your domain fully, add DNS records for SendGrid using these instructions.

https://docs.sendgrid.com/ui/account-and-settings/how-to-set-up-domain-authentication

This will require you to login and go to:

  • Settings -> Sender Authentication -> Domain Authentication

Stick with the defaults that include automatic security and SendGrid will give you three CNAME records. Add those to your DNS and your email will check out.

Technical Notes

DNS

If you’re familiar with email domain-based security, you’ll see that two of the records SendGrid gives you are links to DKIM keys so SendGrid can sign emails as you. The other record (emXXXX) is the host sendgrid will use to send email. The SPF record for that host will include a SendGrid SPF record that includes multiple pools of IPs so that SPF checks will pass. They use CNAMEs on your side so they can rotate keys and pool addresses without changing DNS entries.

If none of this makes sense to you, then that’s really the point. You don’t have to know any of it - they take care of it for you.

Next Steps

Your server can now send email too. All shell users on your sever rejoice!

To actually use your mail server, you’ll want to add some remote client access.

1.3.3 - Dovecot

Dovecot is an IMAP (Internet Message Access Protocol) server that allows remote clients to access their mail. There are other protocols and servers, but Dovecot has about 75% of the internet and is a good choice.

Installation

sudo apt install dovecot-imapd
sudo apt install dovecot-submissiond

Configuration

Storage

Both Postfix and Dovecot use mbox storage format by default. This is one big file with all your mail in it and doesn’t scale well. Switch to the newer maildir format where your messages are stored as individual files.

# Change where Postfix delivers mail.
sudo postconf -e "home_mailbox = Maildir/"
sudo systemctl reload postfix.service

# Change where Dovecot looks for mail.
sudo sed -i 's/^mail_location.*/mail_location = maildir:~\/Maildir/' /etc/dovecot/conf.d/10-mail.conf
sudo systemctl reload dovecot.service

Encryption

Dovecot comes with it’s own default cert. This isn’t trusted, but Thunderbird will prompt you and you can choose to accept it. This will be fine for now. We’ll generate a valid cert later.

Credentials

Dovecot checks passwords against the local unix system by default and no changes are needed.

Submissions

One potential surprise is that IMAP is only for viewing existing mail. To send mail, you use the SMTP protocol and relay messages to your mail server. But we have relaying turned off, as we don’t want just anyone relaying messages.

The solution is to enable authentication and by convention this is done by a separate port process, called the Submission Server.

We’ve installed Dovecot’s submission server as it’s newer and easier to set up. Postfix even suggests considering it, rather than theirs. The only configuration needed it to set the localhost as the relay.

# Set the relay as localhost where postfix runs
sudo sed -i 's/#submission_relay_host =/submission_relay_host = localhost/' /etc/dovecot/conf.d/20-submission.conf
sudo systemctl reload dovecot.service

Port Forwarding

Forward ports 143 and 587 to your mail server and test that you can connect from both inside and outside your LAN.

nc -zf mail.your.org 143
nc -zf mail.your.org 587

If it’s working from outside your network, but not inside, you may need to enable [reflection] aka hairpin NAT. This will be different per firewall vendor, but in OPNSense it’s:

Firewall -> Settings -> Advanced

 # Enable these settings
Reflection for port forwards
Reflection for 1:1
Automatic outbound NAT for Reflection

Clients

Thunderbird and others will successfully discover the correct ports and services when you provide your email address of [email protected].

Notes

TLS

Dovecot defaults to port 587 for the submission service which is an older standard for explicit TLS. It’s now recommended by RFC to use implicit TLS on port 465 and you can add a new “submissions” service for that, while leaving the default in place. Clients will pick their fav. Thunderbird defaults to the 465 when both are available.

Note: leaving the default sumbission port commented out just means it will use the default port. Comment out the whole block to disable.

vi /etc/dovecot/conf.d/10-master.conf

# Change the default of

service submission-login {
  inet_listener submission {
    #port = 587
  }
}

to 

service submission-login {
  inet_listener submission {
    #port = 587
  }
  inet_listener submissions {
    port = 465
    ssl = yes
  }
}

# And reload

sudo systemctl reload dovecot.service

Make sure to port forward 465 at the firewall as well

Next Steps

Now that you’ve got the basics working, let’s secure things a little more

Sources

https://dovecot.org/list/dovecot/2019-July/116661.html

1.3.4 - Security

Certificates

We should use valid certificates. The best way to do that is with the certbot utility.

Certbot

Certbot automates the process of getting and renewing certs, and only requires a brief connection to port 80 as proof it’s you. There’s also a DNS based approach, but we use the port method for simplicity. It only runs once every 60 days so there is little risk of exploit.

Forward Port 80

You probably already have a web server and can’t just change where port 80 goes. To integrate certbot, add a name-based virtual host proxy to that web server.

# Here is a caddy example. Add this block to your Caddyfile
http://mail.your.org {
        reverse_proxy * mail.internal.lan
}

# You can also use a well-known URL if you're already using that vhost
http://mail.your.org {
   handle /.well-known/acme-challenge/ {
     reverse_proxy mail.internal.lan
   }
 }

Install Certbot

Once the port forwarding is in place, you can install certbot and request a certificate. Note the --deploy-hook argument. This reloads services after a cert is obtained or renewed. Else, they’ll keep using an expired one.

DOMAIN=your.org

sudo apt install certbot
sudo certbot certonly --standalone --domains mail.$DOMAIN --non-interactive --agree-tos -m postmaster@$DOMAIN --deploy-hook "service postfix reload; service dovecot reload"

Postfix

Tell Postfix about the cert by using the postconf utility. This will warn you about any potential configuration errors.

sudo postconf -e 'smtpd_tls_cert_file = /etc/letsencrypt/live/mail.$DOMAIN/fullchain.pem'
sudo postconf -e 'smtpd_tls_key_file = /etc/letsencrypt/live/mail.$DOMAIN/privkey.pem'
sudo postfix reload

Dovecot

Change the Dovecot to use the cert as well.

sudo sed -i 's/^ssl_cert = .*/ssl_cert = <\/etc\/letsencrypt\/live\/mail.$DOMAIN\/fullchain.pem/' /etc/dovecot/conf.d/10-ssl.conf
sudo sed -i 's/^ssl_key = .*/ssl_key = <\/etc\/letsencrypt\/live\/mail.$DOMAIN\/privkey.pem/' /etc/dovecot/conf.d/10-ssl.conf
sudo dovecot reload

Verifying

You can view the certificates with the commands:

openssl s_client -connect mail.$DOMAIN:143 -starttls imap -servername mail.$DOMAIN
openssl s_client -starttls smtp -showcerts -connect mail.$DOMAIN:587 -servername mail.$DOMAIN

Privacy and Anti-Spam

You can take advantage of Cloudflare (or other) services to accept and inspect your email before forwarding it on to you. As far as the Internet is concerned, Cloudflare is your email server. The rest is private.

Take a look at the Forwarding section, and simply forward your mail to your own server instead of Google’s. That will even allow you to remove your mail server from DNS and drop connections other than CloudFlare if desired.

Intrusion Prevention

In my testing it takes less than an hour before someone discovers and attempts to break into your mail server. You may wish to GeoIP block or otherwise limit connections. You can also use crowdsec.

Crowdsec

Crowdsec is an open-source IPS that monitors your log files and blocks suspicious behavior.

Install as per their instructions.

curl -s https://packagecloud.io/install/repositories/crowdsec/crowdsec/script.deb.sh | sudo bash
sudo apt install -y crowdsec
sudo apt install crowdsec-firewall-bouncer-nftables
sudo cscli collections install crowdsecurity/postfix

Postfix

Most services now log to the system journal rather than a file. You can view them with the journalctl command

# What is the exact service unit name?
sudo systemctl status | grep postfix

# Anything having to do with that service unit
sudo journalctl --unit [email protected]

# Zooming into just the identifiers smtp and smtpd
sudo journalctl --unit [email protected] -t postfix/smtp -t postfix/smtpd

Crowdsec accesses the system journal by adding a block to it’s log acquisition directives.

sudo tee -a /etc/crowdsec/acquis.yaml << EOF
source: journalctl
journalctl_filter:
  - "[email protected]"
labels:
  type: syslog
---
EOF

sudo systemctl reload crowdsec

Dovecot

Install the dovecot collection as well.

sudo cscli collections install crowdsecurity/dovecot
sudo tee -a /etc/crowdsec/acquis.yaml << EOF
source: journalctl
journalctl_filter:
  - "_SYSTEMD_UNIT=dovecot.service"
labels:
  type: syslog
---
EOF

sudo systemctl reload crowdsec

Is it working? You won’t see anything at first unless you’re actively under attack. But after 24 hours you may see some examples of attempts to relay spam.

allen@mail:~$ sudo cscli alerts list
╭────┬────────────────────┬────────────────────────────┬─────────┬──────────────────────────────────────────────┬───────────┬─────────────────────────────────────────╮
│ ID │       value        │           reason           │ country │                      as                      │ decisions │               created_at                │
├────┼────────────────────┼────────────────────────────┼─────────┼──────────────────────────────────────────────┼───────────┼─────────────────────────────────────────┤
│ 60 │ Ip:187.188.233.58  │ crowdsecurity/postfix-spam │ MX      │ 17072 TOTAL PLAY TELECOMUNICACIONES SA DE CV │ ban:1     │ 2023-05-24 06:33:10.568681233 +0000 UTC │
│ 54 │ Ip:177.229.147.166 │ crowdsecurity/postfix-spam │ MX      │ 13999 Mega Cable, S.A. de C.V.               │ ban:1     │ 2023-05-23 20:17:49.912754687 +0000 UTC │
│ 53 │ Ip:177.229.154.70  │ crowdsecurity/postfix-spam │ MX      │ 13999 Mega Cable, S.A. de C.V.               │ ban:1     │ 2023-05-23 20:15:27.964240044 +0000 UTC │
│ 42 │ Ip:43.156.25.237   │ crowdsecurity/postfix-spam │ SG      │ 132203 Tencent Building, Kejizhongyi Avenue  │ ban:1     │ 2023-05-23 01:15:43.87577867 +0000 UTC  │
│ 12 │ Ip:167.248.133.186 │ crowdsecurity/postfix-spam │ US      │ 398722 CENSYS-ARIN-03                        │ ban:1     │ 2023-05-20 16:03:15.418409847 +0000 UTC │
╰────┴────────────────────┴────────────────────────────┴─────────┴──────────────────────────────────────────────┴───────────┴─────────────────────────────────────────╯

If you’d like to get into the details, take a look at the Crowdsec page .

Next Steps

Now that your server is secure, let’s take a look at how email is authenticated and how to ensure yours is.

1.3.5 - Authentication

Email authentication prevents forgery. People can still send unsolicited email, but they can’t fake who it’s from. If you set up a Relay for Postfix, the relayer is doing it for you. But otherwise, proceed onward to prevent your outgoing mail being flagged as spam.

You need three things

  • SPF: Server IP addresses - which specific servers have authorization to send email.
  • DKIM: Server Secrets - email is signed so you know it’s authentic and unchanged.
  • DMARC: Verifies the address in the From: aligns with the domain sending the email, and what to do if not.

SPF

SPF, or Sender Policy Framework, is the oldest component. It’s a DNS TXT record that lists the servers authorized to send email for a domain.

A receiving server looks at a messages’s return path (aka RFC5321.MailFrom header) to see what domain the email purports to be from. It then looks up that domain’s SPF record and if the server that sent the email isn’t included, the email is considered forged.

Note - this doesn’t check the From: header the user sees. Messages can appear (to the user) to be from anywhere. So it’s is mostly a low-level check to prevent spambots.

The DNS record for your Postfix server should look like:

Type: "TXT"
NAME: "@"
Value: "v=spf1 a:mail.your.org -all"

The value above shows the list of authorized servers (a:) contains mail.your.org. Mail from all other servers is considered forged (-all).

To have your Postfix server check SPF for incoming messages add the SPF policy agent.

sudo apt install postfix-policyd-spf-python

sudo tee -a /etc/postfix/master.cf << EOF

policyd-spf  unix  -       n       n       -       0       spawn
    user=policyd-spf argv=/usr/bin/policyd-spf
EOF

sudo tee -a /etc/postfix/main.cf << EOF

policyd-spf_time_limit = 3600
smtpd_recipient_restrictions =
   permit_mynetworks,
   permit_sasl_authenticated,
   reject_unauth_destination,
   check_policy_service unix:private/policyd-spf
EOF

sudo systemctl restart postfix

DKIM

DKIM, or DomainKeys Identified Mail, signs the emails as they are sent ensuring that the email body and From: header (the one you see in your client) hasn’t been changed in transit and is vouched for by the signer.

Receiving servers see the DKIM header that includes who signed it, then use DNS to check it. Unsigned mail simply isn’t checked. (There is no could-but-didn’t in the standard).

Note - There is no connection between the domain that signs the message and what the user sees in the From: header. Messages can have a valid DKIM signature and still appear to be from anywhere. DKIM is mostly to prevent man-in-the-middle attacks from altering the message.

For Postfix, this requires installation of OpenDKIM and a connection as detailed here. Make sure to sign with the domain root.

https://tecadmin.net/setup-dkim-with-postfix-on-ubuntu-debian/

Once you’ve done that, create the following DNS entry.

Type: "TXT"
NAME: "default._domainkey"
Value: "v=DKIM1; h=sha256; k=rsa; p=MIIBIjANBgkq..."

DMARC

Having a DMARC record is the final piece that instructs servers to check the From: header the user sees against the domain return path from the SPF and DKIM checks, and what to do on a fail.

This means mail “From: [email protected]” sent though mail.your.org mail servers will be flagged as spam.

The DNS record should look like:

Type: "TXT"
NAME: "_dmarc"
Value: "v=DMARC1; p=reject; adkim=s; aspf=r;"
  • p=reject: Reject messages that fail
  • adkim=s: Use strict DKIM alignment
  • aspf=r: Use relaxed SPF alignment

Reject (p=reject) indicates that email servers should “reject” emails that fail DKIM or SPF tests, and skip quarantine.

Strict DKIM alignment (=s) means that the SPF Return-Path domain or the DKIM signing domain must be an exact match with the domain in the From: address. A DKIM signature from your.org would exactly match [email protected].

Relaxed SPF alignment (=r) means subdomains of the From: address are acceptable. I.e. the server mail.your.org from the SPF test aligns with an email from: [email protected].

You can also choose quarantine mode (p=quarantine) or report-only mode (p=none) where the email will be accepted and handled as such by the receiving server, and a report sent to you like below.

v=DMARC1; p=none; rua=mailto:[email protected]

DMARC is an or test. In the first example, if either the SPF or DKIM domains pass, then DMARC passes. You can choose to test one, both or none at all (meaning nothing can pass DMARC) as the the second DMARC example.

To implement DMARC checking in Postfix, you can install OpenDMARC and configure a mail filter as described below.

https://www.linuxbabe.com/mail-server/opendmarc-postfix-ubuntu

Next Steps

Now that you are hadnling email securely and authentically, let’s help ease client connections

Autodiscovery

1.3.6 - Autodiscovery

In most cases you don’t need this. Thunderbird, for example, will use a shotgun approach and may find your sever using ‘common’ server names based on your email address.

But there is an RFC and other clients may need help.

DNS SRV

This takes advantage of the RFC with an entry for IMAP and SMTP Submission

Type Name Service Protocol TTL Priority Weight Port Target
SRV @ _imap TCP auto 10 5 143 mail.your.org
SRV @ _submission TCP auto 10 5 465 mail.your.org

Web Autoconfig

  • Create a DNS entry for autoconfig.your.org
  • Create a vhost and web root for that with the file mail/config-v1.1.xml
  • Add the contents below to that file
<?xml version="1.0"?>
<clientConfig version="1.1">
    <emailProvider id="your.org">
      <domain>your.org</domain>
      <displayName>Example Mail</displayName>
      <displayShortName>Example</displayShortName>
      <incomingServer type="imap">
         <hostname>mail.your.org</hostname>
         <port>143</port>
         <socketType>STARTTLS</socketType>
         <username>%EMAILLOCALPART%</username>
         <authentication>password-cleartext</authentication>
      </incomingServer>
      <outgoingServer type="smtp">
         <hostname>mail.your.org</hostname>
         <port>587</port>
         <socketType>STARTTLS</socketType> 
         <username>%EMAILLOCALPART%</username> 
         <authentication>password-cleartext</authentication>
         <addThisServer>true</addThisServer>
      </outgoingServer>
    </emailProvider>
    <clientConfigUpdate url="https://www.your.org/config/mozilla.xml" />
</clientConfig>

Note

It’s traditional to match server names to protocols and we would have used “imap.your.org” and “smtp.your.org”. But using ‘mail’ is popular now and it simplifies setup at several levels.

Thunderbird will try to guess at your server names, attempting to connect to smtp.your.org for example. But many Postfix configurations have spam prevention that interfere.

Sources

https://cweiske.de/tagebuch/claws-mail-autoconfig.htm
https://www.hardill.me.uk/wordpress/2021/01/24/email-autoconfiguration/

2 - Media

2.1 - Signage

2.1.1 - Anthias (Screenly)

Overview

Anthias (AKA Screenly) is a simple, open-source digital signage system that runs well on a raspberry pi. When plugged into a monitor, it displays images, video or web sites in slideshow fashion. It’s managed directly though a web interface on the device and there are fleet and support options.

Preparation

Use the Raspberry Pi Imager to create a 64 bit Raspberry Pi OS Lite image. Select the gear icon at the bottom right to enable SSH, create a user, configure networking, and set the locale. Use SSH continue configuration.

setterm --cursor on

sudo raspi-config nonint do_change_locale en_US-UTF-8
sudo raspi-config nonint do_configure_keyboard us
sudo raspi-config nonint do_wifi_country US
sudo timedatectl set-timezone America/New_York
  
sudo raspi-config nonint do_hostname SOMENAME

sudo apt update;sudo apt upgrade -y

sudo reboot

Enable automatic updates and enable reboots

sudo apt -y install unattended-upgrades

# Remove the leading slashes from some of the updates and set to true
sudo sed -i 's/^\/\/\(.*origin=Debian.*\)/  \1/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/^\/\/\(Unattended-Upgrade::Remove-Unused-Kernel-Packages \).*/  \1"true";/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/^\/\/\(Unattended-Upgrade::Remove-New-Unused-Dependencies \).*/  \1"true";/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/^\/\/\(Unattended-Upgrade::Remove-Unused-Dependencies \).*/  \1"true";/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/^\/\/\(Unattended-Upgrade::Automatic-Reboot \).*/  \1"true";/' /etc/apt/apt.conf.d/50unattended-upgrades

Installation

bash <(curl -sL https://www.screenly.io/install-ose.sh)

Operation

Adding Content

Navigate to the Web UI at the IP address of the device. You may wish to enter the settings and add authentication and change the device name.

You may add common graphic types, mp4, web and youtube links. It will let you know if it fails to download the youtube video. Some heavy web pages fail to render correctly, but most do.

Images must be sized to for the screen. In most cases this is 1080. Larger images are scaled down, but smaller images are not scaled up. For example, PowerPoint is often used to create slides, but it exports at 720. On a 1080 screen creates black boarders. You can change the resolution on the Pi with rasp-config or add a registry key to Windows to change PowerPoint’s output size.

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Microsoft\Office\16.0\PowerPoint\Options]
"ExportBitmapResolution"=dword:00000096

Schedule the Screen

You may want to turn off the display during non-operation hours. The vcgencmd command can turn off video output and some displays will choose to enter power-savings mode. Some displays misbehave or ignore the command, so testing is warranted.

sudo tee /etc/cron.d/screenpower << EOF

# m h dom mon dow usercommand

# Turn monitor on
30 7  * * 1-5 root /usr/bin/vcgencmd display_power 1

# Turn monitor off
30 19 * * 1-5 root /usr/bin/vcgencmd display_power 0

# Weekly Reboot just in case
0 7 * * 1 root /sbin/shutdown -r +10 "Monday reboot in 10 minutes"
EOF

Troubleshooting

YouTube Fail

You may find you must download the video manually and then upload to Anthias. Use the utility yt-dlp to list and then download the mp4 version of a video

yt-dlp --list-formats https://www.youtube.com/watch?v=YE7VzlLtp-4
yt-dlp --format 22 https://www.youtube.com/watch?v=YE7VzlLtp-4

WiFi Disconnect

WiFi can go up and down, and some variants of the OS do not automatically reconnect. You way want to add the following script to keep connected.

sudo touch /usr/local/bin/checkwifi
sudo chmod +x /usr/local/bin/checkwifi
sudo vim.tiny /usr/local/bin/checkwifi
#!/bin/bash

# Exit if WiFi isn't configured
grep -q ssid /etc/wpa_supplicant/wpa_supplicant.conf || exit 

# In the case of multiple gateways (when connected to wired and wireless)
# the `grep -m 1` will exit on the first match, presumably the lowest metric
GATEWAY=$(ip route list | grep -m 1 default | awk '{print $3}')

ping -c4 $GATEWAY > /dev/null

if [ $? != 0 ]
then
  logger checkwifi fail `date`
  service wpa_supplicant restart
  service dhcpcd restart
else
  logger checkwifi success `date`
fi
sudo tee /etc/cron.d/checkwifi << EOF
# Check WiFi connection
*/5 * * * * /usr/bin/sudo -H /usr/local/bin/checkwifi >> /dev/null 2>&1"
EOF

Hidden WiFi

If you didn’t set up WiFi during imaging, you can use raspi-config after boot, but you must add a line if it’s a hidden network, and reboot.

sudo sed -i '/psk/a\        scan_ssid=1' /etc/wpa_supplicant/wpa_supplicant.conf

Wrong IP on Splash Screen

This seems to be captured during installation and then resides statically in this file. Adjust as needed.

# You can turn off the splash screen in the GUI or in the .conf
sed -i 's/show_splash =.*/show_splash = off/' /home/pi/.screenly/screenly.conf

# Or you can correct it in the docker file
vi ./screenly/docker-compose.yml

White Screen or Hung

Anthias works best when the graphics are the correct size. It will attempt to display images that are too large, but this flashes a white screen and eventually hangs the box (at least in the current version). Not all users get the hang of sizing things correctly, so if you have issues, try this script.

#!/bin/bash

# If this device isn't running signage, exit
[ -d /home/pi/screenly_assets ] || { echo "No screenly image asset directory, exiting"; exit 1; }

# Check that mediainfo and imagemagick convert are available
command -v mediainfo || { echo "mediainfo command not available, exiting"; exit 1; }
command -v convert  || { echo "imagemagick convert not available, exiting"; exit 1; }

cd /home/pi/screenly_assets

for FILE in *.png *.jpe *.gif
do
        # if the file doesn't exist, skip this iteration 
        [ -f $FILE ] || continue
        
        # Use mediainfo to get the dimensions at it's much faster than imagemagick              
        read -r NAME WIDTH HEIGHT <<<$(echo -n "$FILE ";mediainfo --Inform="Image;%Width% %Height%" $FILE)

        # if it's too big, use imagemagick's convert. (the mogify command doesn't resize reliably) 
        if [ "$WIDTH" -gt "1920" ] || [ "$HEIGHT" -gt "1080" ]
        then
                echo $FILE $WIDTH x $HEIGHT
                convert $FILE -resize 1920x1080 -gravity center $FILE
        fi
done

No Video After Power Outage

If the display is off when you boot the pi, it may decide there is no monitor. When someone does turn on the display, there is no output. Enable hdmi_force_hotplug in the `/boot/config.txt`` to avoid this problem, and specify the group and mode to 1080 and 30hz.

sed -i 's/.*hdmi_force_hotplug.*/hdmi_force_hotplug=1/' /boot/config.txt
sed -i 's/.*hdmi_group=.*/hdmi_group=2/' /boot/config.txt
sed -i 's/.*hdmi_mode=.*/hdmi_mode=81/' /boot/config.txt

2.1.2 - Anthias Deployment

If you do regular deployments you can create an image. A reasonable approach is to:

  • Shrink the last partition
  • Zero fill the remaining free space
  • Find the end of the last partition
  • DD that to a file
  • Use raspi-config to resize after deploying

Or you can use PiShrink to script all that.

Installation

wget https://raw.githubusercontent.com/Drewsif/PiShrink/master/pishrink.sh
chmod +x pishrink.sh
sudo mv pishrink.sh /usr/local/bin

Operation

# Capture and shrink the image
sudo dd if=/dev/mmcblk0 of=anthias-raw.img bs=1M
sudo pishrink.sh anthias-raw.img anthias.img

# Copy to a new card
sudo dd if=anthias.img of=/dev/mmcblk0 bs=1M

If you need to modify the image after creating it you can mount it via loop-back.

sudo losetup --find --partscan anthias.img
sudo mount /dev/loop0p2 /mnt/

# After you've made changes

sudo umount /mnt
sudo losetup --detach-all

Manual Steps

If you have access to a graphical desktop environment, use GParted. It will resize the filesystem and partitions for you quite easily.

# Mount the image via loopback and open it with GParted
sudo losetup --find --partscan anthias-raw.img

# Grab the right side of the last partition with your mouse and 
# drag it as far to the left as you can, apply and exit
sudo gparted /dev/loop0

Now you need to find the last sector and truncate the file after that location. Since the truncate utility operates on bytes, you convert sectors to bytes with multiplication.

# Find the End of the last partition. In the below example, it's Sector *9812664*
$ sudo fdisk -lu /dev/loop0

Units: sectors of 1 * 512 = 512 bytes

Device       Boot  Start     End Sectors  Size Id Type
/dev/loop0p1        8192  532479  524288  256M  c W95 FAT32 (LBA)
/dev/loop0p2      532480 9812664 9280185  4.4G 83 Linux


sudo losetup --detach-all

sudo truncate --size=$[(9812664+1)*512] anthias-raw.img

Very Manual Steps

If you don’t have a GUI, you can do it with a combination of commands.

# Mount the image via loopback
sudo losetup --find --partscan anthias-raw.img

# Check and resize the file system
sudo e2fsck -f /dev/loop0p2
sudo resize2fs -M /dev/loop0p2

... The filesystem on /dev/loop0p2 is now 1149741 (4k) blocks long

# Now you can find the end of the resized filesystem by:

# Finding the number of sectors.
#     Bytes = Num of blocks * block size
#     Number of sectors = Bytes / sector size
echo $[(1149741*4096)/512]

# Finding the start sector (532480 in the example below)
sudo fdisk -lu /dev/loop0

Device       Boot  Start      End  Sectors  Size Id Type
/dev/loop0p1        8192   532479   524288  256M  c W95 FAT32 (LBA)
/dev/loop0p2      532480 31116287 30583808 14.6G 83 Linux

# Adding the number of sectors to the start sector. Add 1 because you want to end AFTER the end sector
echo $[532480 + 9197928 + 1]

# And resize the part to that end sector (ignore the warnings)
sudo parted resizepart 2 9730409 

Great! Now you can follow the remainder of the GParted steps to find the new last sector and truncate the file.

Extra Credit

It’s handy to compress the image. xz is pretty good for this

xz anthias-raw.img

xzcat anthias-raw.img | sudo dd of=/dev/mmcblk0

In these procedures, we make a copy of the SD card before we do anything. Another strategy is to resize the SD card directly, and then use dd and read in X number of sectors rather than read it all in and then truncate it. A bit faster, if a but less recoverable from in the event of a mistake.

2.1.3 - API

The API docs on the web refer to screenly. Anthias uses an older API. However, you can access the API docs for the version your working with at

http://sign.your.domain/api/docs/

You’ll have to correct the swagger form with correct URL, but after that you can see what you’re working with.

3 - Monitoring

Time series vs event data.

3.1 - Metrics

3.1.1 - Prometheus

Overview

Prometheus is a time series database, meaning it’s optimized to store and work with data organized in time order. It includes in it’s single binary:

  • Database engine
  • Collector
  • Simple web-based user interface

This allows you to collect and manage data with fewer tools and less complexity than other solutions.

Data Collection

End-points normally expose metrics to Prometheus by making a web page available that it can poll. This is done by including a instrumentation library (provided by Prometheus) or simply adding a listener on a high-level port that spits out some text when asked.

For systems that don’t support Prometheus natively, there are a few add-on services to translate. These are called ’exporters’ and translate things such as SNMP into a web format Prometheus can ingest.

Alerting

You can also alert on the data collected. This is through the Alert Manager, a second package that works closely with Prometheus.

Visualization

You still need a dashboard tool like Grafana to handle visualizations, but you can get started quite quickly with just Prometheus.

3.1.1.1 - Installation

Install from the Debian Testing repo, as stable can be up to a year behind.

# Testing
echo 'deb http://deb.debian.org/debian testing main' | sudo tee -a /etc/apt/sources.list.d/testing.list

# Pin testing down to a low level so the rest of your packages don't get upgraded
sudo tee -a /etc/apt/preferences.d/not-testing << EOF
Package: *
Pin: release a=testing
Pin-Priority: 50
EOF

# Living Dangerously with test
sudo apt update
sudo apt install -t testing prometheus

Configuration

Use this for your starting config.

cat /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

This says every 15 seconds, run down the job list. And there is one job - to check out the system at ’localhost:9090’ which happens to be itself.

For every target listed, the scraper makes a web request for /metrics/ and stores the results. It ingests all the data presented and stores them for 15 days. You can choose to ignore certain elements or retain differently by adding config, but by default it takes everything given.

You can see this yourself by just asking like Prometheus would. Hit it up directly in your browser. For example, Prometheus is making metrics available at /metrics

http://some.server:9090/metrics

Operation

User Interface

You can access the Web UI at:

http://some.server:9090

At the top, select Graph (you should be there already) and in the Console tab click the dropdown labeled “insert metric at cursor”. There you will see all the data being exposed. This is mostly about the GO language it’s written in, and not super interesting. A simple Graph tab is available as well.

Data Composition

Data can be simple, like:

go_gc_duration_seconds_sum 3

Or it can be dimensional which is accomplished by adding labels. In the example below, the value of go_gc_duration_seconds has 5 labeled sub-sets.

go_gc_duration_seconds{quantile="0"} 4.5697e-05
go_gc_duration_seconds{quantile="0.25"} 7.814e-05
go_gc_duration_seconds{quantile="0.5"} 0.000103396
go_gc_duration_seconds{quantile="0.75"} 0.000143687
go_gc_duration_seconds{quantile="1"} 0.001030941

In this example, the value of net_conntrack_dialer_conn_failed_total has several.

net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="refused"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="unknown"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="refused"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="unknown"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="refused"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="unknown"} 0

How is this useful? It allows you to do aggregations - such as looking at all the net_contrack failures, and also look at the failures that were specifically refused. All with the same data.

Removing Data

You may have a target you want to remove. Such as a typo hostname that is now causing a large red bar on a dashboard. You can remove that mistake by enabling the admin API and issuing a delete

sudo sed -i 's/^ARGS.*/ARGS="--web.enable-admin-api"/' /etc/default/prometheus

sudo systemctl reload prometheus

curl -s -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={instance="badhost.some.org:9100"}'

The default retention is 15 days. You may want less than that and you can configure --storage.tsdb.retention.time=1d similar to above. ALL data has the same retention, however. If you want historical data you must have a separate instance or use VictoriaMetrics.

Next Steps

Let’s get something interesting to see by adding some OS metrics

Troubleshooting

If you can’t start the prometheus server, it may be an issue with the init file. Test and Prod repos use different defaults. Add some values explicitly to get new versions running

sudo vi /etc/default/prometheus

ARGS="--config.file="/etc/prometheus/prometheus.yml  --storage.tsdb.path="/var/lib/prometheus/metrics2/"

3.1.1.2 - Node Exporter

This is a service you install on your end-points that make CPU/Memory/Etc. metrics available to Prometheus.

Installation

On each device you want to monitor, install the node exporter with this command.

sudo apt install prometheus-node-exporter

Do a quick test to make sure it’s responding to scrapes.

curl localhost:9100/metrics

Configuration

Back on your Prometheus server, add these new nodes as a job in the prometheus.yaml file. Feel free to drop the initial job where Prometheus was scraping itself.

global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'servers'
    static_configs:
    - targets:
      - some.server:9100
      - some.other.server:9100
      - and.so.on:9100
sudo systemctl reload prometheus.service

Operation

You can check the status of your new targets at:

http://some.server:9090/classic/targets

A lot of data is collected by default. On some low power systems you may want less. For just the basics, customize the the config to disable the defaults and only enable specific collectors.

In the case below we are reduce collection to just CPU, Memory, and Hardware metrics. When scraping a Pi 3B, this reduces the Scrape Duration from 500 to 50ms.

sudo sed -i 's/^ARGS.*/ARGS="--collector.disable-defaults --collector.hwmon --collector.cpu --collector.meminfo"/' /etc/default/prometheus-node-exporter
sudo systemctl restart prometheus-node-exporter

The available collectors are listed on the page:

https://github.com/prometheus/node_exporter

3.1.1.3 - SNMP Exporter

SNMP is one of the most prevalent (and clunky) protocols still widely used on network-attached devices. But it’s a good general-purpose way to get data from lots of different makes of products in a similar way.

But Prometheus doesn’t understand SNMP. The solution is a translation service that acts a a middle-man and ’exports’ data from those devices in a way Prometheus can.

Installation

Assuming you’ve already installed Prometheus, install some SNMP tools and the exporter. If you have an error installing the mibs-downloader, check troubleshooting at the bottom.

sudo apt install snmp snmp-mibs-downloader
sudo apt install -t testing prometheus-snmp-exporter

Change the SNMP tools config file to allow use of installed MIBs.

sudo sed -i 's/^mibs/# &/' /etc/snmp/snmp.conf

Preparation

We need a target, so assuming you have a switch somewhere and can enable SNMP on it, let’s query the switch for its name, AKA sysName. Here we’re using version “2c” of the protocol with the read-only password “public”. Pretty standard.

# Note: app
snmpwalk -v 2c -c public some.switch.address sysName

SNMPv2-MIB::sysName.0 = STRING: Some-Switch

Note: If you get back an error or just the ‘iso’ prefixed value, double check your MIBs are installed.

Configuration

To add this switch to the Prometheus scraper, add a new job to the prometheus.yaml file. This job will include the targets as normal, but also the path (since it’s different than default) and an optional parameter called module that specific to the SNMP exporter. It also does something confusing - a relabel_config

This is because Prometheus isn’t actually taking to the switch, it’s talking to the local SNMP exporter service. So we put all the targets normally, and then at the bottom ‘oh, by the way, do a switcheroo’. This allows Prometheus to display all the data normally with no one the wiser.

...
...
scrape_configs:
  - job_name: 'snmp'
    static_configs:
      - targets:
        - some.switch.address    
    metrics_path: /snmp
    params:
      module: [if_mib]
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9116  # The SNMP exporter's real hostname:port.

Operation

No configuration on the exporter side is needed. Reload the config and check the target list. Then examine data in the graph section. Add additional targets as needed and the exporter will pull in the data.

http://some.server:9090/classic/targets

These metrics are considered well known and so will appear in the database named sysUpTime and upsBasicBatteryStatus and not be prefixed with snmp_ like you might expect.

Next Steps

If you have something non-standard, or you simply don’t want that huge amount of data in your system, look at the link below to customize the SNMP collection with the Generator.

SNMP Exporter Generator Customization

Troubleshooting

The snmp-mibs-downloader is just a handy way to download a bunch of default MIBs so when you use the tools, all the cryptic numbers, like “1.3.6.1.2.1.17.4.3.1” are translated into pleasant names.

If you can’t find the mibs-downloader its probably because it’s in the non-free repo and that’s not enabled by default. Change your apt sources file like so

sudo vi /etc/apt/sources.list

deb http://deb.debian.org/debian/ bullseye main contrib non-free
deb-src http://deb.debian.org/debian/ bullseye main contrib non-free

deb http://security.debian.org/debian-security bullseye-security main contrib non-free
deb-src http://security.debian.org/debian-security bullseye-security main contrib non-free

deb http://deb.debian.org/debian/ bullseye-updates main contrib non-free
deb-src http://deb.debian.org/debian/ bullseye-updates main contrib non-free

It may be that you only need to change one line.

3.1.1.4 - SNMP Generator

Installation

There is no need to install the Generator as it comes with the SNMP exporter. But if you have a device that supplies it’s own MIB (and many do), you should add that to the default location.

# Mibs are often named SOMETHING-MIB.txt
sudo cp -n *MIB.txt /usr/share/snmp/mibs/

Preparation

You must identify the values you want to capture. Using snmpwalk is a good way to see what’s available, but it helps to have a little context.

The data is arranged like a folder structure that you drill-down though. The folder names are all numeric, with ‘.’ instead of slashes. So if you wanted to get a device’s sysName you’d click down through 1.3.6.1.2.1.1.5 and look in the file 0.

When you use snmpwalk it starts wherever you tell it and then starts drilling-down, printing out everything it finds.

How do you know that’s where sysName is at? A bunch of folks got together (the ISO folks) and decided everything in advance. Then they made some handy files (MIBs) and passed them out so you didn’t have to remember all the numbers.

They allow vendors to create their own sections as well, for things that might not fit anywhere else.

A good place to start is looking at what the vendor made available. You see this by walking their section and including their MIB so you get descriptive names - only the ISO System MIB is included by default.

# The SysobjectID identifies the vendor section
# Note use of the MIB name without the .txt
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c public some.address SysobjectID

SNMPv2-MIB::sysObjectID.0 = OID: SOMEVENDOR-MIB::somevendoramerica

# Then walk the vendor section using the name from above
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c some.address somevendoramerica

SOMEVENDOR-MIB::model.0 = STRING: SOME-MODEL
SOMEVENDOR-MIB::power.0 = INTEGER: 0
...
...

# Also check out the general System section
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c public some.address system

# You can also walk the whole ISO tree. In some cases,
# there are thousands of entries and it's indeciperable
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c public some.system iso

This can be a lot of information and you’ll need to do some homework to see what data you want to collect.

Configuration

The exporter’s default configuration file is snmp.yml and contains about 57 Thousand lines of config. It’s designed to pull data from whatever you point it at. Basically, it doesn’t know what device it’s talking to, so it tries to cover all the bases.

This isn’t a file you should edit by hand. Instead, you create instructions for the generator and it look though the MIBs and create one for you. Here’s an example for a Samlex Invertor.

vim ~/generator.yml
modules:
  samlex:
    walk:
      - sysLocation
      - inverterMode
      - power
      - vin
      - tempDD
      - tempDA
prometheus-snmp-generator generate
sudo cp /etc/prometheus/snmp.yml /etc/prometheus/snmp.yml.orig
sudo cp ~/snmp.yml /etc/prometheus
sudo systemctl reload prometheus-snmp-exporter.service

Configuration in Prometheus remains the same - but since we picked a new module name we need to adjust that.

    ...
    ...
    params:
      module: [samlex]
    ...
    ...
sudo systemctl reload prometheus.service

Adding Data Prefixes

by default, the names are all over the place. The SNMP Exporter Devs leave it this way because there are a lot of pre-built dashboards on downstream systems that expect the existing names.

If you are building your own downstream systems you can prefix (as is best-practice) as you like with a post generation step. This example cases them all to be prefixed with samlex_.

prometheus-snmp-generator generate
sed -i 's/name: /name: samlex_/' snmp.yml

Combining MIBs

You can combine multiple systems in the generator file to create one snmp.yml file, and refer to them by the module name in the Prometheus file.

modules:
  samlex:
    walk:
      - sysLocation
      - inverterMode
      - power
      - vin
      - tempDD
      - tempDA
  ubiquiti:
    walk:
      - something
      - somethingElse  

Operation

As before, you can get a preview directly from the exporter (using a link like below). This data should show up in the Web UI too.

http://some.server:9116/snmp?module=samlex&target=some.device

Sources

https://github.com/prometheus/snmp_exporter/tree/main/generator

3.2 - Logs

3.3 - Visualization

3.3.1 - Grafana

4 - Network

4.1 - DNS

Web pages today are complex. Your browser will make on average 401 DNS queries to find the various parts of an average web page, so implementing a local DNS system is key to keeping things fast.

In general, you can implement either a caching or recursive server with the choice between speed vs privacy.

Types of DNS Servers

A caching server accepts and caches queries, but doesn’t actually do the lookup itself. It forwards the request on to another DNS server and waits for the answer. If you have a lot of clients configured to use it, chances are someone else has already asked for what you want and it can supply the answer quickly from cache.

A recursive server does more than just cache answers. It knows how to connect to the root of the internet and find out itself. If you need to find some.test.com, it will connect to the .com server, ask where test.com is, then connect to test.com and ask it for some.

Comparison

Between the two, the caching server will generally be faster. If you connect to a large DNS service they will almost always have things cached. You will also get geographically relevant results as content providers work with DNS providers to direct you to the closest content cache.

With a recursive server, you do the lookup yourself and no single entity is able to monitor your DNS queries. You also aren’t dependant upon any upstream provider. But you make every lookup ’the long way’, and that can take many hundreds of milliseconds on some cases, a large part of a page load time.

Testing

In an ad hoc test on a live network with about 5,000 residential user devices, about half the queries were cached. The other half were sent to either quad 9 or a local resolver. Quad 9 took about half the time that the local resolver did.

Here are the numbers - with Steve Gibson’s DNS benchmarker against pi-hole forwarding to a local resolver vs pi-hole forwarding to quad 9. Cached results excluded.

    Forwarder     |  Min  |  Avg  |  Max  |Std.Dev|Reliab%|
  ----------------+-------+-------+-------+-------+-------+
  - Uncached Name | 0.015 | 0.045 | 0.214 | 0.046 | 100.0 |
  - DotCom Lookup | 0.015 | 0.019 | 0.034 | 0.005 | 100.0 |
  ---<O-OO---->---+-------+-------+-------+-------+-------+

    Resolver      |  Min  |  Avg  |  Max  |Std.Dev|Reliab%|
  ----------------+-------+-------+-------+-------+-------+
  - Uncached Name | 0.016 | 0.078 | 0.268 | 0.079 | 100.0 |
  - DotCom Lookup | 0.018 | 0.035 | 0.078 | 0.017 | 100.0 |
  ---<O-OO---->---+-------+-------+-------+-------+-------+

Selection

This test is interesting, but not definitive. While the DNS benchmark shows that the uncached average is better, page load perception is different than the sum of DNS queries. A page metric test would be good, but in general, faster is better.

Use a caching server.

One last point: use your ISP’s name server when possible. They will direct you to their local content caching systems for Netflix, Google (YouTube) and Akamai. If you use quad 9 like I did, you may get to a regional content location, but you miss out on things optimized specifically for your geographic location.

They are (probably) capturing all your queries for monetization, and possibly directing you to their own their own advertising server when you mis-key in a domain name. So you’ll need to decide;

Speed vs privacy.


  1. Informal personal checking of random popular sites. ↩︎

4.1.1 - Pi-hole

Pi-hole is reasonable choice for DNS service, especially if you don’t have a separate metrics and reporting system. A single instance will scale to 1000 active clients with just 1 core and 500M RAM and do a good job showing what’s going on.

There are some caveats when you pass 1000 users when logging all queries, but it’s a

Preparation

Prepare and secure a Debian system

Set a Static Address

sudo vi /etc/network/interfaces

Change

# The primary network interface
allow-hotplug eth0
iface eth0 inet dhcp

to

auto eth0
iface eth0 inet static
    address 192.168.0.2/24
    gateway 192.168.0.1

Secure Access with Nftables

Nftables is the modern replacement for iptables and preferred for setting netfilter rules.

sudo apt install nftables
sudo systemctl enable nftables
sudo vi /etc/nftables.conf
#!/usr/sbin/nft -f

flush ruleset

table inet filter {
        chain input {
                type filter hook input priority 0;

                # accept any localhost traffic
                iif lo accept

                # accept already allowed and related traffic
                ct state established,related accept

                # accept DNS and DHCP traffic from internal only
                define RFC1918 = { 192.168.0.0/16, 10.0.0.0/8, 172.16.0.0/12 }
                udp dport { domain, bootps } ip saddr $RFC1918 ct state new accept
                tcp dport { domain, bootps } ip saddr $RFC1918 ct state new accept

                # accept web and ssh traffic on the first interface or from an addr range
                iifname eth0 tcp dport { ssh, http } ct state new accept
                 # or 
                ip saddr 192.168.0.1/24 ct state new accept

                # Accept pings
                icmp type { echo-request } ct state new accept

                # accept neighbor discovery otherwise IPv6 connectivity breaks.
                ip6 nexthdr icmpv6 icmpv6 type { nd-neighbor-solicit,  nd-router-advert, nd-neighbor-advert } accept

                # count other traffic that does match the above that's dropped
                counter drop
        }
}
sudo nft -f /etc/nftables.conf
sudo systemctl start nftables.service

Add Unattended Updates

This an optional, but useful service.

apt install unattended-upgrades

sudo sed -i 's/\/\/\(.*origin=Debian.*\)/  \1/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/\/\/\(Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";\)/  \1/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/\/\/\(Unattended-Upgrade::Remove-Unused-Dependencies\) "false";/  \1 "true";/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/\/\/\(Unattended-Upgrade::Automatic-Reboot\) "false";/  \1 "true";/' /etc/apt/apt.conf.d/50unattended-upgrades

Installation

Unbound

sudo apt install unbound

Pi-hole

sudo apt install curl
curl -sSL https://install.pi-hole.net | bash

Configuration

Unbound

The pi-hole guide for [unbound]:(https://docs.pi-hole.net/guides/dns/unbound/) includes a config block to copy and paste as directed. You should also add a config file for dnsmasq while you’re at it, to set EDNS packet sizes. (dnsmasq comes as part of pi-hole)

sudo vi /etc/dnsmasq.d/99-edns.conf
edns-packet-max=1232

When you check the status of unbound, you can ignore the warning: subnetcache:... as it’s just reminding you that data in the subnet cache (if you were to use it) can’t pre-fetched. There’s some conversation1 as to why it’s warning us.

The config includes prefetch, but you may also wish to add serve-expired to the same config file from above.

# serve old responses from cache while waiting for the actual resolution to finish.
serve-expired: yes
sudo systemctl restart unbound.service

No additional setup is needed, but see the unbound page for more info.

Pi-hole

Pi-hole can be configured via it’s two main config files, /etc/pihole/setupVars.cong and pihole-FTL.conf, but it’s convenient to use the GUI’s left-hand settings menu.

  • Settings -> DNS -> Upstream DNS Servers -> Custom 1 (Check and add 127.0.0.1#5335 as shown in the unbound guide linked above)
  • Settings -> DNS -> Interface settings -> Permit all origins (needed if you have multiple networks)

Very busy pi-hole installations generate lots of data and (seemingly) hang the dashboard. If that happens, limit the about of data being displayed.

vi /etc/pihole/pihole-FTL.conf 
# Don't import the existing DB into the GUI - it will hang the web page for a long time
DBIMPORT=no

# Don't import more than an hour of logs from the logfile
MAXLOGAGE=1

# Truncate data older than this many days to keep the size of the database down
MAXDBDAYS=1
sudo systemctl restart pihole-FTL.service

Operation

Local DNS Entries

You can enter local DNS and CNAME entries via the GUI, (Admin Panel -> Local DNS), but you can also edit the config file for bulk entries.

For A records

vim /etc/pihole/custom.list
10.50.85.2 test.some.lan
10.50.85.3 test2.some.lan

For CNAME records

vim /etc/dnsmasq.d/05-pihole-custom-cname.conf
cname=test3.some.lan,test.some.lan

Block Lists

Pi-hole ships with one ad list; StevenBlack. You may need to disable this for google or facebook search results to work as expected. The top search results are often ads and don’t work as expected when pi-hole is blocking them.

  • Admin Panel -> Ad Lists -> Status Column

You might consider adding security only lists instead, such as Intel’s below

Search the web for other examples.

Upgrading

Unbound will be upgraded via the Unattended Upgrades service. But pi-hole requires a manual command.

sudo pihole -up

Troubleshooting

DNS Cache Size

The default cache size of 10,000 serves thousands clients easily. This is because entries expire faster than the cache runs out. But you can check your evictions - cache entries removed to make room before they expire - to see.

settings -> System -> DNS cache evictions:

You’ll notice that insertions keep climbing as things are added to the cache, but the cache number itself represents only those entries that are current. If you do see evictions, edit CACHE_SIZE in /etc/pihole/setupVars.conf

You can also check this at the command line

dig +short chaos txt evictions.bind @localhost

   dig +short chaos txt cachesize.bind
   dig +short chaos txt hits.bind
   dig +short chaos txt misses.bind

However, we are advised that unused cache is wasted, when it could be used for disk buffers, etc. So don’t add it just in case.

Rate Limiting

The system has a default limit of 1000 queries in a 60 seconds window for each client. If your clients are proxied or relayed, you can run into this. This event is displayed in the dashbaord2 and also in the logs3.

sudo grep -i Rate-limiting /var/log/pihole/pihole.log /var/log/pihole/pihole.log

You may find the address 127.0.0.1 being rate limited. This can be due to pi-hole doing a reverse of all client IPs every hour. You can disable this with:

# In the pihole-FTL.conf
REFRESH_HOSTNAMES=NONE

DNS over HTTP

Firefox, if the user has not yet chosen a setting, will query use-application-dns.net. Pi-hole respods with NXDOMAIN4 as a signal to use pi-hole for DNS.

/etc/pihole/pihole-FTL.conf

Apple devices include a private relay5 that the user may decide to enable if they pay for it. Pi-hole by default blocks queries for mask.icloud.com and the user will be notified you are blocking it.

# Signal that Apple iCloud Private Relay is allowed 
BLOCK_ICLOUD_PR=false
sudo systemctl reload pihole-FTL.service

Searching The Query Log Hangs DNS

On a very busy server, clicking show-all in the query log panel will hang the server as pihole-FTL works through it’s database. There is no solution, just don’t do it. The best alternative is to ship logs to a Elasticsearch or similar system.

Ask Yourself

The system continues to use whatever DNS resolver was initially configured. You may want it to use itself, instead.

# revert if pi-hole itself needs fixed.
sudo vi /etc/resolv.conf

nameserver 127.0.0.1

4.1.2 - Pi-hole DHCP

Pi-hole serves up DHCP information as well as DNS, and can be configured and enabled in the GUI.

However, the GUI only allows for a single range. On a large network you’ll need multiple ranges. You do this by editing the config files directly.

Interface-Based Ranges

In this setup, you have a separate interface per LAN. Easy to do in a virtual or VLAN environment, but you’ll have to define each in the /etc/network/interfaces file.

Let’s create a range from 192.168.0.100-200 tied to eth0 and a range of 192.168.1.100-200 tied to eth1. We’ll also specify the router and two DNS servers.

vim /etc/dnsmasq.d/05-dhcp.conf
dhcp-range=eth0,192.168.0.100,192.168.0.200,24h
dhcp-option=ens161,option:router,192.168.0.1
dhcp-option=ens161,option:dns-server,192.168.0.2,192.168.0.3

dhcp-range=eth1,192.168.1.100,192.168.1.200,24h
dhcp-option=ens161,option:router,192.168.1.1
dhcp-option=ens161,option:dns-server,192.168.1.2,192.168.1.3

# Shared by both
dhcp-option=option:netmask,255.255.0.0

# Respond immediately without waiting for other servers 
dhcp-authoritative

# Don't try and ping the address before assigning it
no-ping

dhcp-lease-max=10000
dhcp-leasefile=/etc/pihole/dhcp.leases

domain=home.lan

These settings can be implicit - i.e. we could have left out ethX in the range, but explicit is often better for clarity.

Note - the DHCP server (dnsmasq) is not enabled by default. You can do that in the GUI under Settings –> DHCP

Relay-Based Ranges

In this setup, the router relays DHCP requests to the server. Only one system network interface is required, though you must configure the router(s).

When configured, the relay (router) sets the relay-agent (giaddr) field, sends it to dnsmasq, which (I think) understands it’s a relayed request when it sees that field, and looks at it’s available ranges for a match. It’s also sets a tag that to be used for assigning different options, such as the gateway, per range.

dhcp-range=tag0,192.168.0.100,192.168.0.250,255.255.255.0,8h
dhcp-range=tag1,192.168.1.100,192.168.1.250,255.255.255.0,8h
dhcp-range=tag2,192.168.2.100,192.168.2.250,255.255.255.0,8h

dhcp-options=tag0,3,192.168.0.1
dhcp-options=tag1,3,192.168.1.1
dhcp-options=tag2,3,192.168.2.1

Sources

https://discourse.pi-hole.net/t/more-than-one-conditional-forwarding-entry-in-the-gui/11359

Troubleshooting

It’s possible that the DHCP part of dnsmasq doesn’t scale to many thousands of leases1

4.1.3 - Unbound

4.2 - VPN

4.2.1 - Wireguard

Wireguard is a new, light-weight VPN that is both faster and simpler than its predecessors. With a small code-base and modern cryptography, it’s the future of VPNs.

Concepts

Wireguard is a layer 3 VPN and as such, only works with IPv4/6. It doesn’t provide DHCP, bridging, or other low-level features.

Participants authenticate using public-key cryptography, use UDP as a transport and do not respond to unauthenticated connection attempts.

Every participant is considered a peer. Each defines their own IP address, routing rules, and decides from whom they will accept traffic. Every peer must exchange public keys with every other other peer. There is no central authority.

Traffic is sent directly between configured peers but can also be relayed through central nodes if so configured by routing rules on the participants.

Scenarios

The way you deploy depends on what you’re doing, but in general you’ll either connect directly point-to-point or create a central server for remote access or management.

Central Server and Remote Access

This is the classic setup where remote systems connect to the network through one central point. Configure a wireguard server as that central point and then your clients (remote peers) to connect.

Central Server and Remote Management

Another common use is to have a fleet of devices ‘phone-home’ so you can reach them easily.

Point to Point

You can also have peers talk directly to each other. This is often used with routers to connect networks across the internet.

4.2.1.1 - Central Server

A central server gives remote devices a reachable target, allowing them to traverse firewalls and NAT and connect. Let’s create a server and generate and add your first remote peer.

Preparation

You’ll need:

  • Public Domain Name or Static IP
  • Linux Server
  • Ability to port-forward UDP 51830

A dynamic domain name will work and it’s reasonably priced (usually free). You just need something for the peers to connect to, though a static IP is best. You can possibly break connectivity if your IP changes while your peers are connected or have the old IP cached.

We use Debian in this example and derivatives should be similar. UDP 51820 is the standard port but you can choose another if desired.

You must also choose a VPN network that doesn’t overlap with your existing networks. We use 192.168.100.0/24 in this example.

Installation

sudo apt install wireguard-tools

Configuration

All the server needs is a single config file and it will look something like this:

[Interface]
Address = 192.168.100.1/24
ListenPort = 51820
PrivateKey = sGp9lWqfBx+uOZO8V5NPUlHQ4pwbvebg8xnfOgR00Gw=

We picked .1 as our server address (pretty standard), created a private key with the wg tool, and put that in the file /etc/wireguard/wg0.conf. Here’s the commands to do that.

# As root
cd /etc/wireguard/
umask 077

wg genkey > server_privatekey
wg pubkey < server_privatekey > server_publickey

read PRIV < server_privatekey

cat << EOF > wg0.conf
[Interface]
Address = 192.168.100.1/24
ListenPort = 51820
PrivateKey = $PRIV
EOF

Operation

The VPN operates by creating network interface and loading a kernel module. You use the linux ip command to add a network interface of type wireguard (that automatically loads the kernel module) or use the wg-quick command do do it for you. Name the interface wg0 and it will pull in the config wg0.conf

Test the Interface

wg-quick up wg0

ping 192.168.100.1

wg-quick down wg0

Enable The Service

For normal use, employ systemctl to create a service using the installed service file.

systemctl enable --now wg-quick@wg0

Administration

The most common procedure is adding new clients. Each must have a unique key and IP, as the keys are hashed and used as part of the internal routing.

Create a Client

Let’s create a client config file by generating a key and assigning them an IP. It’s not secure, but it is pragmatic.

wg genkey > client_privatekey # Generates and saves the client private key
wg pubkey < client_privatekey # Displays the client's public key

Add the client’s public key and IP to your server’s wg0.conf and reload. For the IP, it’s fine to just increment. Note the /32, meaning we will only accept that IP from this peer.

[Interface]
Address = 192.168.100.1/24
ListenPort = 51820
PrivateKey = XXXXXX

##  Some Client  ##
[Peer]
PublicKey = XXXXXX
AllowedIPs = 192.168.100.2/32
wg-quick down wg0 &&  wg-quick up wg0

Send The Client Config

A client config file should look similar to this. The [Interface] is about the client and the [Peer] is about the server.

[Interface]
PrivateKey = THE-CLIENT-PRIVATE-KEY
Address = 192.168.100.2/32

[Peer]
PublicKey = YOUR-SERVERS-PUBLIC-KEY
AllowedIPs = 192.168.100.0/24
Endpoint = your.server.org:51820

Put in the keys and domain name, zip it up and send it on to your client as securely as possible. One neat trick is to display a QR code right in the shell. Devices that have a camera can import from that.

qrencode -t ANSIUTF8 < client-wg0.conf

Test The Client

You should be able to ping the server from the client. If not, take a look at the troubleshooting steps.

Next Steps

We haven’t enabled forwarding yet or set up firewall rules as those depend on what role your central peer will play. Proceed on to Remote Access or Remote Management as desired.

Troubleshooting

When something is wrong, you don’t get an error message, you just get nothing. You bring up the client interface but you can’t ping the server 192.168.100.1. But you can turn on log messages on the server with this command.

echo module wireguard +p > /sys/kernel/debug/dynamic_debug/control
dmesg

# When done, send a '-p'

Key Errors

wg0: Invalid handshake initiation from 205.133.134.15:18595

In this case, you should check your keys and possibly take the server interface down and up.

Typeos

ifconfig: ioctl 0x8913 failed: No such device

Check your conf is named /etc/wireguard/wg0.conf and look for any typoes.

Firewall Issues

If you see no wireguard error messages, you should suspect your firewall. Since it’s UDP you can’t test the port directly, but you can use netcat.

nc -ulp 51820  # On the server

nc -u some.server 51820 # On the client. Type and see if it shows up on the server

4.2.1.2 - Remote Access

This is the classic setup where remote peers initiate a connection to the central peer through the internet. That central system forwards their traffic onward to the corporate network.

Traffic Handling

The main choice is route or masquerade .

Routing

If you route, the client’s VPN IP address is what other devices see. This is generally preferred as it allows you to log who was doing what at the individual servers. But you must update your network equipment to treat the central server as a router.

Masquerading

Masquerading causes the server to translate all the traffic. This makes everything look like its coming from the server. It’s less secure, but less complicated and much quicker to implement.

For this example, we will masquerade traffic from the server.

Central Server Config

Enable Masquerade

Use sysctl to enable forwarding on the server and nft to add masquerade.

# as root
sysctl -w net.ipv4.ip_forward=1

nft flush ruleset
nft add table nat
nft add chain nat postrouting { type nat hook postrouting priority 100\; }
nft add rule nat postrouting masquerade

Persist Changes

It’s best if we add our new rules onto the defaults and enable the nftables service.

# as root
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf

nft list ruleset >> /etc/nftables.conf

systemctl enable --now  nftables.service 

Client Config

Your remote peer - the one you created when setting up the server - needs it’s AllowedIPs adjusted so it knows to send more traffic through the tunnel.

Full Tunnel

This sends all traffic from the client over the VPN.

AllowedIPs = 0.0.0.0/0

Split Tunnel

The most common config is to send specific networks through the tunnel. This keeps netflix and such off the VPN

AllowedIPs = 192.168.100.0/24, 192.168.XXX.XXX, 192.168.XXX.YYY

DNS

In some cases, you’ll need the client to use your internal DNS server to resolve private domain names. Make sure this server is in the AllowedIPs above.

[Interface]
PrivateKey = ....
Address = ...
DNS = 192.168.100.1

Access Control

Limit Peer Access

By default, everything is open and all the peers can talk to each other and the internet at large - even NetFlix! (they can edit their side of the connection at will). So let’s add some rules to the default filter table.

This example prevents peers from from talking to each other but let’s them ping the central server and reach the corporate network.

# Load the base config in case you haven't arleady. This includes the filter table
sudo nft -f /etc/nftables.conf

# Reject any traffic being sent outside the 192.168.100.0/24
sudo nft add rule inet filter forward iifname "wg0" ip daddr != 192.168.100.0/24 reject with icmp type admin-prohibited

# Reject any traffic between peers
sudo nft add rule inet filter forward iifname "wg0" oifname "wg0" reject with icmp type admin-prohibited

Grant Admin Access

You may want to add an exception for one of the addresses so that an administrator can interact with the remote peers. Order matters, so add it before before the other rules above

sudo nft -f /etc/nftables.conf

# Allow an special 'admin' peer full access and others to reply
sudo nft add rule inet filter forward iifname "wg0" ip saddr 192.168.100.2 accept
sudo nft add rule inet filter forward ct state {established, related} accept

# As above
...
...

Save Changes

Since this change is a little more complex, we’ll replace the existing file config file and add notes.

sudo vi /etc/nftables.conf
#!/usr/sbin/nft -f

flush ruleset

table inet filter {
        chain input {
                type filter hook input priority 0
        }
        chain forward {
                type filter hook forward priority 0

                # Accept admin traffic and responses
                iifname "wg0" ip saddr 192.168.100.2 accept
                iifname "wg0" ct state {established, related} accept

                # Reject other traffic between peers
                iifname "wg0" oifname "wg0" reject with icmp type admin-prohibited

                # Reject traffic outside the desired network
                iifname "wg0" ip daddr != 192.168.100.0/24 reject with icmp admin-prohibited
        }
        chain output {
                type filter hook output priority 0
        }
}
table ip nat {
        chain postrouting {
                type nat hook postrouting priority srcnat
                masquerade
        }
}

Note: The syntax of the file is slightly different than the command. You can use nft list ruleset to see how nft config and commands translate into running rules. For example - the policy accept is being appended. You may want to experiment with explicitly adding policy drop.

The forwarding chain is where routing type rules go (the input chain is traffic sent to the host itself). Prerouting might work as well, though it’s less common and not present by default.

Notes

The default nftable config file in Debian is:

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
	chain input {
		type filter hook input priority filter;
	}
	chain forward {
		type filter hook forward priority filter;
	}
	chain output {
		type filter hook output priority filter;
	}
}

If you have old iptables rules you want to translate to nft, you can install iptables and add them (they get translated on the fly into nft) and nft list ruleset to see how to they turn out.

4.2.1.3 - Remote Mgmt

In this scenario peers initiate connections to the cerntral server, making their way through NAT and Firewalls, but you don’t want to forward their traffic.

Central Server Config

No forwarding or masquerade is desired, so there is no additional configuration to the central server.

Client Config

The remote peer - the one you created when setting up the server - is already set up with one exception; a keep-alive.

When the remote peer establishes it’s connection to the central server, intervening firewalls allow you to talk back as they assume it’s in response. However, the firewall will eventually ‘close’ this window unless the client continues sending traffic occasionally to ‘keep alive’ the connection.

# Add this to the bottom of your client's conf file
PersistentKeepalive = 20

Firewall Rules

You should apply some controls to your clients to prevent them from talking to each other (and possibly the server and you also need a rule for the admin station. You can do this by adding rules to the forward chain.

# Allow an 'admin' peer at .2 full access to others and accept their replies
sudo nft add rule inet filter forward iifname "wg0" ip saddr 192.168.100.2 accept
sudo nft add rule inet filter forward ct state {established, related} accept
# Reject any other traffic between peers
sudo nft add rule inet filter forward iifname "wg0" oifname "wg0" reject with icmp type admin-prohibited

You can persist this change by editing your /etc/nftables.conf file to look like this.

sudo vi /etc/nftables.conf
#!/usr/sbin/nft -f

flush ruleset

table inet filter {
        chain input {
                type filter hook input priority 0;
        }
        chain forward {
                type filter hook forward priority 0;

                # Accept admin traffic
                iifname "wg0" ip saddr 192.168.100.2 accept
                iifname "wg0" ct state {established, related} accept

                # Reject other traffic between peers
                iifname "wg0" oifname "wg0" reject with icmp type admin-prohibited
        }
        chain output {
                type filter hook output priority 0;
        }
}
table ip nat {
        chain postrouting {
                type nat hook postrouting priority srcnat; policy accept;
                masquerade
        }
}

4.2.1.4 - Routing

Rather than masquerade, your wireguard server can forward traffic with the VPN addresses intact. You must handle that on your network in one of the following ways.

Symmetric Routing

Classically, you’d treat the wireguard server like any other router. You’d create a management interface and/or a routing interface and advertise routes appropriately.

On a small network, you would simply overlay an additional IP range on top of the existing on by adding a second IP address on your router and put your wireguard server on that network. Your local servers will see the VPN addressed clients and send traffic to the router that will pass it to the wireguard server.

Asymmetric Routing

In a small network you might have the central peer on the same network as the other servers. In this case, it will be acting like a router and forwarding traffic, but the other servers won’t know about it and so will send replies back to their default gateway.

To remedy this, add a static route at the gateway for the VPN range that sends traffic back to the central peer. Asymmetry is generally frowned upon, but it gets the job done with one less hop.

Host Static Routing

You can also configure the servers in question with a static route for VPN traffic so they know to send it directly back to the Wireguard server. This is fastest but you have to visit every host. Though you can use DHCP to distribute this route in some cases.

4.2.1.5 - LibreELEC

LibreELEC and CoreELEC are Linux-based open source software appliances for running the Kodi media player. These can be used as kiosk displays and you can remotely manage them with wireguard.

Create a Wireguard Service

These systems have wireguard support, but use connman that lacks split-tunnel ability1. This forces all traffic through the VPN and so is unsuitable for remote management. To enable split-tunnel, create a wireguard service instead.

Create a service unit file

vi /storage/.config/system.d/wg0.service
[Unit]
Description=start wireguard interface

# The network-online service isn't guaranteed to work on *ELEC
#Requires=network-online.service

After=time-sync.target
Before=kodi.service

[Service]
Type=oneshot
RemainAfterExit=true
StandardOutput=journal

# Need to check DNS is responding before we proceed
ExecStartPre=/bin/bash -c 'until nslookup google.com; do sleep 1; done'

ExecStart=ip link add dev wg0 type wireguard
ExecStart=ip address add dev wg0 10.1.1.3/24
ExecStart=wg setconf wg0 /storage/.config/wireguard/wg0.conf
ExecStart=ip link set up dev wg0
# On the newest version, a manual route addition is needed too
ExecStart=ip route add 10.2.2.0/24 dev wg0 scope link src 10.1.1.3

# Deleting the device seems to remove the address and routes
ExecStop=ip link del dev wg0

[Install]
WantedBy=multi-user.target

Create a Wireguard Config File

Note: This isn’t exactly the same file wg-quick uses, just close enough to confuse.

vi /storage/.config/wireguard/wg0.conf
[Interface]
PrivateKey = XXXXXXXXXXXXXXX

[Peer]
PublicKey = XXXXXXXXXXXXXXX
AllowedIPs = 10.1.1.0/24
Endpoint = endpoint.hostname:31194
PersistentKeepalive = 25

Enable and Test

systemctl enable --now wg0.service
ping 10.1.1.1

Create a Cron Check

When using a DNS name for the endpoint you may become disconnected. To catch this, use a cron job

# Use the internal wireguard IP address of the peer you are connecting to. .1 in this case
crontab -e
*/5 * * * * ping -c1 -W5 10.1.1.1 || ( systemctl stop wg0; sleep 5; systemctl start wg0 )

4.2.1.6 - TrueNAS Scale

You can directly bring up a Wireguard interface in TrueNAS Scale, and use that to remotely manage it.

Wireguard isn’t exposed in the GUI, so use the command line to create a config file and enable the service. To make it persistent between upgrades, add a cronjob to restore the config.

Configuration

Add a basic peer as when setting up a Central Server and save the file on the client as /etc/wireguard/wg1.conf. It’s rumored that wg0 is reserved for the TrueNAS cloud service. Once the config is in place, use wg-quick up wg1 command to test and enable as below.

nano /etc/wireguard/wg1.conf

systemctl enable --now wg-quick@wg1

If you use a domain name in this conf for the other side, this service will fail at boot because DNS isn’t up and it’s not easy to get it to wait. So add a pre-start to the service file to specifically test name resolution.

vi /lib/systemd/system/[email protected]

[Service] 
...
...
ExecStartPre=/bin/bash -c 'until host google.com; do sleep 1; done'

Note: Don’t include a DNS server in your wireguard settings or everything on the NAS will attempt to use your remote DNS and fail if the link goes down.

Accessing Apps

When deploying an app, click the enable “Host Network” or “Configure Host Network” box in the apps config and you should be able to access via the VPN address. On Cobia (23.10) at least. If that fails, you can add a command like this to a post-start in the wireguard config file.

iptables -t nat -A PREROUTING --dst 192.168.100.2 -p tcp --dport 20910 -j DNAT --to-destination ACTUAL.LAN.IP:20910

Detecting IP Changes

The other side of your connection may dynamic address and wireguard wont know about it. A simple solution is a cron job that pings the other side periodically, and if it fails, restarts the interface. This will lookup the domain name again and hopefully find the new address.

touch /etc/cron.hourly/wg_test
chmod +x /etc/cron.hourly/wg_test
vi /etc/cron.hourly/wg_test

#!/bin/sh
ping -c1 -W5 192.168.100.1 || ( wg-quick down wg1 ; wg-quick up wg1 )

Troubleshooting

Cronjob Fails

cronjob kills interface when it can’t ping

or

/usr/local/bin/wg-quick: line 32: resolvconf: command not found

Calling wg-quick via cron causes a resolvconf issue, even though it works at the command line. One solution is to remove any DNS config from your wg conf file so it doesn’t try to register the remote DNS server.

Nov 08 08:23:59 truenas wg-quick[2668]: Name or service not known: `some.server.org:port' Nov 08 08:23:59 truenas wg-quick[2668]: Configuration parsing error … Nov 08 08:23:59 truenas systemd[1]: Failed to start WireGuard via wg-quick(8) for wg1.

The DNS service isn’t available (yet), despite Requires=network-online.target nss-lookup.target already in the service unit file. One way to solve this is a pre-exec in the Service section of the unit file1. This is hacky, but none of the normal directives work.

The cron job above will bring the service up eventually, but it’s nice to have it at boot.

Upgrade Kills Connection

An upgrade comes with a new OS image and that replaces anything you’ve added, such as wireguard config and cronjobs. The only way to persist your Wireguard connection it to put a script on the pool and add a cronjob via the official interface2.

Add this script and change for your pool location. This is set to run every 5 min, as you probably don’t want to wait after an upgrade very long to see if it’s working. You can also use this to detect IP changes over the cron.hourly above.

# Create the location and prepare the files
mkdir /mnt/pool02/bin/
cp /etc/wireguard/wg1.conf /mnt/pool02/bin/
touch /mnt/pool02/bin/wg_test
chmod +x /mnt/pool02/bin/wg_test

# Edit the script 
vi /mnt/pool02/bin/wg_test

#!/bin/sh
ping -c1 -W5 192.168.100.1 || ( cp /mnt/pool02/bin/wg1.conf /etc/wireguard/ ; wg-quick down wg1 ; wg-quick up wg1 )


# Invoke the TrueNAS CLI and add the job
cli
task cron_job create command="/mnt/pool02/bin/wg_test" enabled=true description="test" user=root schedule={"minute": "*/5", "hour": "*", "dom": "*", "month": "*", "dow": "*"}

Notes

https://www.truenas.com/docs/core/coretutorials/network/wireguard/ https://www.truenas.com/community/threads/no-internet-connection-with-wireguard-on-truenas-scale-21-06-beta-1.94843/#post-693601

4.2.1.7 - Proxmox

Proxmox is frequently used in smaller enironments for it’s ability to mix Linux Containers and Virtual Machines at very low cost. LCD - Linux Containers - are especially valuable as they give the bennifits of virtualization with minimal overhead.

Using wireguard in a container simply requires adding the host’s kernel module interface.

Edit the container’s config

On the pve host, for lxc id 101:

echo "lxc.mount.entry = /dev/net/tun /dev/net/tun none bind create=file" >> /etc/pve/lxc/101.conf

Older Proxmox

In the past you had to install the module, or use the DKMS method. That’s no longer needed as the Wireguard kernel module is now available on proxmox with the standard install. You don’t even need to install the wireguard tools. But if you run into trouble you can go through these steps

apt install wireguard
modprobe wireguard

# The module will load dynamically when a conainter starts, but you can also manually load it
echo "wireguard" >> /etc/modules-load.d/modules.conf

5 - Security

5.1 - CrowdSec

5.1.1 - Installation

Overview

CrowdSec has two main parts; detection and interdiction.

Detection is handled by the main CrowdSec binary. You tell it what files to keep an eye on, how to parse those files, and what something ‘bad’ looks like. It then keeps a list of IPs that have done bad things.

Interdiction is handled by any number of plugins called ‘bouncers’, so named because they block access or kick out bad IPs. They run independently and keep an eye on the list, to do things like edit the firewall to block access for a bad IP.

There is also the ‘crowd’ part. The CrowdSec binary downloads IPs of known bad-actors from the cloud for your bouncers to keep out and submits alerts from your systems.

Installation

With Debian, you can simply add the repo via their script and install with a couple lines.

curl -s https://packagecloud.io/install/repositories/crowdsec/crowdsec/script.deb.sh | sudo bash
sudo apt install crowdsec
sudo apt install crowdsec-firewall-bouncer-nftables

This installs both the detection (crowdsec) and the interdiction (crowdsec-firewall-bouncer) parts. Assuming eveything went well, crowdsec will check in with the cloud, download a baseline list of known bad-actors, the firewall-bouncer will set up a basic drop list in the firewall, and crowdsec will start watching your syslog for intrusion attempts.

# Check out the very long drop list
sudo nft list ruleset | less

Configuration

CrowdSec comes pre-configured to watch for ssh brute-force attacks. If you have specific services to watch you can add those as described below.

Add a Service

You probably want to watch a specific service, like web server. Take a look at [https://hub.crowdsec.net/] to see all the available components. For example, browse the collections and search for caddy. The more info link will show you how to install the collection;

sudo cscli collections list -a
sudo cscli collections install crowdsecurity/caddy

Tell CrowdSec where Caddy’s log files are.

sudo tee -a /etc/crowdsec/acquis.yaml << EOF

---
filenames:
 - /var/log/caddy/*.log
labels:
  type: caddy
---
EOF

Restart crowdsec for these changes to take effect

sudo systemctl reload crowdsec

Operation

DataFlow

CrowdSec works by pulling in data from the Acquisition files, Parsing the events, comparing to Scenarios, and then Deciding if action should be taken.

Acquisition of data from log files is based on entries in the acquis.yaml file, and the events given a label as defined in that file.

Those events feed the Parsers. There are a handful by default, but only the ones specifically interested in a given label will see it. They look for keywords like ‘FAILED LOGIN’ and then extract the IP.

Successfully parsed lines are feed to the Scenarios to if what happened matters. The scenarios look for things like 10 FAILED LOGINs in 1 min. This separates the accidental bad password entry from a brute force attempt.

Matching a scenario gets the IP added to the Decision List, i.e the list of bad IPs. These have a configurable expiration, so that if you really guess wrong 10 times in a row, you’re not banned forever.

The bouncers use this list to take action, like a firewall block, and will unblock you after the expiration.

Collections

Parsers and Scenarios work best when they work together so they are usually distributed together as a Collection. You can have collections of collections as well. For example, the base installation comes with the linux collection that includes a few parsers and the sshd collection.

To see what Collections, Parsers and Scenarios are running, use the cscli command line interface.

sudo cscli collections list
sudo cscli collections inspect crowdsecurity/linux
sudo cscli collections inspect crowdsecurity/sshd

Inspecting the collection will tell you what parsers and scenarios it contains. As well as some metrics. To learn more a collection and it’s components, you can check out their page:

https://hub.crowdsec.net/author/crowdsecurity/collections/linux

The metrics are a bit confusing until you learn that the ‘Unparsed’ column doesn’t mean unparsed so much as it means a non-event. These are just normal logfile lines that don’t have one of the keywords the parser was looking for, like ‘LOGIN FAIL’.

Status

Is anyone currently attacking you? The decisions list shows you any current bad actors and the alerts list shows you a summary of past decisions. If you are just getting started this is probably none, but if you’re open to the internet this will grow quickly.

sudo cscli decisions list
sudo cscli alerts list

But you are getting events from the cloud and you can check those with the -a option. You’ll notice that every 2 hours the community-blocklist is updated.

sudo cscli alerts list -a

After a while of this collection running, you’ll start to see these kinds of alerts

sudo cscli alerts list
╭────┬───────────────────┬───────────────────────────────────────────┬─────────┬────────────────────────┬───────────┬─────────────────────────────────────────╮
│ ID │       value       │                  reason                   │ country │           as           │ decisions │               created_at                │
├────┼───────────────────┼───────────────────────────────────────────┼─────────┼────────────────────────┼───────────┼─────────────────────────────────────────┤
│ 27 │ Ip:18.220.128.229 │ crowdsecurity/http-bad-user-agent         │ US      │ 16509 AMAZON-02        │ ban:1     │ 2023-03-02 13:12:27.948429492 +0000 UTC │
│ 26 │ Ip:18.220.128.229 │ crowdsecurity/http-path-traversal-probing │ US      │ 16509 AMAZON-02        │ ban:1     │ 2023-03-02 13:12:27.979479713 +0000 UTC │
│ 25 │ Ip:18.220.128.229 │ crowdsecurity/http-probing                │ US      │ 16509 AMAZON-02        │ ban:1     │ 2023-03-02 13:12:27.9460075 +0000 UTC   │
│ 24 │ Ip:18.220.128.229 │ crowdsecurity/http-sensitive-files        │ US      │ 16509 AMAZON-02        │ ban:1     │ 2023-03-02 13:12:27.945759433 +0000 UTC │
│ 16 │ Ip:159.223.78.147 │ crowdsecurity/http-probing                │ SG      │ 14061 DIGITALOCEAN-ASN │ ban:1     │ 2023-03-01 23:03:06.818512212 +0000 UTC │
│ 15 │ Ip:159.223.78.147 │ crowdsecurity/http-sensitive-files        │ SG      │ 14061 DIGITALOCEAN-ASN │ ban:1     │ 2023-03-01 23:03:05.814690037 +0000 UTC │
╰────┴───────────────────┴───────────────────────────────────────────┴─────────┴────────────────────────┴───────────┴─────────────────────────────────────────╯

You may even need to unblock yourself

sudo cscli decisions list
sudo cscli decision delete --id XXXXXXX

Next Steps

You’re now taking advantage of the crowd-part of the crowdsec and added your own service. If you don’t have any alerts though, you may be wondering how well it’s actually working.

Take a look at the detailed activity if you want to look more closely at what’s going on.

5.1.2 - Detailed Activity

Inspecting Metrics

Data comes in through the parsers. To see what they are doing, let’s take a look at the Acquisition and Parser metrics.

sudo cscli metrics

Most of the ‘Acquisition Metrics’ lines will be read and unparsed. This is because normal events are dropped. It only considers lines parsed if they were passed on to a scenario. The ‘bucket’ column refers to event scenarios and is also blank as there were no parsed lines to hand off.

Acquisition Metrics:
╭────────────────────────┬────────────┬──────────────┬────────────────┬────────────────────────╮
│         Source         │ Lines read │ Lines parsed │ Lines unparsed │ Lines poured to bucket │
├────────────────────────┼────────────┼──────────────┼────────────────┼────────────────────────┤
│ file:/var/log/auth.log │ 216        │ -            │ 216            │ -                      │
│ file:/var/log/syslog   │ 143        │ -            │ 143            │ -                      │
╰────────────────────────┴────────────┴──────────────┴────────────────┴────────────────────────╯

The ‘Parser Metrics’ will show the individual parsers - but not all of them. Only parsers that have at least one ‘hit’ are shown. In this example, only the syslog parser shows up. It’s a low-level parser that doesn’t look for matches, so every line is a hit.

Parser Metrics:
╭─────────────────────────────────┬──────┬────────┬──────────╮
│             Parsers             │ Hits │ Parsed │ Unparsed │
├─────────────────────────────────┼──────┼────────┼──────────┤
│ child-crowdsecurity/syslog-logs │ 359  │ 359    │ -        │
│ crowdsecurity/syslog-logs       │ 359  │ 359    │ -        │
╰─────────────────────────────────┴──────┴────────┴──────────╯

However, try a couple failed SSH login attemps and you’ll see them and how they feed up the the Acquistion Metrics.


Acquisition Metrics:
╭────────────────────────┬────────────┬──────────────┬────────────────┬────────────────────────╮
│         Source         │ Lines read │ Lines parsed │ Lines unparsed │ Lines poured to bucket │
├────────────────────────┼────────────┼──────────────┼────────────────┼────────────────────────┤
│ file:/var/log/auth.log │ 242        │ 3            │ 239            │ -                      │
│ file:/var/log/syslog   │ 195        │ -            │ 195            │ -                      │
╰────────────────────────┴────────────┴──────────────┴────────────────┴────────────────────────╯

Parser Metrics:
╭─────────────────────────────────┬──────┬────────┬──────────╮
│             Parsers             │ Hits │ Parsed │ Unparsed │
├─────────────────────────────────┼──────┼────────┼──────────┤
│ child-crowdsecurity/sshd-logs   │ 61   │ 3      │ 58       │
│ child-crowdsecurity/syslog-logs │ 442  │ 442    │ -        │
│ crowdsecurity/dateparse-enrich  │ 3    │ 3      │ -        │
│ crowdsecurity/geoip-enrich      │ 3    │ 3      │ -        │
│ crowdsecurity/sshd-logs         │ 8    │ 3      │ 5        │
│ crowdsecurity/syslog-logs       │ 442  │ 442    │ -        │
│ crowdsecurity/whitelists        │ 3    │ 3      │ -        │
╰─────────────────────────────────┴──────┴────────┴──────────╯

Lines poured to bucket however, is still empty. That means the scenaros decided it wasn’t a hack attempt. With SSH timeouts it actually hard to do without a tool. Plus, you may notice the ‘whitelist` was triggered. Private IP ranges are whilelisted by default so you can’t lock yourself out from inside.

Let’s ask crowdsec to explain what’s going on

Detailed Parsing

To see which parsers got involved and what they did, you can ask.

sudo cscli explain --file /var/log/auth.log --type syslog

Here’s a ssh example of a failed login. The numbers, such as (+9 ~1), mean that the parser added 9 elements it parsed from the raw event, and updated 1. Notice the whitelists parser at the end. It’s catching this event and dropping it, hense the ‘parser failure’

line: Mar  1 14:08:11 www sshd[199701]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=192.168.1.16  user=allen
        ├ s00-raw
        |       └ 🟢 crowdsecurity/syslog-logs (first_parser)
        ├ s01-parse
        |       └ 🟢 crowdsecurity/sshd-logs (+9 ~1)
        ├ s02-enrich
        |       ├ 🟢 crowdsecurity/dateparse-enrich (+2 ~1)
        |       ├ 🟢 crowdsecurity/geoip-enrich (+9)
        |       └ 🟢 crowdsecurity/whitelists (~2 [whitelisted])
        └-------- parser failure 🔴

Why exactly did it get whitelisted? Let’s ask for a verbose report.

sudo cscli explain -v --file /var/log/auth.log --type syslog
line: Mar  1 14:08:11 www sshd[199701]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=192.168.1.16  user=someGuy
        ├ s00-raw
        |       └ 🟢 crowdsecurity/syslog-logs (first_parser)
        ├ s01-parse
        |       └ 🟢 crowdsecurity/sshd-logs (+9 ~1)
        |               └ update evt.Stage : s01-parse -> s02-enrich
        |               └ create evt.Parsed.sshd_client_ip : 192.168.1.16
        |               └ create evt.Parsed.uid : 0
        |               └ create evt.Parsed.euid : 0
        |               └ create evt.Parsed.pam_type : unix
        |               └ create evt.Parsed.sshd_invalid_user : someGuy
        |               └ create evt.Meta.service : ssh
        |               └ create evt.Meta.source_ip : 192.168.1.16
        |               └ create evt.Meta.target_user : someGuy
        |               └ create evt.Meta.log_type : ssh_failed-auth
        ├ s02-enrich
        |       ├ 🟢 crowdsecurity/dateparse-enrich (+2 ~1)
        |               ├ create evt.Enriched.MarshaledTime : 2023-03-01T14:08:11Z
        |               ├ update evt.MarshaledTime :  -> 2023-03-01T14:08:11Z
        |               ├ create evt.Meta.timestamp : 2023-03-01T14:08:11Z
        |       ├ 🟢 crowdsecurity/geoip-enrich (+9)
        |               ├ create evt.Enriched.Longitude : 0.000000
        |               ├ create evt.Enriched.ASNNumber : 0
        |               ├ create evt.Enriched.ASNOrg : 
        |               ├ create evt.Enriched.ASNumber : 0
        |               ├ create evt.Enriched.IsInEU : false
        |               ├ create evt.Enriched.IsoCode : 
        |               ├ create evt.Enriched.Latitude : 0.000000
        |               ├ create evt.Meta.IsInEU : false
        |               ├ create evt.Meta.ASNNumber : 0
        |       └ 🟢 crowdsecurity/whitelists (~2 [whitelisted])
        |               └ update evt.Whitelisted : %!s(bool=false) -> true
        |               └ update evt.WhitelistReason :  -> private ipv4/ipv6 ip/ranges
        └-------- parser failure 🔴

This shows the actual data and at the bottom, parser crowdsecurity/whitelists has updated the property ’evt.Whitelisted’ to true and gave it a reason. That property appears to be a built-in that flags events to be dropped.

If you want to change the ranges, you can edit the logic by editing the yaml file. A sudo cscli hub list will show you what file that is. Add or remove entries from the list it’s checking the ‘ip’ valie and ‘cidr’ value against. Any match cases whitelist to become true.

False Positives

You may see a high percent of ‘Lines poured to bucket’ relative to ‘Lines read’, like in this example where almost all are. Some lines triggering two scenareos when the ‘bucket’ is greater than the number of ‘parsed’

Acquisition Metrics:
╭────────────────────────────────┬────────────┬──────────────┬────────────────┬────────────────────────╮
│             Source             │ Lines read │ Lines parsed │ Lines unparsed │ Lines poured to bucket │
├────────────────────────────────┼────────────┼──────────────┼────────────────┼────────────────────────┤
│ file:/var/log/auth.log         │ 69         │ -            │ 69             │ -                      │
│ file:/var/log/caddy/access.log │ 2121           │ -              │ 32│ file:/var/log/syslog           │ 2          │ -            │ 2              │ -                      │
╰────────────────────────────────┴────────────┴──────────────┴────────────────┴────────────────────────╯

Sometimes, that’s OK as not all scenarios are designed to take instant action. The ‘http-crawl-non_statics’ had 17 events and was considering action against 2 IPs, but never ‘Overflowed’ aka took action.

The http-probing did, however. And one of the two IPs had action take against them

Bucket Metrics:
╭──────────────────────────────────────┬───────────────┬───────────┬──────────────┬────────┬─────────╮
│                Bucket                │ Current Count │ Overflows │ Instantiated │ Poured │ Expired │
├──────────────────────────────────────┼───────────────┼───────────┼──────────────┼────────┼─────────┤
│ crowdsecurity/http-crawl-non_statics │ -             │ -         │ 2            │ 17     │ 2       │
│ crowdsecurity/http-probing           │ -             │ 1         │ 2            │ 15     │ 1       │
╰──────────────────────────────────────┴───────────────┴───────────┴──────────────┴────────┴─────────╯

You can ask crowdsec to explain what’s going on with a -v and see that clients are asking for things that don’t exist.

  ├ s00-raw
  | ├ 🟢 crowdsecurity/non-syslog (first_parser)
  | └ 🔴 crowdsecurity/syslog-logs
  ├ s01-parse
  | └ 🟢 crowdsecurity/caddy-logs (+19 ~2)
  |   └ update evt.Stage : s01-parse -> s02-enrich
  |   └ create evt.Parsed.request : /0/icon/Forman,%20M.L.%20
  |   ...
  |   └ create evt.Meta.http_status : 404
  |   ...
  ├-------- parser success 🟢
  ├ Scenarios
    ├ 🟢 crowdsecurity/http-crawl-non_statics
    └ 🟢 crowdsecurity/http-probing

If you look at the rules (sudo cscli hub list) for http-probing, you’ll see it looks for 404s (file not found). If you get more than 10 in 10 seconds, it ‘overflows’ and the IP get baned.

Whitelist

The trouble is, some web apps generate a lot of 404s as they try and load page elements in case they exist. This generates lots of 404s and bans. In this case, we must whitelist the application with an expression that checks to see if it was an icon request, like above.

sudo vi /etc/crowdsec/parsers/s02-enrich/some-app-whitelist.yaml
name: crowdsecurity/whitelists 
description: "Whitelist 404s for icon requests" 
whitelist: 
  reason: "icon request" 
  expression:   
    - evt.Parsed.request startsWith '/0/icon/'

5.1.3 - Custom Parser

When checking out the detailed metrics you may find that log entries aren’t being parsed. Maybe the log format has changed or you’re logging additional data the author didn’t anticipate. The best thing is to add your own parser.

Types of Parsers

There are several type of parsers and they are used in stages. Some are designed to work with the raw log entries while others are designed to take pre-parsed data and add or enrich it. This way you can do branching and not every parser needs to now how to read a syslog message.

Their Local Path will tell you what stage they kick in at. Use sudo cscli parsers list to display the details. s00-raw works with the ‘raw’ files while s01 and s02 work further down the pipeline. Currently, you can only create s00 and s01 level parsers.

Integrating with Scenarios

Useful parsers supply data that Scenarios are interested in. You can create a parser that watches the system logs for ‘FOOBAR’ entries, extracts the ‘FOOBAR-LEVEL`, and passes it on. But if nothing is looking for ‘FOOBARs’ then nothing will happen.

Let’s say you’ve added the Caddy collection. It’s pulled in a bunch of Scenarios you can view with sudo cscli scenarios list. If you look at one of the assicated files you’ll see a filter section where they look for ’evt.Meta.http_path’ and ’evt.Parsed.verb’. They are all different though, so how do you know what data to supply?

Your best bet is to take an existing parser and modify it.

Examples

Note - CrowdSec is pretty awesome and after talking in the discord they’ve already accomodated both these scenarios within a relase cycle or two. So these two examples are solved. I’m sure you’ll find new ones, though ;-)

A Web Example

Let’s say that you’ve installed the Caddy collection, but you’ve noticed basic auth login failures don’t trigger the parser. So let’s add a new file and edit it.

sudo cp /etc/crowdsec/parsers/s01-parse/caddy-logs.yaml /etc/crowdsec/parsers/s01-parse/caddy-logs-custom.yaml

You’ll notice two top level sections where the parsing happens; nodes and statics and some grok pattern matching going on.

Nodes allow you try multiple patterns and if any match, the whole section is considered successful. I.e. if the log could have either the standard HTTPDATE or a CUSTOMDATE, as long as it has one it’s good and the matching can move on. Statics just goes down the list extracting data. If any fail the whole event is considered a fail and dropped as unparseable.

All the pasrsed data gets attached to event as ’evt.Parsed.something’ and some of the statics are moving it to evt values the Senarios will be looking for Caddy logs are JSON formatted and so basically already parsed and this example makes use of the JsonExtract method quite a bit.

# We added the caddy logs in the acquis.yaml file with the label 'caddy' and so we use that as our filter here
filter: "evt.Parsed.program startsWith 'caddy'"
onsuccess: next_stage
# debug: true
name: caddy-logs-custom
description: "Parse custom caddy logs"
pattern_syntax:
 CUSTOMDATE: '%{DAY:day}, %{MONTHDAY:monthday} %{MONTH:month} %{YEAR:year} %{TIME:time} %{WORD:tz}'
nodes:
  - nodes:
    - grok:
        pattern: '%{NOTSPACE} %{NOTSPACE} %{NOTSPACE} \[%{HTTPDATE:timestamp}\]%{DATA}'
        expression: JsonExtract(evt.Line.Raw, "common_log")
        statics:
          - target: evt.StrTime
            expression: evt.Parsed.timestamp
    - grok:
        pattern: "%{CUSTOMDATE:timestamp}"
        expression: JsonExtract(evt.Line.Raw, "resp_headers.Date[0]")
        statics:
          - target: evt.StrTime
            expression: evt.Parsed.day + " " + evt.Parsed.month + " " + evt.Parsed.monthday + " " + evt.Parsed.time + ".000000" + " " + evt.Parsed.year
    - grok:
        pattern: '%{IPORHOST:remote_addr}:%{NUMBER}'
        expression: JsonExtract(evt.Line.Raw, "request.remote_addr")
    - grok:
        pattern: '%{IPORHOST:remote_ip}'
        expression: JsonExtract(evt.Line.Raw, "request.remote_ip")
    - grok:
        pattern: '\["%{NOTDQUOTE:http_user_agent}\"]'
        expression: JsonExtract(evt.Line.Raw, "request.headers.User-Agent")
statics:
  - meta: log_type
    value: http_access-log
  - meta: service
    value: http
  - meta: source_ip
    expression: evt.Parsed.remote_addr
  - meta: source_ip
    expression: evt.Parsed.remote_ip
  - meta: http_status
    expression: JsonExtract(evt.Line.Raw, "status")
  - meta: http_path
    expression: JsonExtract(evt.Line.Raw, "request.uri")
  - target: evt.Parsed.request #Add for http-logs enricher
    expression: JsonExtract(evt.Line.Raw, "request.uri")
  - parsed: verb
    expression: JsonExtract(evt.Line.Raw, "request.method")
  - meta: http_verb
    expression: JsonExtract(evt.Line.Raw, "request.method")
  - meta: http_user_agent
    expression: evt.Parsed.http_user_agent
  - meta: target_fqdn
    expression: JsonExtract(evt.Line.Raw, "request.host")
  - meta: sub_type
    expression: "JsonExtract(evt.Line.Raw, 'status') == '401' && JsonExtract(evt.Line.Raw, 'request.headers.Authorization[0]') startsWith 'Basic ' ? 'auth_fail' : ''"

The very last line is where a status 401 is checked. It looks for a 401 and a request for Basic auth. However, this misses events where someone asks for a resource that is protected and the serer responds telling you Basic is needed. I.e. when a bot is poking at URLs on your server ignoring the prompts to login. You can look at the log entries more easily with this command to follow the log and decode it while you recreate failed attempts.

sudo tail -f /var/log/caddy/access.log | jq

To change this, update the expression to also check the response header with an additional ? (or) condition.

    expression: "JsonExtract(evt.Line.Raw, 'status') == '401' && JsonExtract(evt.Line.Raw, 'request.headers.Authorization[0]') startsWith 'Basic ' ? 'auth_fail' : ''"xtract(evt.Line.Raw, 'status') == '401' && JsonExtract(evt.Line.Raw, 'resp_headers.Www-Authenticate[0]') startsWith 'Basic ' ? 'auth_fail' : ''"

Syslog Example

Let’s say you’re using dropbear and failed logins are not being picked up by the ssh parser

To see what’s going on, you use the crowdsec command line interface. The shell command is cscli and you can ask it about it’s metrics to see how many lines it’s parsed and if any of them are suspicious. Since we just restarted, you may not have any syslog lines yet, so let’s add some and check.

ssh [email protected]
logger "This is an innocuous message"

cscli metrics
INFO[28-06-2022 02:41:33 PM] Acquisition Metrics:
+------------------------+------------+--------------+----------------+------------------------+
|         SOURCE         | LINES READ | LINES PARSED | LINES UNPARSED | LINES POURED TO BUCKET |
+------------------------+------------+--------------+----------------+------------------------+
| file:/var/log/messages | 1          | -            | 1              | -                      |
+------------------------+------------+--------------+----------------+------------------------+

Notice that the line we just read is unparsed and that’s OK. That just means it wasn’t an entry the parser cared about. Let’s see if it responds to an actual failed login.

dbclient some.remote.host

# Enter some bad passwords and then exit with a Ctrl-C. Remember, localhost attempts are whitelisted so you must be remote.
[email protected]'s password:
[email protected]'s password:

cscli metrics
INFO[28-06-2022 02:49:51 PM] Acquisition Metrics:
+------------------------+------------+--------------+----------------+------------------------+
|         SOURCE         | LINES READ | LINES PARSED | LINES UNPARSED | LINES POURED TO BUCKET |
+------------------------+------------+--------------+----------------+------------------------+
| file:/var/log/messages | 7          | -            | 7              | -                      |
+------------------------+------------+--------------+----------------+------------------------+

Well, no luck. We will need to adjust the parser

sudo cp /etc/crowdsec/parsers/s01-parse/sshd-logs.yaml /etc/crowdsec/parsers/s01-parse/sshd-logs-custom.yaml

Take a look at the logfile and copy an example line over to https://grokdebugger.com/. Use a pattern like

Bad PAM password attempt for '%{DATA:user}' from %{IP:source_ip}:%{INT:port}

Assuming you get the pattern worked out, you can then add a section to the bottom of the custom log file you created.

  - grok:
      name: "SSHD_AUTH_FAIL"
      pattern: "Login attempt for nonexistent user from %{IP:source_ip}:%{INT:port}"
      apply_on: message

5.1.4 - On Alpine

Install

There are some packages available, but (as of 2022) they are a bit behind and don’t include the config and service files. So let’s download the latest binaries from Crowsec and create our own.

Download the current release

Note: Download the static versions. Alpine uses a differnt libc than other distros.

cd /tmp
wget https://github.com/crowdsecurity/crowdsec/releases/latest/download/crowdsec-release-static.tgz
wget https://github.com/crowdsecurity/cs-firewall-bouncer/releases/latest/download/crowdsec-firewall-bouncer.tgz

tar xzf crowdsec-firewall*
tar xzf crowdsec-release*
rm *.tgz

Install Crowdsec and Register with The Central API

You cannot use the wizard as it expects systemd and doesn’t support OpenRC. Follow the Binary Install steps from CrowdSec’s binary instrcutions.

sudo apk add bash newt envsubst
cd /tmp/crowdsec-v*

# Docker mode skips configuring systemd
sudo ./wizard.sh --docker-mode

sudo cscli hub update
sudo cscli machines add -a
sudo cscli capi register

# A collection is just a bunch of parsers and scenarios bundled together for convienence
sudo cscli collections install crowdsecurity/linux 

Install The Firewall Bouncer

We need a netfilter tool so install nftables. If you already have iptables installed you can skip this step and set FW_BACKEND to that below when generating the API keys.

sudo apk add nftables

Now we install the firewall bouncer. There is no static build of the firewall bouncer yet from CrowdSec, but you can get one from Alpine testing (if you don’t want to compile it yourself)

# Change from 'edge' to other versions a needed
echo "http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories
apk update
apk add cs-firewall-bouncer

Now configure the bouncer. We will once again do this manually becase there is not support for non-systemd linuxes with the install script. But cribbing from their install script, we see we can:

cd /tmp/crowdsec-firewall*

BIN_PATH_INSTALLED="/usr/local/bin/crowdsec-firewall-bouncer"
BIN_PATH="./crowdsec-firewall-bouncer"
sudo install -v -m 755 -D "${BIN_PATH}" "${BIN_PATH_INSTALLED}"

CONFIG_DIR="/etc/crowdsec/bouncers/"
sudo mkdir -p "${CONFIG_DIR}"
sudo install -m 0600 "./config/crowdsec-firewall-bouncer.yaml" "${CONFIG_DIR}crowdsec-firewall-bouncer.yaml"

Generate The API Keys

Note: If you used the APK, just do the first two lines to get the API_KEY (echo $API_KEY) and manually edit the file (vim /etc/crowdsec/bouncers/crowdsec-firewall-bouncer.yaml)

cd /tmp/crowdsec-firewall*
CONFIG_DIR="/etc/crowdsec/bouncers/"

SUFFIX=`tr -dc A-Za-z0-9 </dev/urandom | head -c 8`
API_KEY=`sudo cscli bouncers add cs-firewall-bouncer-${SUFFIX} -o raw`
FW_BACKEND="nftables"
API_KEY=${API_KEY} BACKEND=${FW_BACKEND} envsubst < ./config/crowdsec-firewall-bouncer.yaml | sudo install -m 0600 /dev/stdin "${CONFIG_DIR}crowdsec-firewall-bouncer.yaml"

Create RC Service Files

sudo touch /etc/init.d/crowdsec
sudo chmod +x /etc/init.d/crowdsec
sudo rc-update add crowdsec

sudo vim /etc/init.d/crowdsec
#!/sbin/openrc-run

command=/usr/local/bin/crowdsec
command_background=true

pidfile="/run/${RC_SVCNAME}.pid"

depend() {
   need localmount
   need net
}

Note: If you used the package from Alpine testing above it came with a service file. Just rc-update add cs-firewall-bouncer and skip this next step.

sudo touch /etc/init.d/cs-firewall-bouncer
sudo chmod +x /etc/init.d/cs-firewall-bouncer
sudo rc-update add cs-firewall-bouncer

sudo vim /etc/init.d/cs-firewall-bouncer
#!/sbin/openrc-run

command=/usr/local/bin/crowdsec-firewall-bouncer
command_args="-c /etc/crowdsec/bouncers/crowdsec-firewall-bouncer.yaml"
pidfile="/run/${RC_SVCNAME}.pid"
command_background=true

depend() {
  after firewall
}

Start The Services and Observe The Results

Start up the services and view the logs to see that everything started properly

sudo service start crowdsec
sudo service cs-firewall-bouncer status

sudo tail /var/log/crowdsec.log
sudo tail /var/log/crowdsec-firewall-bouncer.log

# The firewall bouncer should tell you about how it's inserting decisions it got from the hub

sudo cat /var/log/crowdsec-firewall-bouncer.log

time="28-06-2022 13:10:05" level=info msg="backend type : nftables"
time="28-06-2022 13:10:05" level=info msg="nftables initiated"
time="28-06-2022 13:10:05" level=info msg="Processing new and deleted decisions . . ."
time="28-06-2022 14:35:35" level=info msg="100 decisions added"
time="28-06-2022 14:35:45" level=info msg="1150 decisions added"
...
...

# If you are curious about what it's blocking
sudo nft list table crowdsec
...

6 - Storage

6.1 - Seafile

TODO - seafile 11 is in beta and mysql is required.

Seafile is a cloud storage system, similar to google drive. It stands out for being simpler and faster than it’s peers. It’s also open source.

Preparation

You’ll need a linux server. We use Debian 12 in this example and instructions are based on Seafile’s SQLite instructions, updated for the new OS.

cffi build issues[^cffi],

and a python virtual environement so apt and pip packages play nice.

# The main requirements
sudo apt install -y memcached libmemcached-dev pwgen sqlite3
sudo systemctl enable --now memcached

# Python specific things
sudo apt install -y python3 python3-setuptools python3-pip 


sudo apt install python3-wheel python3-django python3-django-captcha python3-future python3-willow python3-pylibmc python3-jinja2 python3-psd-tools python3-pycryptodome python3-cffi



# cffi build requirements
sudo apt install -y build-essential libssl-dev libffi-dev python-dev-is-python3

# Install the service account and create a python virtual environment for them
sudo apt install python3-venv
sudo useradd --home-dir /opt/seafile --system --comment "Seafile Service Account" --create-home seafile
sudo -i -u seafile
python3 -m venv .venv
source .venv/bin/activate

# Install the rest of the packages from pip
pip3 install --timeout=3600 \
  wheel django django-pylibmc django-simple-captcha future \
  Pillow pylibmc captcha jinja2 psd-tools pycryptodome cffi

192.168.1.21:/srv/seafile /srv/seafile nfs defaults,noatime,vers=4.1 0 0

Installation

It comes with two services. Seafile, the file sync server, and Seahub, a web interface and editor.

For a small team, you can install a lightweight instance of Seafile using a single host and sqlite.

Note: There is a seafile repo, but it may be [client] only. TODO test this

As per the install [instructions] this will create several folders in seafile’s home directory and a symlink to the binaries in a version specific directory for easy upgrades.

# Contine as the seafile user - the python venv should still be in effect. If not, source as before

# Downlaod and exract the binary
wget -P /tmp https://s3.eu-central-1.amazonaws.com/download.seadrive.org/seafile-server_10.0.1_x86-64.tar.gz
tar -xzf /tmp/seafile-server_10.0.1_x86-64.tar.gz -C /opt/seafile/
rm /tmp/seafile*

# Run the setup script
cd /opt/seafile/sea*
./setup-seafile.sh

# Start seafile and seahub to answer some setup questions
./seafile.sh start
./seahub.sh start

./seahub.sh stop
./seafile.sh stop

Create systemd service files1 for the two services. (as a sudo capable user)

sudo tee /etc/systemd/system/seafile.service << EOF
[Unit]
Description=Seafile
After=network.target

[Service]
Type=forking
ExecStart=/opt/seafile/seafile-server-latest/seafile.sh start
ExecStop=/opt/seafile/seafile-server-latest/seafile.sh stop
LimitNOFILE=infinity
User=seafile
Group=seafile

[Install]
WantedBy=multi-user.target
EOF

Note: The ExecStart below is a bit cumbersome, but it saves modifying the vendor’s start script. Only the Seahub service seems to need the virtual env, though you can give both services the same treatment if you wish.

sudo tee /etc/systemd/system/seahub.service << EOF
[Unit]
Description=Seafile hub
After=network.target seafile.service

[Service]
Type=forking
ExecStart=/bin/bash -c 'source /opt/seafile/.venv/bin/activate && /opt/seafile/seafile-server-latest/seahub.sh start'
ExecStop=/bin/bash -c 'source /opt/seafile/.venv/bin/activate && /opt/seafile/seafile-server-latest/seahub.sh stop'
User=seafile
Group=seafile

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable --now seafile.service
sudo systemctl enable --now seahub.service

Seafile and Seahub should have started without error, though by default you can only access it from locahost.

If you run into problems here make sure to start Seafile first. Expiriment with sourcing the activation file as the seafile user and running the start script directly.

Add logrotation

sudo tee /etc/logrotate.d/seafile << EOF
/opt/seafile/logs/seafile.log
/opt/seafile/logs/seahub.log
/opt/seafile/logs/file_updates_sender.log
/opt/seafile/logs/repo_old_file_auto_del_scan.log
/opt/seafile/logs/seahub_email_sender.log
/opt/seafile/logs/work_weixin_notice_sender.log
/opt/seafile/logs/index.log
/opt/seafile/logs/content_scan.log
/opt/seafile/logs/fileserver-access.log
/opt/seafile/logs/fileserver-error.log
/opt/seafile/logs/fileserver.log
{
        daily
        missingok
        rotate 7
        # compress
        # delaycompress
        dateext
        dateformat .%Y-%m-%d
        notifempty
        # create 644 root root
        sharedscripts
        postrotate
                if [ -f /opt/seafile/pids/seaf-server.pid ]; then
                        kill -USR1 `cat /opt/seafile/pids/seaf-server.pid`
                fi

                if [ -f /opt/seafile/pids/fileserver.pid ]; then
                        kill -USR1 `cat /opt/seafile/pids/fileserver.pid`
                fi

                if [ -f /opt/seafile/pids/seahub.pid ]; then
                        kill -HUP `cat /opt/seafile/pids/seahub.pid`
                fi

                find /opt/seafile/logs/ -mtime +7 -name "*.log*" -exec rm -f {} \;
        endscript
}
EOF

Configuration

Seahub (the web UI) by default is bound to localhost only. Change that to all addresses so you can access it from other systems.

sudo sed -i 's/^bind.*/bind = "0.0.0.0:8000"/'  /opt/seafile/conf/gunicorn.conf.py

If you’re not proxying already, check the seahub settings. You may need to add the correct internal name and port for ititial access. You should add the file server root as well so you don’t have to add it in the GUI later.

vi /opt/seafile/conf/seahub_settings.py

SERVICE_URL = "http://seafile.some.lan:8000/" 
FILE_SERVER_ROOT = "http://seafile.some.lan:8082"

Add a connection to the memcache server

sudo tee -a /opt/seafile/conf/seahub_settings.py << EOF
CACHES = {
    'default': {
        'BACKEND': 'django_pylibmc.memcached.PyLibMCCache',
        'LOCATION': '127.0.0.1:11211',
    },
}
EOF

And restart to take affect

sudo systemctl restart seahub

You should now be able to login at http://some.server:8000/ with the credentials you created during the command line setup. If the web GUI works, but you can’t download files or the markdown editor doesn’t work as expected, check the FILE_SERVER_ROOT and look in the GUI’s System Admin section at those settings.

NFS Mount

Large amounts of data are best handled by a dedicated storage system and those are usually mounted over the network via NFS or a similar protocol. Seafile data should be stored in such a system, but you cannot mount the entire Seafile data folder over the network as it includes SQLite data that recommends2 against that. Nor can you mount each subdirectory seperately as they rely upon internal links that must be on the same filesystem.

The solution is to mount a network share in an alternate location and symlink the relative parts of the Seafile data directory to it.

sudo mount nfs.server:/exports/seafile /mnt/seafile

sudo systemctl stop seahub
sudo systemctl stop seafile

sudo mv /opt/seafile/seafile-data/httptemp \
	/opt/seafile/seafile-data/storage \
	/opt/seafile/seafile-data/tmpfiles \
/mnt/seafile/

sudo ln -s /mnt/seafile/httptemp /opt/seafile/seafile-data/
sudo ln -s /mnt/seafile/storage /opt/seafile/seafile-data/
sudo ln -s /mnt/seafile/tmpfiles /opt/seafile/seafile-data/

sudo chown -R seafile:seafile /mnt/seafile

Proxy

Say something about why caddy, then give the proxy file, then say HTTP/3 and enabling UDP 443 and seeing it in the logs. with firefox enabled. No special server config.

https://caddy.community/t/caddy-v2-and-seafile-server-on-a-root-server/9188/2

Note the change in the GUI for the 8082

https://www.seafile.com/en/download/#server


  1. https://manual.seafile.com/deploy/start_seafile_at_system_bootup/ ↩︎

  2. https://www.sqlite.org/faq.html#q5 [client]:https://help.seafile.com/syncing_client/install_linux_client/ [instructions]:https://manual.seafile.com/deploy/using_sqlite/ ↩︎

7 - Web

7.1 - Content Mgmt

There are many ways to manage and produce web content. Traditionally, you’d use a large application with roles and permissions.

A more modern approach is to use a distributed version control system, like git, and a site generator.

Static Site Generators are gaining popularity as they produce static HTML with javascript and CSS that can be deployed to any Content Delivery Network without need for server-side processing.

Astro is great, as is Hugo, with the latter being around longer and having more resources.

7.1.1 - Hugo

Hugo is a Static Site Generator (SSG) that turns Markdown files into static web pages that can be deployed anywhere.

Like WordPress, you apply a ’theme’ to style your content. But rather than use a web-inteface to create content, you directly edit the content in markdown files. This lends itself well tomanaging the content as code and appeals to those who prefer editing text.

However, unlike other SSGs, you don’t have to be a front-end developer to get great results and you can jump in with a minimal investment of time.

7.1.1.1 - Hugo Install

Requirements

I use Debian in this example, but any apt-based distro will be similar.

Preparation

Enable and pin the Debian Backports and Testing repos so you can get recent versions of Hugo and needed tools.

–> Enable and Pin

Installation

Hugo requires git and go

# Assuming you have enable backports as per above
sudo apt install -t bullseye-backports git
sudo apt install -t bullseye-backports golang-go

For a recent version of Hugo you’ll need to go to the testing repo. The extended version is recommended by Hugo and it’s chosen by default.

# This pulls in a number of other required packages, so take a close look at the messages for any conflicts. It's normally fine, though. 
sudo apt install -t testing  hugo

Configuration

A quick test right from the quickstart page to make sure everything works

hugo new site quickstart
cd quickstart
git init
git submodule add https://github.com/theNewDynamic/gohugo-theme-ananke themes/ananke
echo "theme = 'ananke'" >> config.toml
hugo server

Open up a browser to http://localhost:1313/ and you you’ll see the default ananke-themed site.

Next Steps

The ananke theme you just deployed is nice, but a much better theme is Docsy. Go give that a try.

–> Deploy Docsy on Hugo

7.1.1.2 - Docsy Install

Docsy is a good-looking Hugo theme that provides a landing page, blog, and a documentation sub-sites using bootstrap CSS.

The documentation site in particular let’s you turn a directory of text files into a documentation tree with relative ease. It even has a collapsible left nav bar. That is harder to find than you’d think.

Preparation

Docsy requires Hugo. Install that if you haven’t already. It also needs a few other things; postcss, postcss-cli, and autoprefixer from the Node.JS ecosystem. These should be installed in the project directory as version requirements change per theme.

mkdir some.site.org
cd some.site.org
sudo apt install -t testing nodejs npm
npm install -D autoprefixer 
npm install -D postcss
npm install -D postcss-cli

Installation

Deploy Docsy as a Hugo module and pull in the example site so we have a skeleton to work with. We’re using git, but we’ll keep it local for now.

git clone https://github.com/google/docsy-example.git .
hugo server

Browse to http://localhost:1313 and you should see the demo “Goldydocs” site.

Now you can proceed to configure Docsy!

Updating

The Docsy theme gets regular updates. To incorporate those you only have to run this command. Do this now, actually, to get any theme updates the example site hasn’t incoporated yet.

cd /path/to/my-existing-site
hugo mod get -u github.com/google/docsy

Troubleshooting

hugo

Error: Error building site: POSTCSS: failed to transform “scss/main.css” (text/css)>: Error: Loading PostCSS Plugin failed: Cannot find module ‘autoprefixer’

And then when you try to install the missing module

The following packages have unmet dependencies: nodejs : Conflicts: npm npm : Depends: node-cacache but it is not going to be installed

You may have already have installed Node.JS. Skip trying to install it from the OS’s repo and see if npm works. Then proceed with postcss install and such.

7.1.1.3 - Docsy Config

Let’s change the basics of the site in the config.toml file. I put some quick sed commands here, but you can edit by hand as well. Of note is the Github integration. We prepoulate it here for future use, as it allows quick edits in your browser down the road.

SITE=some.site.org
GITHUBID=someUserID
sed -i "s/Goldydocs/$SITE/" config.toml
sed -i "s/The Docsy Authors/$SITE/" config.toml
sed -i "s/example.com/$SITE/" config.toml
sed -i "s/example.org/$SITE/" config.toml
sed -i "s/google\/docsy-example/$GITHUBID\/$SITE/" config.toml 
sed -i "s/USERNAME\/REPOSITORY/$GITHUBID\/$SITE/" config.toml 
sed -i "s/https:\/\/policies.google.com//" config.toml
sed -i "s/https:\/\/github.com\/google\/docsy/https:\/\/github.com\/$GITHUBID/" config.toml
sed -i "s/github_branch/#github_branch/" config.toml

If you don’t plan to translate your site into different languages, you can dispense with some of the extra languages as well.

# Delete the 20 or so lines starting at "lLanguage] and stopping at the "[markup]" section,
# including the english section.
vi config.tml

# Delete the folders from 'content/' as well, leaving 'en'
rm -rf content/fa content/no

You should also set a default meta description or the engine will put in in the bootstrap default and google will summarize all your pages with that

vi config.toml
[params]
copyright = "some.site.org"
privacy_policy = "/privacy"
description = "My personal website to document what I know and how I did it"

Keep and eye on the site in your browser as you make changes. When you’re ready to start with the main part of adding content, take a look at the next section.

Docsy Operation

Notes

You can’t dispense with the en folder yet, as it breaks some github linking functionality you may want to take advantage of later

7.1.1.4 - Docsy Operate

This is a quick excerpt from the Docsy Content and Customization docs. Definitely spend time with those after reading the overview here.

Directory Layout

Content is, appropriately enough, in the content directory, and it’s subdirectories line up with the top-level navigation bar of the web site. About, Documentation, etc corresponds to content/about, content/docs and so on.

The directories and files you create will be the URL that you get with one important exception, filenames are converted to a ‘slug’, mimicking how index files work. For example, If you create the file docs/tech/mastadon.md the URL will be /docs/tech/mastadon/. This is for SEO (Search Engine Optimization).

The other thing you’ll see are _index.html files. In the example above, the URL /docs/tech/ has no content, as it’s a folder. But you can add a _index.md or .html to give it some. Avoid creating index.md or tech.md (a file that matches the name of a subdirectory). Either of those will block Hugo from generating content for any subdirectories.

The Landing Page and Top Nav Pages

The landing page itself is the content/_index.html file and the background is featured-background.jpg. The other top-nav pages are in the content folders with _index files. You may notice the special header variable “menu: main: weight: " and that is what flags that specific page as worth of being in the top menu. Removing that, or adding that (and a linkTitle) will change the top nav.

The Documentation Page and Left Nav Bar

One of the most important features of the Docsy template is the well designed documentation section that features a Section menu, or left nav bar. This menu is built automatically from the files you put in the docs folder, as long as you give them a title. (See Front Matter, below). They are ordered by date but you can add a weight to change that.

It doesn’t collapse by default and if you have a lot of files, you’ll want to enable that.

# Search and set in your config.toml
sidebar_menu_compact = true

Front Matter

The example files have a section at the top like this. It’s not strictly required, but you must have at least the title or they won’t show up in the left nav tree.

---
title: "Examples"
---

Page Content and Short Codes

In addition to normal markdown or html, you’ll see frequent use of ‘shortcodes’ that do things that normal markdown can’t. These are built in to Hugo and can be added by themes, and look like this;

{{% blocks/lead color="dark" %}}
Some Important Text
{{% /blocks/lead %}}

Diagrams

Docsy supports mermaid and a few other tools for creating illustrations from code, such as KaTeX, Mermaid, Diagrams.net, PlantUML, and MarkMap. Simply use a codeblock.

```mermaid
graph LR
 one --> two
```

Generate the Website

Once you’re satisfied with what you’ve got, tell hugo to generate the static files and it will populate the folder we configured earliet

hugo

Publish the Web Site

Everything you need is in the public folder and all you need do is copy it to a web server. You can even use git, which I advise since we’re already using it to pull in and update the module.

–> Local Git Deployment

Bonus Points

If you have a large directory structure full of markdown files already, you can kick-start the process of adding frontmatter like this;

find . -type f | \
while read X
do
  TITLE=$(basename ${X%.*})
  FRONTMATTER=$(printf -- "---\ntitle = ${TITLE}\n---")
  sed -i "1s/^/$FRONTMATTER\n/" "$X"
done

7.1.1.5 - Docsy Github

You may have noticed the links on the right like “Edit this page” that takes one to Github. Let’s set those up.

On Github

Go to github and create a new repository. Use the name of your side for the repo name, such as “some.site.org”. If you want to use something else, you can edit your config.toml file to adjust.

Locally

You man have noticed that Github suggested some next steps with a remote add using the name “origin”. Docsy is already using that, however, from when you cloned it. So we’ll have to pick a new name.

cd /path/to/my-existing-site
git remote add github https://github.com/yourID/some.site.org

Let’s change our default banch to “main” to match Github’s defaults.

git branch -m main

Now we can add, commit and push it up to Github

git add --all
git commit -m "first commit of new site"
git push github

You’ll notice something interesting when you go back to look at Github; all the contributers on the right. That’s because you’re dealing with a clone of Docsy and you can still pull in updates and changes from original project.

It may have been better to clone it via github

7.2 - Content Deployment

Automating deployment as part of a general continuous integration strategy is best-practice these days. Web content should be similarly treated.

I.e. version controlled and deployed with git.

7.2.1 - Local Git Deployment

Overview

Let’s create a two-tiered system that goes from dev to prod using a post-commit trigger

graph LR Development --git / rsync---> Production

The Development system is your workstation. git commit will trigger a build and rsync.

The Production system is a web server. Any web server will do as long as you have SSH access and can update a web-root folder.

I use Hugo in this example, but any system that has an output (or build) folder works similarly.

Configuration

The first thing we need to know is where wee are going, so lets prepare production first.

Production System

This server probably uses folders like /var/www/XXXXX for its web root. Use that or create a new folder and make yourself the owner.

sudo mkdir /var/www/some.site.org
sudo chown -R $USER /var/www/some.site.org
echo "Hello" > /var/www/some.site.org/index.html

Edit your web server’s config to make sure you can view that web page. Also check that rsync is available from the command line.

Development System

Hugo builds static html in a public directory. To generate the HTML, simply type hugo

cd /path/to/my-existing-site
hugo
ls public

We don’t actually want this folder in git and most themes (if you’re using Hugo) already exclude it. Look for a .gitignore file to and create/add if needed.

# Notice /public is at the top of the git ignore file
cat .gitignore

/public
package-lock.json
.hugo_build.lock
...

Assuming you have some content, let’s add and commit it.

git add --all
git commit -m "Initial Commit"

Note: All of these git commands work because pulling in a theme initialized the directory. If you’re doing something else you’ll need to git init.

The last step is to create a hook that will build and deploy after a commit.

cd /path/to/my-existing-site
touch .git/hooks/post-commit
chmod +x .git/hooks/post-commit
vi .git/hooks/post-commit
#!/bin/sh
hugo --cleanDestinationDir
rsync --recursive --delete public/ [email protected]:/var/www/some.site.org

This script ensures that the remote directory matches your local directory. When you’re ready to update the remote site:

git add --all
git commit --allow-empty -m "trigger update"

If you mess up the production files, you can just call the hook manually.

cd /path/to/my-existing-site
touch .git/hooks/post-commit

Troubleshooting

bash: line 1: rsync: command not found

Double check that the remote host has rsync.

7.3 - Content Delivery

7.3.1 - Cloudflare

  • Cloudflare acts as a reverse proxy to hide your server’s IP address
  • Takes over your DNS and directs requests to the closest site
  • Injects JavaScript analytics
    • If the browser’s “do not track” is on, JS isn’t injected.
  • Can uses a tunnel and remove encryption overhead

7.4 - Servers

7.4.1 - Caddy

Caddy is a web server that runs SSL by default by automatically grabing a cert from Let’s Encrypt. It comes as a stand-alone binary, written in Go, and makes a decent reverse proxy.

7.4.1.1 - Installation

Installation

Caddy recommends “using our official package for your distro” and for debian flavors they include the basic instructions you’d expect.

Configuration

The easiest way to configure Caddy is by editing the Caddyfile

sudo vi /etc/caddy/Caddyfile
sudo systemctl reload caddy.service

Sites

You define websites with a block that includes a root and the file_server directive. Once you reload, and assuming you already have the DNS in place, Caddy will reach out to Let’s Encrypt, acquire a certificate, and automatically forward from port 80 to 443

site.your.org {        
    root * /var/www/site.your.org
    file_server
}

Authentication

You can add basicauth to a site by creating a hash and adding a directive to the site.

caddy hash-password
site.your.org {        
    root * /var/www/site.your.org
    file_server
    basicauth { 
        allen SomeBigLongStringFromTheCaddyHashPasswordCommand
    }
}

Reverse Proxy

Caddy also makes a decent reverse proxy.

site.your.org {        
    reverse_proxy * http://some.server.lan:8080
}

You can also take advantage of path-based reverse proxy. Note the rewrite to accommodate the trailing-slash potentially missing.

site.your.org {
    rewrite /audiobooks /audiobooks/
    handle_path /audiobooks/* {
        uri strip_prefix /audiobooks/
        reverse_proxy * http://some.server.lan:8080
    }
}

Include Blocks

You can define common elements at the top and include them on multiple sites. This helps when you have many sites.

(logging) {
    log {
        output file /var/log/caddy/access.log
    }
}
site.your.org {
    import logging     
    reverse_proxy * http://some.server.lan:8080
}

Modules

Caddy is a single binary so when adding a new module (aka feature) you are essentially downloading a new version that has them compiled in. You can find the list of packages at their download page.

Do this at the command line with caddy itself.

sudo caddy add-package github.com/mholt/caddy-webdav
systemctl restart caddy

Security

Drop Unknown Domains

Caddy will accept connections to port 80, announce that it’s a Caddy web server and redirect you to https before realizing it doesn’t have a site or cert for you. Configure this directive at the bottom so it drops immediately.

http:// {
    abort
}

Crowdsec

Caddy runs as it’s own user and is fairly memory-safe. But installing Crowdsec helps identify some types of intrusion attempts.

[TODO]

Troubleshooting

You can test your config file and look at the logs like so

caddy validate --config /etc/caddy/Caddyfile
journalctl --no-pager -u caddy

7.4.1.2 - WebDAV

Caddy can also serve WebDAV requests with the appropriate module. This is important because for many clients, such as Kodi, WebDAV is significantly faster.

sudo caddy add-package github.com/mholt/caddy-webdav
sudo systemctl restart caddy
{   # Custom modules require order of precedence be defined
    order webdav last
}
site.your.org {
    root * /var/www/site.your.org
    webdav * 
}

You can combine WebDAV and Directly Listing - highly recommended - so you can browse the directory contents with a normal web browser as well. Since WebDAV doesn’t use the GET method, you can use the @get filter to route those to the file_server module so it can serve up indexes via the browse argument.

site.your.org {
    @get method GET
    root * /var/www/site.your.org
    webdav *
    file_server @get browse        
}

Sources

https://github.com/mholt/caddy-webdav https://marko.euptera.com/posts/caddy-webdav.html

7.4.1.3 - MFA

The package caddy-security offers a suite of auth functions. Among these is MFA and a portal for end-user management of tokens.

Installation

# Install a version of caddy with the security module 
sudo caddy add-package github.com/greenpau/caddy-security
sudo systemctl restart caddy

Configuration

/var/lib/caddy/.local/caddy/users.json

caddy hash-password

Troubleshooting

journalctl –no-pager -u caddy

7.4.1.4 - Wildcard DNS

When caddy creates a cert it uses the specific hostname of the site it’s for. Let’s Encrypt then publishes that as part of certificate transparency for all the world to see. In fact, if you create a new site in caddy, you’ll see bots probing for weaknesses within 30 min - without you even having published the URL. The only real solution is a wildcard cert. Instead of individual certs for each site, you create just one and use it everywhere, you get SSL without publishing all of your site names. Anonymity isn’t a replacement for security, but it’s one less piece of info for adversaries - and some privacy once in a while is nice.

Caddy supports this, but it’s not in the default binary and it requires DNS validation. Use the caddy add-package command and follow the custom build support instructions. Then create a token at your DNS provider and update your caddy config.

Installation

Note: make sure to install caddy first. This also assumes cloudflare. Check https://github.com/caddy-dns to see if your DNS provider is listed

# Divert the default binary from the repo
sudo dpkg-divert --divert /usr/bin/caddy.default --rename /usr/bin/caddy
sudo cp /usr/bin/caddy.default /usr/bin/caddy.custom
sudo update-alternatives --install /usr/bin/caddy caddy /usr/bin/caddy.default 10
sudo update-alternatives --install /usr/bin/caddy caddy /usr/bin/caddy.custom 50

# Add the package and restart. 
sudo caddy add-package github.com/caddy-dns/cloudflare
sudo systemctl restart caddy.service    

From here on out, apt update will not upgrade caddy. You must enter caddy upgrade at the command line. The devs don’t think this should be an issue.

DNS Server Configuration

This is different per provider, but for Cloudflare, a decent example is below. Just focus on the ‘Getting the Cloudflare API Token’ part

https://roelofjanelsinga.com/articles/using-caddy-ssl-with-cloudflare/

Caddy Configuration

You can add your API key at the top of your Caddyfile config in the global block and caddy will use it for all subsequent cert acquisitions. But for the sites, you have to do something a little different. Since caddy uses the site address line to choose the cert, you must make the site * and then match and handle subsites.

{
	acme_dns cloudflare alotcharactersandnumbershere
}

*.some.org, some.org {

	@site1 host site1.some.org
	handle @site1 {
    	reverse_proxy * http://localhost:3200
	}

	@site2 host site2.some.org
	handle @site2 {
     	root * /srv/www/site2
	}
}