
Documentation

This is the documentation root. Use the left-hand nav bar to descend taxonomically, or use the search to find what you are after.

1 - Internet

1.1 - DNS

Web pages today are complex. On average, your browser makes about 40 DNS queries[1] to find the various parts of a typical web page, so a local DNS server is key to keeping things fast.

In general, you can run either a caching server or a recursive server. The choice comes down to speed versus privacy.

Types of DNS Servers

A caching server accepts and caches queries, but doesn’t actually do the lookup itself. It forwards the request on to another DNS server and waits for the answer. If you have a lot of clients configured to use it, chances are someone else has already asked for what you want and it can supply the answer quickly from cache.
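To make the caching idea concrete, here's a toy sketch in Python. It isn't how Pi-hole or any real server is built: `upstream` is a hypothetical stand-in for the forwarder (quad 9, say) that returns an answer and its TTL, and the clock is injectable so expiry can be tested.

```python
import time
from typing import Callable, Dict, Tuple


class CachingForwarder:
    """Answer from cache while the TTL is valid; otherwise forward upstream."""

    def __init__(self, upstream: Callable[[str], Tuple[str, int]],
                 clock: Callable[[], float] = time.monotonic):
        self.upstream = upstream          # hypothetical upstream resolver: name -> (answer, ttl)
        self.clock = clock
        self.cache: Dict[str, Tuple[str, float]] = {}  # name -> (answer, expiry time)
        self.hits = 0
        self.misses = 0

    def resolve(self, name: str) -> str:
        now = self.clock()
        entry = self.cache.get(name)
        if entry is not None and entry[1] > now:
            self.hits += 1                # someone already asked; answer locally
            return entry[0]
        self.misses += 1                  # not cached (or expired): ask upstream and wait
        answer, ttl = self.upstream(name)
        self.cache[name] = (answer, now + ttl)
        return answer
```

The first client to ask pays for the upstream round trip; everyone after that gets the cached answer until the TTL runs out, which is why a busy shared cache answers so many queries locally.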

A recursive server does more than cache answers. It knows how to start at the root of the DNS and find the answer itself. If you need to find some.test.com, it asks a root server where the .com servers are, asks a .com server where test.com is, then connects to test.com's server and asks it for some.test.com.
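Here's the same walk as a toy Python sketch. The server names and delegation data are made up and hard-coded; a real recursive server sends actual DNS queries to the root, TLD, and domain name servers at each step, and caches what it learns along the way.

```python
from typing import Dict, List, Tuple

# Hypothetical, hard-coded delegation data standing in for real name servers.
# Each server either answers a name or refers us to a server for a closer zone.
SERVERS: Dict[str, Dict[str, Dict[str, str]]] = {
    "root":    {"referrals": {"com.": "com-tld"}, "answers": {}},
    "com-tld": {"referrals": {"test.com.": "test-ns"}, "answers": {}},
    "test-ns": {"referrals": {}, "answers": {"some.test.com.": "198.51.100.10"}},
}


def resolve_iteratively(name: str, start: str = "root") -> Tuple[str, List[str]]:
    """Follow referrals from the root down until some server answers."""
    name = name if name.endswith(".") else name + "."
    server, path = start, [start]
    while True:
        zone = SERVERS[server]
        if name in zone["answers"]:
            return zone["answers"][name], path
        # Pick the longest delegated zone that the name falls under.
        matches = [z for z in zone["referrals"] if name == z or name.endswith("." + z)]
        if not matches:
            raise LookupError(f"{server} has no answer or referral for {name}")
        server = zone["referrals"][max(matches, key=len)]
        path.append(server)
```

Calling `resolve_iteratively("some.test.com")` visits root, then the .com server, then test.com's server. Each hop is a network round trip in real life, which is where the "long way" cost below comes from.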

Comparison

Between the two, the caching server will generally be faster. If you connect to a large DNS service they will almost always have things cached. You will also get geographically relevant results as content providers work with DNS providers to direct you to the closest content cache.

With a recursive server, you do the lookups yourself, so no single entity is able to monitor your DNS queries. You also aren’t dependent on any upstream provider. But you make every lookup ‘the long way’, and that can take many hundreds of milliseconds in some cases, a large part of a page’s load time.

Testing

In an ad hoc test on a live network with about 5,000 residential user devices, about half the queries were cached. The other half were sent to either quad 9 or a local resolver. Quad 9 took about half the time that the local resolver did.

Here are the numbers from Steve Gibson’s DNS Benchmark, comparing Pi-hole forwarding to quad 9 ("Forwarder") with Pi-hole forwarding to a local resolver ("Resolver"). Cached results are excluded; times are in seconds.

    Forwarder     |  Min  |  Avg  |  Max  |Std.Dev|Reliab%|
  ----------------+-------+-------+-------+-------+-------+
  - Uncached Name | 0.015 | 0.045 | 0.214 | 0.046 | 100.0 |
  - DotCom Lookup | 0.015 | 0.019 | 0.034 | 0.005 | 100.0 |

    Resolver      |  Min  |  Avg  |  Max  |Std.Dev|Reliab%|
  ----------------+-------+-------+-------+-------+-------+
  - Uncached Name | 0.016 | 0.078 | 0.268 | 0.079 | 100.0 |
  - DotCom Lookup | 0.018 | 0.035 | 0.078 | 0.017 | 100.0 |
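To put numbers on "about half", here's the arithmetic on the Avg column of the two tables above, in Python.

```python
# Average times in seconds, copied from the Avg column of the two benchmark tables.
forwarder = {"Uncached Name": 0.045, "DotCom Lookup": 0.019}   # Pi-hole -> quad 9
resolver = {"Uncached Name": 0.078, "DotCom Lookup": 0.035}    # Pi-hole -> local resolver

for test, fwd_avg in forwarder.items():
    ratio = fwd_avg / resolver[test]
    print(f"{test}: forwarder takes {ratio:.0%} of the resolver's time")
```

That works out to about 58% for uncached names and 54% for dot-com lookups, so "a little over half" is the more precise reading.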

Selection

This test is interesting, but not definitive. While the benchmark shows that the forwarder’s uncached average is better, perceived page load time is different from the sum of DNS query times. A page-load test would be more telling, but in general, faster is better.

Use a caching server.

One last point: use your ISP’s name server when possible. They will direct you to their local content caching systems for Netflix, Google (YouTube) and Akamai. If you use quad 9 like I did, you may get to a regional content location, but you miss out on things optimized specifically for your geographic location.

They are (probably) capturing all your queries for monetization, and possibly directing you to their own advertising server when you mistype a domain name. So you’ll need to decide:

Speed vs privacy.


  1. Informal personal checking of random popular sites.

1.1.1 - Pi-hole

Pi-hole is a reasonable choice for DNS service, especially if you don’t have a separate metrics and reporting system. A single instance will scale to 1000 active clients with just 1 core and 500 MB of RAM, and does a good job of showing what’s going on.

There are some caveats when you pass 1000 users when logging all queries, but it’s a

Preparation

Prepare and secure a Debian system

Set a Static Address

sudo vi /etc/network/interfaces

Change

# The primary network interface
allow-hotplug eth0
iface eth0 inet dhcp

to

auto eth0
iface eth0 inet static
    address 192.168.0.2/24
    gateway 192.168.0.1

Secure Access with Nftables

Nftables is the modern replacement for iptables and preferred for setting netfilter rules.

sudo apt install nftables
sudo systemctl enable nftables
sudo vi /etc/nftables.conf
#!/usr/sbin/nft -f

flush ruleset

table inet filter {
        chain input {
                type filter hook input priority 0;

                # accept any localhost traffic
                iif lo accept

                # accept already allowed and related traffic
                ct state established,related accept

                # accept DNS and DHCP traffic from internal only
                define RFC1918 = { 192.168.0.0/16, 10.0.0.0/8, 172.16.0.0/12 }
                udp dport { domain, bootps } ip saddr $RFC1918 ct state new accept
                tcp dport { domain, bootps } ip saddr $RFC1918 ct state new accept

                # accept web and ssh traffic on the first interface, or from an address range
                iifname eth0 tcp dport { ssh, http } ct state new accept
                # or
                ip saddr 192.168.0.0/24 tcp dport { ssh, http } ct state new accept

                # Accept pings
                icmp type { echo-request } ct state new accept

                # accept neighbor discovery otherwise IPv6 connectivity breaks.
                ip6 nexthdr icmpv6 icmpv6 type { nd-neighbor-solicit,  nd-router-advert, nd-neighbor-advert } accept

                # count and drop any other traffic that doesn't match the above
                counter drop
        }
}
sudo nft -f /etc/nftables.conf
sudo systemctl start nftables.service
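A quick way to sanity-check which source addresses the $RFC1918 set covers is Python's ipaddress module. This is only a sketch that mirrors the set; nftables does its own matching. Two things it makes visible: `ip saddr` in an inet table only matches IPv4 packets, so IPv6 DNS clients aren't covered by that rule, and a DHCP client that doesn't have an address yet sends from 0.0.0.0, which isn't in these ranges either, so that rule may be worth checking if DHCP discovery fails.

```python
import ipaddress

# Same private ranges as the RFC1918 set in the nftables rules above.
RFC1918 = [ipaddress.ip_network(n) for n in ("192.168.0.0/16", "10.0.0.0/8", "172.16.0.0/12")]


def dns_allowed(src: str) -> bool:
    """Mirror of 'ip saddr $RFC1918': True only for IPv4 sources in a private range."""
    addr = ipaddress.ip_address(src)
    # An IPv6 address is never 'in' an IPv4 network, just as 'ip saddr' ignores IPv6.
    return any(addr in net for net in RFC1918)
```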

Add Unattended Updates

This is an optional, but useful, service.

sudo apt install unattended-upgrades

sudo sed -i 's/\/\/\(.*origin=Debian.*\)/  \1/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/\/\/\(Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";\)/  \1/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/\/\/\(Unattended-Upgrade::Remove-Unused-Dependencies\) "false";/  \1 "true";/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/\/\/\(Unattended-Upgrade::Automatic-Reboot\) "false";/  \1 "true";/' /etc/apt/apt.conf.d/50unattended-upgrades

Installation

Unbound

sudo apt install unbound

Pi-hole

sudo apt install curl
curl -sSL https://install.pi-hole.net | bash

Configuration

Unbound

The Pi-hole guide for unbound (https://docs.pi-hole.net/guides/dns/unbound/) includes a config block to copy and paste as directed. While you’re at it, you should also add a config file for dnsmasq (which comes as part of Pi-hole) to set the EDNS packet size.

sudo vi /etc/dnsmasq.d/99-edns.conf
edns-packet-max=1232
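The 1232 figure isn't arbitrary. It's the value recommended by DNS Flag Day 2020: the largest UDP payload that fits in the minimum IPv6 MTU without fragmentation. The arithmetic, in Python:

```python
IPV6_MIN_MTU = 1280   # bytes; every IPv6 link must carry packets at least this large
IPV6_HEADER = 40      # fixed IPv6 header size
UDP_HEADER = 8        # UDP header size

edns_packet_max = IPV6_MIN_MTU - IPV6_HEADER - UDP_HEADER
print(edns_packet_max)  # 1232
```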

When you check the status of unbound, you can ignore the warning subnetcache:... as it’s just reminding you that data in the subnet cache (if you were to use it) can’t be prefetched. There’s some conversation[1] as to why it warns about this.

The config includes prefetch, but you may also wish to add serve-expired to the unbound config file from the guide (/etc/unbound/unbound.conf.d/pi-hole.conf), inside its server: section.

# serve old responses from cache while waiting for the actual resolution to finish.
serve-expired: yes
sudo systemctl restart unbound.service
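The idea behind serve-expired, as a toy Python model: an expired entry is still handed back immediately, and the name is queued for a fresh lookup instead of making the client wait. This is an illustration, not unbound's implementation; unbound also has related options such as serve-expired-ttl to cap how stale an answer may get. The `refresh_queue` here is a hypothetical stand-in for unbound's background resolution.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class ServeStaleCache:
    """Toy model of serve-expired: stale answers are returned at once and refreshed later."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.entries: Dict[str, Tuple[str, float]] = {}  # name -> (answer, expiry time)
        self.refresh_queue: List[str] = []               # names to re-resolve in the background

    def store(self, name: str, answer: str, ttl: int) -> None:
        self.entries[name] = (answer, self.clock() + ttl)

    def lookup(self, name: str) -> Optional[Tuple[str, bool]]:
        """Return (answer, is_stale), or None on a true cache miss."""
        if name not in self.entries:
            return None
        answer, expiry = self.entries[name]
        stale = self.clock() >= expiry
        if stale and name not in self.refresh_queue:
            self.refresh_queue.append(name)              # refresh later; don't block the client
        return answer, stale
```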

No additional setup is needed, but see the unbound page for more info.

Pi-hole

Pi-hole can be configured via its two main config files, /etc/pihole/setupVars.conf and /etc/pihole/pihole-FTL.conf, but it’s convenient to use the settings menu on the left-hand side of the GUI.

  • Settings -> DNS -> Upstream DNS Servers -> Custom 1 (Check and add 127.0.0.1#5335 as shown in the unbound guide linked above)
  • Settings -> DNS -> Interface settings -> Permit all origins (needed if you have multiple networks)

Very busy Pi-hole installations generate lots of data and can (seemingly) hang the dashboard. If that happens, limit the amount of data being displayed.

sudo vi /etc/pihole/pihole-FTL.conf
# Don't import the existing DB into the GUI - it will hang the web page for a long time
DBIMPORT=no

# Don't import more than an hour of logs from the logfile
MAXLOGAGE=1

# Truncate data older than this many days to keep the size of the database down
MAXDBDAYS=1
sudo systemctl restart pihole-FTL.service

Operation

Local DNS Entries

You can enter local DNS and CNAME entries via the GUI (Admin Panel -> Local DNS), but for bulk entries you can also edit the config files directly.

For A records

sudo vim /etc/pihole/custom.list
10.50.85.2 test.some.lan
10.50.85.3 test2.some.lan

For CNAME records

sudo vim /etc/dnsmasq.d/05-pihole-custom-cname.conf
cname=test3.some.lan,test.some.lan
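For bulk entries it can be easier to generate the lines than to type them. A small Python sketch, assuming you keep your hosts in dicts (the names and addresses are the examples above). Note the two files use opposite orders: custom.list is IP first, like /etc/hosts, while cname= takes the alias first and then the target.

```python
from typing import Dict, List


def custom_list_lines(a_records: Dict[str, str]) -> List[str]:
    """hostname -> IP pairs as hosts-style lines for /etc/pihole/custom.list (IP first)."""
    return [f"{ip} {host}" for host, ip in sorted(a_records.items())]


def cname_lines(cnames: Dict[str, str]) -> List[str]:
    """alias -> target pairs as dnsmasq cname= lines (alias first, then target)."""
    return [f"cname={alias},{target}" for alias, target in sorted(cnames.items())]
```

Append the output to the two files, then restart the service (sudo systemctl restart pihole-FTL.service) so it rereads them.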

Block Lists

Pi-hole ships with one ad list: StevenBlack. You may need to disable it for Google or Facebook search results to work as expected, since the top search results are often ads and their links break when Pi-hole blocks them.

  • Admin Panel -> Ad Lists -> Status Column

You might consider adding security only lists instead, such as Intel’s below

Search the web for other examples.

Upgrading

Unbound will be upgraded via the Unattended Upgrades service. But pi-hole requires a manual command.

sudo pihole -up

Troubleshooting

DNS Cache Size

The default cache size of 10,000 easily serves thousands of clients, because entries expire before the cache fills up. But you can check your evictions (cache entries removed to make room before they expired) to be sure.

settings -> System -> DNS cache evictions:

You’ll notice that insertions keep climbing as things are added to the cache, but the cache number itself represents only those entries that are current. If you do see evictions, edit CACHE_SIZE in /etc/pihole/setupVars.conf

You can also check these counters at the command line:

dig +short chaos txt evictions.bind @localhost
dig +short chaos txt cachesize.bind @localhost
dig +short chaos txt hits.bind @localhost
dig +short chaos txt misses.bind @localhost
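Under the hood these are ordinary DNS queries in the CHAOS class (class 3) for TXT records (type 16), which dnsmasq, and therefore Pi-hole, answers with its internal counters. A Python sketch of building the question and reading the first TXT answer. It's simplified: it sends no EDNS option (dig adds one by default), sets the recursion-desired flag like dig does, and assumes the first answer record is the TXT one. The server address and port in `ask()` are assumptions for a local Pi-hole.

```python
import socket
import struct

TXT = 16    # QTYPE for TXT records
CHAOS = 3   # QCLASS CH, used for server-info names such as evictions.bind


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Wire-format DNS query for a CHAOS-class TXT record (no EDNS)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split(".")) + b"\x00"
    return header + qname + struct.pack("!HH", TXT, CHAOS)


def skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past a domain name, which may end in a compression pointer."""
    while True:
        length = msg[offset]
        if length == 0:
            return offset + 1
        if (length & 0xC0) == 0xC0:   # two-byte pointer ends the name
            return offset + 2
        offset += 1 + length


def first_txt_answer(msg: bytes) -> str:
    """Text of the first answer record; assumes it is a TXT record."""
    qdcount, ancount = struct.unpack("!HH", msg[4:8])
    if ancount == 0:
        raise ValueError("no answer records")
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(msg, offset) + 4          # skip QTYPE and QCLASS
    offset = skip_name(msg, offset)
    rtype, _rclass, _ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
    if rtype != TXT:
        raise ValueError(f"first answer is type {rtype}, not TXT")
    rdata = msg[offset + 10:offset + 10 + rdlength]
    # TXT RDATA is one or more length-prefixed character strings.
    parts, i = [], 0
    while i < len(rdata):
        n = rdata[i]
        parts.append(rdata[i + 1:i + 1 + n].decode("ascii"))
        i += 1 + n
    return "".join(parts)


def ask(name: str, server: str = "127.0.0.1", port: int = 53, timeout: float = 2.0) -> str:
    """Send the query over UDP and return the TXT answer text."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, port))
        reply, _ = sock.recvfrom(4096)
    return first_txt_answer(reply)
```

For example, `ask("evictions.bind")` run on the Pi-hole itself should return the same number dig prints, as a string.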

However, we are advised that unused cache is wasted memory that could be used for disk buffers, etc. So don’t increase it just in case.

Rate Limiting

The system has a default limit of 1000 queries in a 60-second window for each client. If your clients are proxied or relayed, they all appear as a single client and you can run into this. This event is displayed in the dashboard[2] and also in the logs[3].

sudo grep -i rate-limiting /var/log/pihole/pihole.log
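Here's a toy fixed-window model of a per-client limit of 1000 queries per 60 seconds, in Python. It's my simplification for illustration, not FTL's actual code. It shows why proxies and relays trip the limit: every client behind them shares one source address, so all their queries land in one counter.

```python
from typing import Dict, Tuple


class FixedWindowLimiter:
    """Toy per-client limit: at most `limit` queries per `window` seconds.
    Each client's count resets when its current window ends (a fixed-window model)."""

    def __init__(self, limit: int = 1000, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.state: Dict[str, Tuple[float, int]] = {}  # client -> (window start, count)

    def allow(self, client: str, now: float) -> bool:
        start, count = self.state.get(client, (now, 0))
        if now - start >= self.window:
            start, count = now, 0          # the old window is over; start a new one
        count += 1
        self.state[client] = (start, count)
        return count <= self.limit
```

With 50 devices behind one relay each making 25 lookups in a minute, that's 1,250 queries from one address, and the last 250 get refused.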

You may find the address 127.0.0.1 being rate limited. This can be due to Pi-hole doing a reverse (PTR) lookup of every client IP each hour. You can disable this with:

# In the pihole-FTL.conf
REFRESH_HOSTNAMES=NONE
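A reverse lookup asks for a PTR record under a special name built from the address, and Python's ipaddress module can show that name. With one lookup per client each refresh, a network the size of the one tested earlier (about 5,000 devices) means about 5,000 PTR queries from 127.0.0.1, far over the limit if many of them arrive within the same minute. The one-query-per-client count is a simplifying assumption.

```python
import ipaddress
from typing import Iterable


def ptr_name(ip: str) -> str:
    """The name a reverse (PTR) lookup asks for, e.g. for a client's IP."""
    return ipaddress.ip_address(ip).reverse_pointer


def refresh_query_count(client_ips: Iterable[str]) -> int:
    """One PTR query per distinct client address (a simplifying assumption)."""
    return len({ipaddress.ip_address(ip) for ip in client_ips})
```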

DNS over HTTPS

Firefox, if the user has not yet chosen a setting, will query use-application-dns.net. Pi-hole responds with NXDOMAIN[4], which signals Firefox not to turn on DNS over HTTPS, so it keeps using Pi-hole for DNS.

Apple devices include a Private Relay[5] feature that the user may decide to enable if they pay for it. By default, Pi-hole blocks queries for mask.icloud.com, and the user will be notified that you are blocking it. To allow it, edit /etc/pihole/pihole-FTL.conf:

# Signal that Apple iCloud Private Relay is allowed 
BLOCK_ICLOUD_PR=false
sudo systemctl reload pihole-FTL.service

Searching The Query Log Hangs DNS

On a very busy server, clicking show-all in the query log panel will hang the server as pihole-FTL works through its database. There is no fix other than not doing it. The best alternative is to ship logs to Elasticsearch or a similar system.

Ask Yourself

The system continues to use whatever DNS resolver was initially configured. You may want it to use itself instead.

# revert if pi-hole itself needs fixing.
sudo vi /etc/resolv.conf

nameserver 127.0.0.1

1.1.2 - Pi-hole DHCP

Pi-hole serves up DHCP information as well as DNS, and can be configured and enabled in the GUI.

However, the GUI only allows for a single range. On a large network you’ll need multiple ranges. You do this by editing the config files directly.

Interface-Based Ranges

In this setup, you have a separate interface per LAN. Easy to do in a virtual or VLAN environment, but you’ll have to define each in the /etc/network/interfaces file.

Let’s create a range from 192.168.0.100-200 tied to eth0 and a range of 192.168.1.100-200 tied to eth1. We’ll also specify the router and two DNS servers.

vim /etc/dnsmasq.d/05-dhcp.conf
dhcp-range=eth0,192.168.0.100,192.168.0.200,24h
dhcp-option=eth0,option:router,192.168.0.1
dhcp-option=eth0,option:dns-server,192.168.0.2,192.168.0.3

dhcp-range=eth1,192.168.1.100,192.168.1.200,24h
dhcp-option=eth1,option:router,192.168.1.1
dhcp-option=eth1,option:dns-server,192.168.1.2,192.168.1.3

# Shared by both
dhcp-option=option:netmask,255.255.255.0

# Respond immediately without waiting for other servers 
dhcp-authoritative

# Don't try and ping the address before assigning it
no-ping

dhcp-lease-max=10000
dhcp-leasefile=/etc/pihole/dhcp.leases

domain=home.lan
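It's easy to end up with a range, router, or netmask that doesn't belong to the interface's subnet. A small Python check, assuming each interface is a /24 (as with the 192.168.0.2/24 static address earlier); adjust the subnets to match yours.

```python
import ipaddress
from typing import List


def check_range(subnet: str, start: str, end: str, router: str, netmask: str) -> List[str]:
    """Return problems with one interface's DHCP settings (an empty list means consistent)."""
    net = ipaddress.ip_network(subnet)
    first, last, gw = (ipaddress.ip_address(a) for a in (start, end, router))
    problems = []
    for label, addr in (("start", first), ("end", last), ("router", gw)):
        if addr not in net:
            problems.append(f"{label} {addr} is outside {net}")
    if first > last:
        problems.append("range start is after range end")
    elif first <= gw <= last:
        problems.append(f"router {gw} is inside the lease range")
    if ipaddress.ip_address(netmask) != net.netmask:
        problems.append(f"netmask {netmask} does not match {net}")
    return problems
```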

These settings can be implicit - i.e. we could have left out ethX in the range, but explicit is often better for clarity.

Note: the DHCP server (dnsmasq) is not enabled by default. You can enable it in the GUI under Settings -> DHCP.

Relay-Based Ranges

In this setup, the router relays DHCP requests to the server. Only one system network interface is required, though you must configure the router(s).

When configured, the relay (router) fills in the relay agent address (giaddr) field and forwards the request to dnsmasq, which (I think) recognizes a relayed request by that field and looks through its available ranges for a match. It also sets a tag that can be used to assign different options, such as the gateway, per range.

dhcp-range=set:tag0,192.168.0.100,192.168.0.250,255.255.255.0,8h
dhcp-range=set:tag1,192.168.1.100,192.168.1.250,255.255.255.0,8h
dhcp-range=set:tag2,192.168.2.100,192.168.2.250,255.255.255.0,8h

# Option 3 is the router (default gateway)
dhcp-option=tag:tag0,3,192.168.0.1
dhcp-option=tag:tag1,3,192.168.1.1
dhcp-option=tag:tag2,3,192.168.2.1
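The matching step can be pictured as a lookup from the relay address to a range. In DHCP, the relay puts its own address on the client's subnet into giaddr, so the server can pick the range on that subnet. A Python sketch, assuming each range is a /24 with the router at .1 as configured above; it models the idea, not dnsmasq's code.

```python
import ipaddress
from typing import Dict, Optional, Tuple

# Mirrors the three relay ranges above: tag -> (subnet, router).
RANGES: Dict[str, Tuple[str, str]] = {
    "tag0": ("192.168.0.0/24", "192.168.0.1"),
    "tag1": ("192.168.1.0/24", "192.168.1.1"),
    "tag2": ("192.168.2.0/24", "192.168.2.1"),
}


def match_relay(giaddr: str) -> Optional[Tuple[str, str]]:
    """Pick the range whose subnet contains the relay address; return (tag, router)."""
    addr = ipaddress.ip_address(giaddr)
    for tag, (subnet, router) in RANGES.items():
        if addr in ipaddress.ip_network(subnet):
            return tag, router
    return None  # no range configured for this relay
```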

Sources

https://discourse.pi-hole.net/t/more-than-one-conditional-forwarding-entry-in-the-gui/11359

Troubleshooting

It’s possible that the DHCP part of dnsmasq doesn’t scale to many thousands of leases[1].

1.1.3 - Unbound

1.2 - Email

Email is a commodity service, but critical for many things - so you can get it anywhere, but you better not mess it up.

Your options, in increasing order of complexity, are:

Forwarding

Email sent to [email protected] is simply forwarded to someplace like Gmail. It’s free and easy, and you don’t need any infrastructure. Most registrars, like GoDaddy, Namecheap, Cloudflare, etc., will handle it.

You can even reply from [email protected] by integrating with SendGrid or a similar provider.

Remote-Hosting

If you want more, Google and Microsoft have full productivity suites. Just edit your DNS records, import your users, and pay them $5 a head per month. You still have to ‘do email’ but it’s a little less work than if you ran the whole stack. In most cases, companies that specialize in email do it better than you can.

Self-Hosting

If you are considering local email, let me paraphrase Kenji López-Alt: the first step is, don’t. The big guys can do it cheaper and better. But if it’s a matter of philosophy or control, or you just don’t have the funding, press on.

A Note About Cost

Most of the cost is user support. Hosting means someone else gets to purchase and patch a server farm, but you still have to talk to users. My (anecdotal) observation is that fully hosting saves 10% in overall costs and smooths out expenses. The more users you have, the more that 10% starts to matter.

1.2.1 - Forwarding

This is the best solution for a small number of users. You configure it at your registrar and rely on google (or someone similar) to do all the work for free.

If you want your out-bound emails to come from your domain name (and you do), add an out-bound relay. This is also free for minimal use.

Registrar Configuration

This is different per registrar, but normally involves creating an address and its destination.

Cloudflare

  • Log in (this assumes Cloudflare is your registrar) and select the domain in question.
  • Select Email, then Email Routing.
  • Under Routes, select Create address.

Once validated, email will begin arriving at the destination.

Configure Relaying

The registrar is only forwarding email, not sending it. To get your sent mail to come from your domain, you must integrate with a mail service such as SendGrid.

SendGrid

  • Create a free account and login
  • Authenticate your domain name (via DNS)
  • Create an API key (Settings -> API Keys -> Restricted Access, Defaults)

Gmail

  • Settings -> Accounts -> Send Mail as
  • Add your domain email
  • Configure the SMTP server with:
    • SMTP server: “smtp.sendgrid.net”
    • username: “apikey”
    • password: (the key you created above)

After validating the code Gmail sends you, there will be a drop down in the From field of new emails.
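The same relay also works from a script. A minimal Python sketch using the standard smtplib, assuming SendGrid's STARTTLS port 587 and an API key kept in a hypothetical SENDGRID_API_KEY environment variable; the addresses are placeholders. The username is the literal word apikey, just as in the Gmail settings.

```python
import smtplib
import ssl
from email.message import EmailMessage

SMTP_HOST = "smtp.sendgrid.net"
SMTP_PORT = 587          # STARTTLS port (an assumption; check SendGrid's docs for alternatives)
SMTP_USER = "apikey"     # the literal word "apikey"; the password is the API key itself


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """A plain-text message with the From address on your own domain."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_sendgrid(msg: EmailMessage, api_key: str) -> None:
    """Upgrade to TLS (verifying the server cert), log in with the key, and hand off the message."""
    with smtplib.SMTP(SMTP_HOST, SMTP_PORT, timeout=30) as smtp:
        smtp.starttls(context=ssl.create_default_context())
        smtp.login(SMTP_USER, api_key)
        smtp.send_message(msg)
```

Usage would look like `send_via_sendgrid(build_message("you@your.org", "friend@example.com", "Hello", "Sent via SendGrid"), os.environ["SENDGRID_API_KEY"])`.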

1.2.2 - Remote Hosting

This is more in the software-as-a-service category. You get an admin dashboard and are responsible for managing users and mail flow. The hosting service provider will help you with basic things, but you’re doing most of the work yourself.

Having managed 100K+ user mail systems and migrated from on-prem sendmail to Exchange and then O365 and Google, I can confidently say the infrastructure and even the platform amount to less than 10% of the cost of providing the service.

The main advantage of hosting is that you’re not managing the platform, installing patches, and replacing hardware. The main disadvantage is that you have little control, and sometimes things are broken and you can’t do anything about it.

Medium-sized organizations benefit most from hosting. You probably need a productivity suite anyway, and email is usually wrapped up in that. It saves you from having to specialize someone in email and its associated infrastructure.

But if controlling access to your data is paramount, then be aware that you have lost that and treat email as a public conversation.

1.2.3 - Self Hosting

When you self-host, you develop expertise in email itself, arguably a commodity service where such expertise has small return. But, you have full control and your data is your own.

The generally accepted best practice is to install Postfix and Dovecot. This is the simplest path and what I cover here. But there are some pretty decent all-in-one packages such as Mailu, Modoboa, etc. These usually wrap Postfix and Dovecot to spare you the details and improve your quality of life, at the cost of not really knowing how they work.

You’ll also need to configure a relay. Many ISPs block outbound mail traffic on port 25, and many recipient servers are rightly suspicious of random emails from unknown IPs in cable-modem land.

  1. Postfix
  2. Dovecot
  3. Relay

1.2.3.1 - Postfix

This is the first step - a server that handles and stores email. You’ll be able to check messages locally at the console. (Remote client access such as with Thunderbird comes later.)

Preparation

You need:

  • Linux Server
  • Firewall Port-Forward
  • Public DNS

Server

We use Debian Bookworm (12) in this example, but any derivative will be similar. At large scale you’d set up virtual users, but we’ll stick with the default setup and use your system account. Budget about 10 MB per 100 emails stored.

Port Forwarding

Mail protocol uses port 25. Simply forward that to your internal mail server and you’re done.

DNS

You need a normal ‘A’ record for your server and a special ‘MX’ record for your domain root. That way, mail sent to [email protected] will get routed to the server.

  Name         Type   Value
  the-server   A      20.236.44.162
  @            MX     the-server

Mail servers see [email protected] and look for records of type ‘MX’ for ‘your.org’. Seeing that ‘the-server’ is listed, they look up its ‘A’ record and connect. A message to [email protected] is handled the same way, except that when there is no ‘MX’ record, it is delivered to the ‘A’ record for ‘the-server.your.org’. If you have both, the ‘MX’ takes precedence.
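The lookup order can be sketched in Python with the records from the table above held in dicts. A real sender queries DNS and also handles multiple MX hosts, AAAA records, and the null MX case, all omitted here. When there's no MX, RFC 5321 has the sender treat the domain itself as the mail host, which is the fallback below.

```python
from typing import Dict, List, Tuple

# Hypothetical zone data matching the table above ('the-server' expands to the-server.your.org).
MX: Dict[str, List[Tuple[int, str]]] = {"your.org": [(10, "the-server.your.org")]}
A: Dict[str, str] = {"the-server.your.org": "20.236.44.162"}


def delivery_targets(address: str) -> List[str]:
    """IPs a sending server would try for an address, in order."""
    domain = address.rsplit("@", 1)[1].lower()
    if domain in MX:
        hosts = [host for _, host in sorted(MX[domain])]   # lowest preference value first
    else:
        hosts = [domain]                                    # no MX: use the domain's own A record
    return [A[h] for h in hosts if h in A]
```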

Installation

Some configuration is done at install time by the package so you must make sure your hostname is correct. We use the hostname ‘mail’ in this example.

# Correct internal hostnames as needed. 'mail' and 'mail.home.lan' are good suggestions.
cat /etc/hostname /etc/hosts

# Set the external host name and run the package installer. If postfix is already installed, apt remove it first
EXTERNAL="mail.your.org"
sudo debconf-set-selections <<< "postfix postfix/mailname string $EXTERNAL"
sudo debconf-set-selections <<< "postfix postfix/main_mailer_type string Internet Site"
sudo apt install --assume-yes postfix

# Add the main domain to the destinations as well
DOMAIN="your.org"
sudo sed -i "s/^mydestination = \(.*\)/mydestination = $DOMAIN, \1/"  /etc/postfix/main.cf
sudo systemctl reload postfix.service

Test with telnet - use your unix system ID for the rcpt address below.

telnet localhost 25
ehlo localhost
mail from: <[email protected]>
rcpt to: <[email protected]>
data
Subject: Wish List

Red Ryder BB Gun
.
quit

Assuming that ‘you’ matches your shell account, Postfix will have accepted the message and used its local delivery agent to store it in the local message store, which is in /var/mail.

cat /var/mail/YOU 

Configuration

Encryption

Postfix will use the untrusted “snakeoil” certificate that comes with Debian to opportunistically encrypt communication between it and other mail servers. Surprisingly, most other servers will accept this cert (or fall back to unencrypted), so let’s proceed for now. We’ll generate a trusted one later.

Spam Protection

The default config is secured so that it won’t relay messages, but it will accept messages from Santa, and it is subject to backscatter and a few other things. Let’s tighten it up.

sudo tee -a /etc/postfix/main.cf << EOF

# Tighten up formatting
smtpd_helo_required = yes
disable_vrfy_command = yes
strict_rfc821_envelopes = yes

# Error codes instead of bounces
invalid_hostname_reject_code = 554
multi_recipient_bounce_reject_code = 554
non_fqdn_reject_code = 554
relay_domains_reject_code = 554
unknown_address_reject_code = 554
unknown_client_reject_code = 554
unknown_hostname_reject_code = 554
unknown_local_recipient_reject_code = 554
unknown_relay_recipient_reject_code = 554
unknown_virtual_alias_reject_code = 554
unknown_virtual_mailbox_reject_code = 554
unverified_recipient_reject_code = 554
unverified_sender_reject_code = 554
EOF

sudo systemctl reload postfix.service

Postfix has some recommendations as well.

sudo tee -a /etc/postfix/main.cf << EOF

# PostFix Suggestions
smtpd_helo_restrictions = reject_unknown_helo_hostname
smtpd_sender_restrictions = reject_unknown_sender_domain
smtpd_recipient_restrictions =
    permit_mynetworks, 
    permit_sasl_authenticated,
    reject_unauth_destination,
    reject_rbl_client zen.spamhaus.org,
    reject_rhsbl_reverse_client dbl.spamhaus.org,
    reject_rhsbl_helo dbl.spamhaus.org,
    reject_rhsbl_sender dbl.spamhaus.org
smtpd_relay_restrictions = 
    permit_mynetworks, 
    permit_sasl_authenticated,
    reject_unauth_destination
smtpd_data_restrictions = reject_unauth_pipelining
EOF

sudo systemctl reload postfix.service
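The reject_rbl_client check is just a DNS lookup. Postfix reverses the client's IPv4 octets, appends the list's zone, and asks for an A record: any answer means the address is listed, and NXDOMAIN means it isn't. A Python sketch of the name it builds (IPv4 only), plus a hypothetical `is_listed` helper that treats any resolver failure as "not listed".

```python
import ipaddress
import socket


def dnsbl_query_name(client_ip: str, zone: str = "zen.spamhaus.org") -> str:
    """The name looked up to check an IPv4 address against a DNS blocklist."""
    octets = str(ipaddress.IPv4Address(client_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(client_ip: str, zone: str = "zen.spamhaus.org") -> bool:
    """True if the blocklist returns any A record for the address (needs working DNS)."""
    try:
        socket.gethostbyname(dnsbl_query_name(client_ip, zone))
        return True
    except socket.gaierror:          # NXDOMAIN (or any lookup failure) counts as not listed
        return False
```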

If you test a message from Santa now, Postfix will do some checks and realize it’s bogus.

550 5.7.27 [email protected]: Sender address rejected: Domain northpole.org does not accept mail (nullMX)

Header Cleanup

Postfix will attach a Received: header to outgoing emails that has details of your internal network and mail client. That’s information you don’t need to broadcast. You can remove that with a “cleanup” step as the message is sent.

# Insert a header check after the 'cleanup' line in the smtp section of the master file and create a header_checks file
sudo sed -i '/^cleanup.*/a\\t-o header_checks=regexp:/etc/postfix/header_checks' /etc/postfix/master.cf
echo "/^Received:/ IGNORE" | sudo tee -a /etc/postfix/header_checks
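In effect, the IGNORE rule drops every header matching the pattern. A Python sketch of that effect on a list of (unfolded) header lines; Postfix regexp tables match case-insensitively by default, so the sketch does too. It illustrates the result, not how the cleanup daemon works internally.

```python
import re
from typing import List

# Same pattern as the header_checks line.
RECEIVED = re.compile(r"^Received:", re.IGNORECASE)


def strip_received(headers: List[str]) -> List[str]:
    """Drop every header matching the pattern, as the IGNORE action does."""
    return [h for h in headers if not RECEIVED.search(h)]
```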

Note: there is some debate on whether this raises the spam score. You may want to replace the header rather than remove it.

Testing

Incoming

You can now receive mail to [email protected] and [email protected]. Try this to make sure you’re getting messages. Feel free to install mutt if you’d like a better client at the console.

Outgoing

You usually can’t send mail yet, for several reasons.

Many ISPs block outgoing port 25 to keep a lid on spam bots. This prevents you from sending any messages. You can test that by trying to connect to gmail on port 25 from your server.

nc -zv gmail-smtp-in.l.google.com 25
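The same reachability check in Python, if nc isn't installed. It just attempts a TCP connection with a timeout; a blocked port usually times out rather than being refused, which is why the timeout matters. The host is the one from the nc example.

```python
import socket


def port_open(host: str, port: int = 25, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port succeeds, like `nc -zv host port`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:              # refused, unreachable, DNS failure, or timed out
        return False
```

If `port_open("gmail-smtp-in.l.google.com")` returns False from your server but True from elsewhere, that points to the ISP block.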

Also, many mail servers will reverse-lookup your IP to see who it belongs to. That request will go to your ISP (who owns the IPs) and show their DNS name instead of yours. You’re often blocked at this step, though some providers will work with you if you contact them.

Even if you’re not blocked and your ISP has given you a static IP with a matching reverse-lookup, you will suffer from a lower reputation score as you’re not a well-known email provider. This can cause your sent messages to be delayed while being considered for spam.

To solve these issues, relay your email through an email provider. This will improve your reputation score (used to judge spam), ease the additional security layers such as SPF, DKIM, and DMARC, and is usually free at small volume.

This is known as using a ‘smarthost’; in Postfix it’s the relayhost setting.

Next Steps

Now that you can get email, let’s make it so you can also send it.

Troubleshooting

When adding Postfix’s anti-spam suggestions, we left off the smtpd_client_restrictions and smtpd_end_of_data_restrictions as they created problems during testing.

You may get a warning from Postfix that one of the settings you’ve added is overriding one of the earlier settings. Simply delete the first instance. These are usually default settings that we’re overriding.

Use ‘@’ to view the logs from all the related services.

sudo journalctl -u [email protected]

If you change your server’s DNS entry, make sure to update mydestination in your /etc/postfix/main.cf and sudo systemctl reload [email protected].

Misc

Mail Addresses

Postfix only accepts messages for users in the “local recipient table”, which is built from the unix password file and the aliases file[1]. You can add aliases for other addresses that will deliver to your shell account, but only shell users can receive mail right now. See virtual mailboxes to add users without shell accounts.

In the aliases file, you’ll see “postmaster” (and possibly others) aliased to root. Add an alias from root to you at the bottom so that this mail reaches your mailbox.

echo "root:   $USER" | sudo tee -a /etc/aliases
sudo newaliases
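Aliases chain, which is why adding root is enough: mail for postmaster goes to root, and root now goes to you. A Python sketch of that resolution over a simplified aliases file; real Postfix aliases also support files, commands, :include:, and continuation lines, which this ignores.

```python
from typing import Dict, FrozenSet, List


def parse_aliases(text: str) -> Dict[str, List[str]]:
    """Parse simple 'name: target, target' lines; comments and blank lines are skipped."""
    table: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, targets = line.split(":", 1)
        table[name.strip().lower()] = [t.strip() for t in targets.split(",") if t.strip()]
    return table


def expand(name: str, table: Dict[str, List[str]], path: FrozenSet[str] = frozenset()) -> List[str]:
    """Follow aliases until reaching names that have no alias of their own."""
    key = name.lower()
    if key in path or key not in table:
        return [name]            # a final mailbox (or a loop back to a name already visited)
    result: List[str] = []
    for target in table[key]:
        result.extend(expand(target, table, path | {key}))
    return result
```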

1.2.3.2 - Relay

A relay is simply another mail server that you give your outgoing mail to, rather than try to deliver it yourself.

There are many companies that specialize in this. Sign up for a free account and they give you the block of text to add to your postfix config. Some popular ones are:

  • SendGrid
  • MailGun
  • Sendinblue

They allow anywhere between 50 and 300 emails a day for free.

SendGrid

Relay Setup

SendGrid’s free plan gives you 50 emails a day. Create an account, verify your email address ([email protected]), and follow the instructions. Make sure to sudo apt install libsasl2-modules

https://docs.sendgrid.com/for-developers/sending-email/postfix

Restart Postfix and use mutt to send an email. It works! The only thing you’ll notice is that your message has an “On Behalf Of” notice letting you know it came from SendGrid. Follow the section below to change that.

Domain Integration

To integrate your domain fully, add DNS records for SendGrid using these instructions.

https://docs.sendgrid.com/ui/account-and-settings/how-to-set-up-domain-authentication

This will require you to login and go to:

  • Settings -> Sender Authentication -> Domain Authentication

Stick with the defaults that include automatic security and SendGrid will give you three CNAME records. Add those to your DNS and your email will check out.

Technical Notes

DNS

If you’re familiar with email domain-based security, you’ll see that two of the records SendGrid gives you are links to DKIM keys so SendGrid can sign emails as you. The other record (emXXXX) is the host sendgrid will use to send email. The SPF record for that host will include a SendGrid SPF record that includes multiple pools of IPs so that SPF checks will pass. They use CNAMEs on your side so they can rotate keys and pool addresses without changing DNS entries.

If none of this makes sense to you, then that’s really the point. You don’t have to know any of it - they take care of it for you.

Next Steps

Your server can now send email too. All shell users on your server rejoice!

To actually use your mail server, you’ll want to add some remote client access.

1.2.3.3 - Dovecot

Dovecot is an IMAP (Internet Message Access Protocol) server that allows remote clients to access their mail. There are other protocols and servers, but Dovecot runs about 75% of the IMAP servers on the internet and is a good choice.

Installation

sudo apt install dovecot-imapd
sudo apt install dovecot-submissiond

Configuration

Storage

Both Postfix and Dovecot use mbox storage format by default. This is one big file with all your mail in it and doesn’t scale well. Switch to the newer maildir format where your messages are stored as individual files.

# Change where Postfix delivers mail.
sudo postconf -e "home_mailbox = Maildir/"
sudo systemctl reload postfix.service

# Change where Dovecot looks for mail.
sudo sed -i 's/^mail_location.*/mail_location = maildir:~\/Maildir/' /etc/dovecot/conf.d/10-mail.conf
sudo systemctl reload dovecot.service
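Changing the format only affects new deliveries; anything already in /var/mail/YOU stays in the old mbox file. A Python sketch using the standard mailbox module to copy those messages into ~/Maildir. Assumptions: you run it as the mailbox owner (so the Maildir files belong to you), you've backed up the mbox first, and YOU is your account name. It leaves the original file untouched.

```python
import mailbox


def mbox_to_maildir(mbox_path: str, maildir_path: str) -> int:
    """Copy every message from an mbox file into a Maildir; returns how many were copied."""
    source = mailbox.mbox(mbox_path, create=False)
    target = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        source.lock()   # respect other programs' locks on the mbox while we read it
        for message in source:
            # Converting to MaildirMessage carries over read/flag status where it can.
            target.add(mailbox.MaildirMessage(message))
            copied += 1
    finally:
        source.close()  # also releases the lock
        target.close()
    return copied
```

For example: `mbox_to_maildir("/var/mail/YOU", os.path.expanduser("~/Maildir"))`.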

Encryption

Dovecot comes with its own default cert. This isn’t trusted, but Thunderbird will prompt you and you can choose to accept it. This will be fine for now. We’ll generate a valid cert later.

Credentials

Dovecot checks passwords against the local unix system by default and no changes are needed.

Submissions

One potential surprise is that IMAP is only for viewing existing mail. To send mail, you use the SMTP protocol and relay messages through your mail server. But we have relaying turned off, as we don’t want just anyone relaying messages.

The solution is to require authentication, and by convention this is done by a separate process on a separate port, called the submission server.

We’ve installed Dovecot’s submission server as it’s newer and easier to set up. Postfix even suggests considering it rather than its own. The only configuration needed is to set localhost as the relay.

# Set the relay as localhost where postfix runs
sudo sed -i 's/#submission_relay_host =/submission_relay_host = localhost/' /etc/dovecot/conf.d/20-submission.conf
sudo systemctl reload dovecot.service

Port Forwarding

Forward ports 143 and 587 to your mail server and test that you can connect from both inside and outside your LAN.

nc -zv mail.your.org 143
nc -zv mail.your.org 587

If it’s working from outside your network, but not inside, you may need to enable reflection, aka hairpin NAT. This will be different per firewall vendor, but in OPNsense it’s:

Firewall -> Settings -> Advanced

 # Enable these settings
Reflection for port forwards
Reflection for 1:1
Automatic outbound NAT for Reflection

Clients

Thunderbird and others will successfully discover the correct ports and services when you provide your email address of [email protected].

Notes

TLS

Dovecot defaults to port 587 for the submission service, which uses the older explicit TLS (STARTTLS) approach. RFC 8314 now recommends implicit TLS on port 465, and you can add a new “submissions” service for that while leaving the default in place. Clients will pick their favorite; Thunderbird defaults to 465 when both are available.

Note: leaving the default submission port commented out just means it will use the default port. Comment out the whole block to disable it.

sudo vi /etc/dovecot/conf.d/10-master.conf

# Change the default of

service submission-login {
  inet_listener submission {
    #port = 587
  }
}

to 

service submission-login {
  inet_listener submission {
    #port = 587
  }
  inet_listener submissions {
    port = 465
    ssl = yes
  }
}

# And reload

sudo systemctl reload dovecot.service

Make sure to port forward 465 at the firewall as well.

Next Steps

Now that you’ve got the basics working, let’s secure things a little more.

Sources

https://dovecot.org/list/dovecot/2019-July/116661.html

1.2.3.4 - Security

Certificates

We should use valid certificates. The best way to do that is with the certbot utility.

Certbot

Certbot automates the process of getting and renewing certs, and only requires a brief connection to port 80 as proof it’s you. There’s also a DNS based approach, but we use the port method for simplicity. It only runs once every 60 days so there is little risk of exploit.

Forward Port 80

You probably already have a web server and can’t just change where port 80 goes. To integrate certbot, add a name-based virtual host proxy to that web server.

# Here is a caddy example. Add this block to your Caddyfile
http://mail.your.org {
    reverse_proxy * mail.internal.lan
}

# You can also proxy just the well-known ACME path if you're already using that vhost.
# The trailing * matters: Caddy path matchers are exact matches unless they end in a wildcard.
http://mail.your.org {
    handle /.well-known/acme-challenge/* {
        reverse_proxy mail.internal.lan
    }
}

Install Certbot

Once the port forwarding is in place, you can install certbot and request a certificate. Note the --deploy-hook argument: it reloads services after a cert is obtained or renewed. Otherwise, they’ll keep using the expired one.

DOMAIN=your.org

sudo apt install certbot
sudo certbot certonly --standalone --domains mail.$DOMAIN --non-interactive --agree-tos -m postmaster@$DOMAIN --deploy-hook "service postfix reload; service dovecot reload"
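Since renewals happen unattended, it helps to be able to check how long a certificate has left. Here is a sketch built on openssl x509 -checkend; the function name and the 30-day default are my own choices, not anything certbot requires. To exercise the renewal itself without replacing the cert, certbot renew --dry-run does that.

```shell
# Hypothetical helper: succeed if the certificate at $1 is still valid for at
# least $2 more days (default 30). openssl x509 -checkend N exits 0 when the
# cert will NOT expire within N seconds, and 1 when it will.
cert_ok_for() {
  cert="$1"; days="${2:-30}"
  openssl x509 -checkend $((days * 86400)) -noout -in "$cert" >/dev/null
}

# Usage (reading the live directory may need root):
# sudo openssl x509 -enddate -noout -in /etc/letsencrypt/live/mail.$DOMAIN/fullchain.pem
# cert_ok_for /etc/letsencrypt/live/mail.$DOMAIN/fullchain.pem 30 && echo "good for 30+ days"
```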

Postfix

Tell Postfix about the cert by using the postconf utility. This will warn you about any potential configuration errors.

# Double quotes so the shell expands $DOMAIN (set above); single quotes would pass it literally
sudo postconf -e "smtpd_tls_cert_file = /etc/letsencrypt/live/mail.$DOMAIN/fullchain.pem"
sudo postconf -e "smtpd_tls_key_file = /etc/letsencrypt/live/mail.$DOMAIN/privkey.pem"
sudo postfix reload

Dovecot

Point Dovecot at the same certificate as well.

# Double quotes so $DOMAIN expands; '|' as the sed delimiter avoids escaping every slash
sudo sed -i "s|^ssl_cert = .*|ssl_cert = </etc/letsencrypt/live/mail.$DOMAIN/fullchain.pem|" /etc/dovecot/conf.d/10-ssl.conf
sudo sed -i "s|^ssl_key = .*|ssl_key = </etc/letsencrypt/live/mail.$DOMAIN/privkey.pem|" /etc/dovecot/conf.d/10-ssl.conf
sudo dovecot reload

Verifying

You can view the certificates with the commands:

openssl s_client -connect mail.$DOMAIN:143 -starttls imap -servername mail.$DOMAIN
openssl s_client -starttls smtp -showcerts -connect mail.$DOMAIN:587 -servername mail.$DOMAIN
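The -starttls option depends on the port: 143 and 587 start in plain text and upgrade, while 465 (the submissions listener from the TLS notes) is TLS from the first byte and must be probed without -starttls. A sketch that picks the right form and prints just the certificate subject and expiry; the function names are hypothetical.

```shell
# Hypothetical helper: print the s_client STARTTLS option for a mail port.
# 143/25/587 upgrade a plain-text session; 465/993 are implicit TLS and need none.
starttls_flag() {
  case "$1" in
    143) echo "-starttls imap" ;;
    25|587) echo "-starttls smtp" ;;
    465|993) echo "" ;;
    *) echo "unknown port: $1" >&2; return 1 ;;
  esac
}

# Connect, then hand the server's certificate to openssl x509 for a summary.
probe_tls() {
  host="$1"; port="$2"
  flag=$(starttls_flag "$port") || return 1
  # $flag is deliberately unquoted so "-starttls imap" becomes two arguments
  openssl s_client -connect "$host:$port" -servername "$host" $flag </dev/null 2>/dev/null |
    openssl x509 -noout -subject -enddate
}

# Usage:
# for p in 143 587 465; do probe_tls mail.$DOMAIN "$p"; done
```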

Privacy and Anti-Spam

You can take advantage of Cloudflare (or other) services to accept and inspect your email before forwarding it on to you. As far as the Internet is concerned, Cloudflare is your email server. The rest is private.

Take a look at the Forwarding section, and simply forward your mail to your own server instead of Google’s. That will even allow you to remove your mail server from DNS and drop connections other than Cloudflare if desired.

Intrusion Prevention

In my testing it takes less than an hour before someone discovers and attempts to break into your mail server. You may wish to GeoIP block or otherwise limit connections. You can also use crowdsec.

Crowdsec

Crowdsec is an open-source IPS that monitors your log files and blocks suspicious behavior.

Install as per their instructions.

curl -s https://packagecloud.io/install/repositories/crowdsec/crowdsec/script.deb.sh | sudo bash
sudo apt install -y crowdsec
sudo apt install crowdsec-firewall-bouncer-nftables
sudo cscli collections install crowdsecurity/postfix

Postfix

Most services now log to the system journal rather than a file. You can view them with the journalctl command

# What is the exact service unit name?
sudo systemctl status | grep postfix

# Anything having to do with that service unit
sudo journalctl --unit postfix@-.service

# Zooming into just the identifiers smtp and smtpd
sudo journalctl --unit postfix@-.service -t postfix/smtp -t postfix/smtpd

Crowdsec accesses the system journal when you add a block to its log acquisition directives.

sudo tee -a /etc/crowdsec/acquis.yaml << EOF
source: journalctl
journalctl_filter:
  - "_SYSTEMD_UNIT=postfix@-.service"
labels:
  type: syslog
---
EOF

sudo systemctl reload crowdsec

Dovecot

Install the dovecot collection as well.

sudo cscli collections install crowdsecurity/dovecot
sudo tee -a /etc/crowdsec/acquis.yaml << EOF
source: journalctl
journalctl_filter:
  - "_SYSTEMD_UNIT=dovecot.service"
labels:
  type: syslog
---
EOF

sudo systemctl reload crowdsec
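Both blocks differ only in the unit name, and running the tee -a a second time appends a duplicate. A sketch of a helper that prints the block and only appends it when that unit isn't already listed; the names are hypothetical, and since it writes to /etc/crowdsec it has to run as root (for example from sudo -s).

```shell
# Hypothetical helper: print a CrowdSec journalctl acquisition block for one
# systemd unit, in the same shape as the blocks above.
acquis_block() {
  cat <<EOF
source: journalctl
journalctl_filter:
  - "_SYSTEMD_UNIT=$1"
labels:
  type: syslog
---
EOF
}

# Append the block for a unit unless that unit's filter is already in the file.
add_acquis() {
  unit="$1"; file="${2:-/etc/crowdsec/acquis.yaml}"
  if grep -qF "_SYSTEMD_UNIT=$unit\"" "$file" 2>/dev/null; then
    echo "already present: $unit"
  else
    acquis_block "$unit" >> "$file"
  fi
}

# Usage (as root), then reload CrowdSec:
# add_acquis dovecot.service && systemctl reload crowdsec
```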

Is it working? You won’t see anything at first unless you’re actively under attack. But after 24 hours you may see some examples of attempts to relay spam.

allen@mail:~$ sudo cscli alerts list
╭────┬────────────────────┬────────────────────────────┬─────────┬──────────────────────────────────────────────┬───────────┬─────────────────────────────────────────╮
│ ID │       value        │           reason           │ country │                      as                      │ decisions │               created_at                │
├────┼────────────────────┼────────────────────────────┼─────────┼──────────────────────────────────────────────┼───────────┼─────────────────────────────────────────┤
│ 60 │ Ip:187.188.233.58  │ crowdsecurity/postfix-spam │ MX      │ 17072 TOTAL PLAY TELECOMUNICACIONES SA DE CV │ ban:1     │ 2023-05-24 06:33:10.568681233 +0000 UTC │
│ 54 │ Ip:177.229.147.166 │ crowdsecurity/postfix-spam │ MX      │ 13999 Mega Cable, S.A. de C.V.               │ ban:1     │ 2023-05-23 20:17:49.912754687 +0000 UTC │
│ 53 │ Ip:177.229.154.70  │ crowdsecurity/postfix-spam │ MX      │ 13999 Mega Cable, S.A. de C.V.               │ ban:1     │ 2023-05-23 20:15:27.964240044 +0000 UTC │
│ 42 │ Ip:43.156.25.237   │ crowdsecurity/postfix-spam │ SG      │ 132203 Tencent Building, Kejizhongyi Avenue  │ ban:1     │ 2023-05-23 01:15:43.87577867 +0000 UTC  │
│ 12 │ Ip:167.248.133.186 │ crowdsecurity/postfix-spam │ US      │ 398722 CENSYS-ARIN-03                        │ ban:1     │ 2023-05-20 16:03:15.418409847 +0000 UTC │
╰────┴────────────────────┴────────────────────────────┴─────────┴──────────────────────────────────────────────┴───────────┴─────────────────────────────────────────╯

If you’d like to get into the details, take a look at the Crowdsec page.

Next Steps

Now that your server is secure, let’s take a look at how email is authenticated and how to ensure yours is.

1.2.3.5 - Authentication

Email authentication prevents forgery. People can still send unsolicited email, but they can’t fake who it’s from. If you set up a Relay for Postfix, the relayer is doing it for you. But otherwise, proceed onward to prevent your outgoing mail being flagged as spam.

You need three things:

  • SPF: Server IP addresses - which specific servers have authorization to send email.
  • DKIM: Server Secrets - email is signed so you know it’s authentic and unchanged.
  • DMARC: Verifies the address in the From: aligns with the domain sending the email, and what to do if not.

SPF

SPF, or Sender Policy Framework, is the oldest component. It’s a DNS TXT record that lists the servers authorized to send email for a domain.

A receiving server looks at a message’s envelope sender (the SMTP MAIL FROM address, aka RFC5321.MailFrom, which is later recorded in the Return-Path header) to see what domain the email purports to be from. It then looks up that domain’s SPF record, and if the server that sent the email isn’t included, the email is considered forged.

Note - this doesn’t check the From: header the user sees. Messages can appear (to the user) to be from anywhere. So it is mostly a low-level check to prevent spambots.

The DNS record for your Postfix server should look like:

Type: "TXT"
NAME: "@"
Value: "v=spf1 a:mail.your.org -all"

The value above shows the list of authorized servers (a:) contains mail.your.org. Mail from all other servers is considered forged (-all).
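Once the record is published, you can pull it back with dig and check it. A sketch (spf_from_txt is a hypothetical name) that filters dig's TXT output down to the SPF record, rejoining long records that DNS splits into several quoted strings. It fails unless exactly one SPF record is found, since RFC 7208 treats more than one as a permanent error.

```shell
# Hypothetical helper: read `dig +short TXT <domain>` output on stdin and
# print the SPF record with quotes removed and split strings ("a" "b") joined.
# Returns non-zero unless exactly one v=spf1 record is present.
spf_from_txt() {
  records=$(grep '^"v=spf1' | sed 's/" "//g; s/^"//; s/"$//')
  count=$(printf '%s\n' "$records" | grep -c '^v=spf1' || true)
  [ -n "$records" ] && printf '%s\n' "$records"
  [ "$count" -eq 1 ]
}

# Usage:
# dig +short TXT your.org | spf_from_txt
```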

To have your Postfix server check SPF for incoming messages add the SPF policy agent.

sudo apt install postfix-policyd-spf-python

sudo tee -a /etc/postfix/master.cf << EOF

policyd-spf  unix  -       n       n       -       0       spawn
    user=policyd-spf argv=/usr/bin/policyd-spf
EOF

sudo tee -a /etc/postfix/main.cf << EOF

policyd-spf_time_limit = 3600
smtpd_recipient_restrictions =
   permit_mynetworks,
   permit_sasl_authenticated,
   reject_unauth_destination,
   check_policy_service unix:private/policyd-spf
EOF

sudo systemctl restart postfix

DKIM

DKIM, or DomainKeys Identified Mail, signs the emails as they are sent ensuring that the email body and From: header (the one you see in your client) hasn’t been changed in transit and is vouched for by the signer.

Receiving servers see the DKIM header that includes who signed it, then use DNS to check it. Unsigned mail simply isn’t checked. (There is no could-but-didn’t in the standard).

Note - There is no connection between the domain that signs the message and what the user sees in the From: header. Messages can have a valid DKIM signature and still appear to be from anywhere. DKIM is mostly to prevent man-in-the-middle attacks from altering the message.

For Postfix, this requires installing OpenDKIM and connecting it to Postfix as a milter, as detailed here. Make sure to sign with the domain root.

https://tecadmin.net/setup-dkim-with-postfix-on-ubuntu-debian/

Once you’ve done that, create the following DNS entry.

Type: "TXT"
NAME: "default._domainkey"
Value: "v=DKIM1; h=sha256; k=rsa; p=MIIBIjANBgkq..."
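The p= value is the base64-encoded DKIM public key. opendkim-genkey writes the full record into a .txt file next to the key it generates, but if you only have the private key, openssl can recompute it. A sketch assuming an RSA key; the function name and the key path in the usage line are hypothetical (the real path depends on the guide you followed).

```shell
# Hypothetical helper: print a DKIM TXT value for an RSA private key, in the
# same form as the record above. p= is the base64 of the DER-encoded public key.
dkim_txt() {
  key="$1"
  pub=$(openssl rsa -in "$key" -pubout -outform DER 2>/dev/null | base64 | tr -d '\n')
  # An empty result means openssl couldn't read the key
  [ -n "$pub" ] || return 1
  printf 'v=DKIM1; h=sha256; k=rsa; p=%s\n' "$pub"
}

# Usage (as root, since the key is private):
# dkim_txt /path/to/default.private
```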

DMARC

Having a DMARC record is the final piece. It instructs receiving servers to check the From: header the user sees against the domains verified by SPF (the return path) and DKIM (the signing domain), and says what to do on a fail.

This means mail sent through the mail.your.org servers with a From: address at some other domain will fail DMARC (if that domain publishes a DMARC policy) and be flagged as spam.

The DNS record should look like:

Type: "TXT"
NAME: "_dmarc"
Value: "v=DMARC1; p=reject; adkim=s; aspf=r;"
  • p=reject: Reject messages that fail
  • adkim=s: Use strict DKIM alignment
  • aspf=r: Use relaxed SPF alignment

Reject (p=reject) tells receiving servers to reject messages that fail DMARC (no aligned SPF pass and no aligned DKIM pass), rather than quarantine them.

Strict DKIM alignment (adkim=s) means the DKIM signing domain (the d= tag in the signature) must exactly match the domain in the From: address. A DKIM signature from your.org would exactly match a From: address such as user@your.org.

Relaxed SPF alignment (aspf=r) means the envelope sender domain checked by SPF only needs to share the From: address’s organizational domain, so subdomains are acceptable. I.e. an envelope sender at mail.your.org aligns with a From: address such as user@your.org.

You can also choose quarantine mode (p=quarantine), where failing mail is typically delivered to spam, or report-only mode (p=none), where failing mail is accepted and handled as usual by the receiving server. Adding a rua tag asks receivers to send you aggregate reports, like below.

v=DMARC1; p=none; rua=mailto:postmaster@your.org

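DMARC is an OR test. In the first example, if either the SPF or the DKIM check passes with an aligned domain, then DMARC passes. The adkim and aspf tags only set how strict each alignment is; p=none, as in the second example, doesn’t turn the checks off, it just asks receivers to take no action on failures and report them.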
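To see what a published record actually says, including the defaults it leaves implied, a small parser helps. A sketch with a hypothetical name; it fills in r (relaxed) for adkim and aspf when they're absent, which is the RFC 7489 default.

```shell
# Hypothetical helper: print the value of one tag from a DMARC record string.
# adkim and aspf default to "r" (relaxed) when the record omits them.
dmarc_tag() {
  record="$1"; tag="$2"
  value=$(printf '%s\n' "$record" | tr ';' '\n' |
    sed -n "s/^[[:space:]]*$tag=[[:space:]]*//p" | sed 's/[[:space:]]*$//' | head -n 1)
  if [ -z "$value" ]; then
    case "$tag" in
      adkim|aspf) value=r ;;
    esac
  fi
  printf '%s\n' "$value"
}

# Usage:
# dmarc_tag "$(dig +short TXT _dmarc.your.org | tr -d '"')" p
```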

To implement DMARC checking in Postfix, you can install OpenDMARC and configure a mail filter as described below.

https://www.linuxbabe.com/mail-server/opendmarc-postfix-ubuntu

Next Steps

Now that you are handling email securely and authentically, let’s help ease client connections.

Autodiscovery

1.2.3.6 - Autodiscovery

In most cases you don’t need this. Thunderbird, for example, will use a shotgun approach and may find your server using ‘common’ server names based on your email address.

But there is an RFC and other clients may need help.

DNS SRV

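This takes advantage of RFC 6186, with an entry for IMAP and one for SMTP submission. The submission entry uses _submissions, the RFC 8314 service name for implicit TLS on port 465; plain _submission is the STARTTLS service on port 587.

Type  Name  Service       Protocol  TTL   Priority  Weight  Port  Target
SRV   @     _imap         TCP       auto  10        5       143   mail.your.org
SRV   @     _submissions  TCP       auto  10        5       465   mail.your.org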
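If your DNS is managed with a zone file rather than a web UI, the same records can be written out in BIND syntax. A sketch with a hypothetical function name; TTL 3600 stands in for the table's "auto", and it uses _submissions, the RFC 8314 name for implicit TLS on 465.

```shell
# Hypothetical helper: print mail SRV records in zone-file syntax.
# Fields after SRV: priority, weight, port, target (note the trailing dots).
mail_srv_records() {
  domain="$1"; target="${2:-mail.$1}"
  echo "_imap._tcp.$domain. 3600 IN SRV 10 5 143 $target."
  echo "_submissions._tcp.$domain. 3600 IN SRV 10 5 465 $target."
}

# Usage:
# mail_srv_records your.org
# dig +short SRV _imap._tcp.your.org    # check what's actually published
```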

Web Autoconfig

  • Create a DNS entry for autoconfig.your.org
  • Create a vhost and web root for that with the file mail/config-v1.1.xml
  • Add the contents below to that file
<?xml version="1.0"?>
<clientConfig version="1.1">
    <emailProvider id="your.org">
      <domain>your.org</domain>
      <displayName>Example Mail</displayName>
      <displayShortName>Example</displayShortName>
      <incomingServer type="imap">
         <hostname>mail.your.org</hostname>
         <port>143</port>
         <socketType>STARTTLS</socketType>
         <username>%EMAILLOCALPART%</username>
         <authentication>password-cleartext</authentication>
      </incomingServer>
      <outgoingServer type="smtp">
         <hostname>mail.your.org</hostname>
         <port>587</port>
         <socketType>STARTTLS</socketType> 
         <username>%EMAILLOCALPART%</username> 
         <authentication>password-cleartext</authentication>
         <addThisServer>true</addThisServer>
      </outgoingServer>
    </emailProvider>
    <clientConfigUpdate url="https://www.your.org/config/mozilla.xml" />
</clientConfig>
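To check the file from outside, fetch it the way a client would. A sketch assuming the vhost is served over HTTPS (Caddy does that automatically, see the Web section) and that curl is installed; the function names are hypothetical.

```shell
# Hypothetical helpers: build the autoconfig URL for a domain (the
# mail/config-v1.1.xml path created above) and check it serves a clientConfig.
autoconfig_url() {
  echo "https://autoconfig.$1/mail/config-v1.1.xml"
}

check_autoconfig() {
  url=$(autoconfig_url "$1")
  if curl -fsS "$url" | grep -q '<clientConfig'; then
    echo "OK: $url"
  else
    echo "FAIL: $url" >&2
    return 1
  fi
}

# Usage:
# check_autoconfig your.org
```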

Note

It’s traditional to match server names to protocols and we would have used “imap.your.org” and “smtp.your.org”. But using ‘mail’ is popular now and it simplifies setup at several levels.

Thunderbird will try to guess your server names, attempting to connect to smtp.your.org for example. But the spam prevention in many Postfix configurations can interfere with those guesses.

Sources

https://cweiske.de/tagebuch/claws-mail-autoconfig.htm
https://www.hardill.me.uk/wordpress/2021/01/24/email-autoconfiguration/

1.3 - Web

1.3.1 - Content Mgmt

There are many ways to manage and produce web content. Traditionally, you’d use a large application with roles and permissions.

A more modern approach is to use a distributed version control system, like git, and a site generator.

Static Site Generators are gaining popularity as they produce static HTML with JavaScript and CSS that can be deployed to any Content Delivery Network without need for server-side processing.

Astro is great, as is Hugo, with the latter being around longer and having more resources.

1.3.1.1 - Hugo

Hugo is a Static Site Generator (SSG) that turns Markdown files into static web pages that can be deployed anywhere.

Like WordPress, you apply a ‘theme’ to style your content. But rather than use a web interface to create content, you directly edit the content in markdown files. This lends itself well to managing the content as code and appeals to those who prefer editing text.

And unlike many other SSGs, you don’t have to be a front-end developer to get great results; you can jump in with a minimal investment of time.

1.3.1.1.1 - Hugo Install

Requirements

I use Debian in this example, but any apt-based distro will be similar.

Preparation

Enable and pin the Debian Backports and Testing repos so you can get recent versions of Hugo and needed tools.

–> Enable and Pin

Installation

Hugo requires git and go

# Assuming you have enabled backports as per above
sudo apt install -t bullseye-backports git
sudo apt install -t bullseye-backports golang-go

For a recent version of Hugo you’ll need to go to the testing repo. The extended version is recommended by Hugo and it’s chosen by default.

# This pulls in a number of other required packages, so take a close look at the messages for any conflicts. It's normally fine, though. 
sudo apt install -t testing hugo

In some cases, you can just install the .deb package from Hugo’s GitHub releases with a lot less effort. Take a look at the latest release, copy the .deb URL into a wget, and install the downloaded file with apt.

https://github.com/gohugoio/hugo/releases/latest

wget https://github.com/gohugoio/hugo/releases/download/v0.124.1/hugo_extended_0.124.1_linux-amd64.deb
sudo apt install ./hugo_extended_0.124.1_linux-amd64.deb
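To skip copying the URL by hand, you can ask GitHub where releases/latest redirects and build the filename from the version. A sketch that assumes the asset naming in the v0.124.1 link above holds for later releases; the function names are hypothetical.

```shell
# Hypothetical helper: release URL for the extended .deb, following the
# naming pattern of the v0.124.1 link above.
hugo_deb_url() {
  version="$1"; arch="${2:-amd64}"
  echo "https://github.com/gohugoio/hugo/releases/download/v${version}/hugo_extended_${version}_linux-${arch}.deb"
}

# /releases/latest redirects to .../releases/tag/vX.Y.Z; print X.Y.Z.
latest_hugo_version() {
  url=$(curl -fsSL -o /dev/null -w '%{url_effective}' https://github.com/gohugoio/hugo/releases/latest) || return 1
  echo "${url##*/v}"
}

# Usage:
# V=$(latest_hugo_version)
# wget "$(hugo_deb_url "$V")"
# sudo apt install "./hugo_extended_${V}_linux-amd64.deb"
```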

Configuration

A quick test right from the quickstart page to make sure everything works

hugo new site quickstart
cd quickstart
git init
git submodule add https://github.com/theNewDynamic/gohugo-theme-ananke themes/ananke
# Hugo 0.110 and later name the site config hugo.toml (older versions: config.toml)
echo "theme = 'ananke'" >> hugo.toml
hugo server

Open up a browser to http://localhost:1313/ and you’ll see the default ananke-themed site.

Next Steps

The ananke theme you just deployed is nice, but a much better theme is Docsy. Go give that a try.

–> Deploy Docsy on Hugo

1.3.1.1.2 - Docsy Install

Docsy is a good-looking Hugo theme that provides a landing page, a blog, and a documentation sub-site using Bootstrap CSS.

The documentation site in particular lets you turn a directory of text files into a documentation tree with relative ease. It even has a collapsible left nav bar. That is harder to find than you’d think.

Preparation

Docsy requires Hugo. Install that if you haven’t already. It also needs a few other things: postcss, postcss-cli, and autoprefixer from the Node.JS ecosystem. These should be installed in the project directory, as version requirements change per theme. Start with Node.JS and npm themselves.

sudo apt install -t testing nodejs npm

Installation

Deploy Docsy as a Hugo module by pulling in the example site, so we have a skeleton to work with. We’re using git, but we’ll keep it local for now. Clone before running npm, since git won’t clone into a directory that already has files in it.

git clone https://github.com/google/docsy-example.git some.site.org
cd some.site.org
npm install -D autoprefixer
npm install -D postcss
npm install -D postcss-cli
hugo server

Browse to http://localhost:1313 and you should see the demo “Goldydocs” site.

Now you can proceed to configure Docsy!

Updating

The Docsy theme gets regular updates. To incorporate those you only have to run this command. Do this now, actually, to get any theme updates the example site hasn’t incorporated yet.

cd /path/to/my-existing-site
hugo mod get -u github.com/google/docsy

Troubleshooting

hugo

Error: Error building site: POSTCSS: failed to transform “scss/main.css” (text/css)>: Error: Loading PostCSS Plugin failed: Cannot find module ‘autoprefixer’

And then when you try to install the missing module

The following packages have unmet dependencies: nodejs : Conflicts: npm npm : Depends: node-cacache but it is not going to be installed

You may already have Node.JS installed. Skip trying to install it from the OS’s repo and see if npm works. Then proceed with the postcss install and such.

1.3.1.1.3 - Docsy Config

Let’s change the basics of the site in the config.toml file. I put some quick sed commands here, but you can edit by hand as well. Of note is the GitHub integration. We prepopulate it here for future use, as it allows quick edits in your browser down the road.

SITE=some.site.org
GITHUBID=someUserID
sed -i "s/Goldydocs/$SITE/" config.toml
sed -i "s/The Docsy Authors/$SITE/" config.toml
sed -i "s/example.com/$SITE/" config.toml
sed -i "s/example.org/$SITE/" config.toml
sed -i "s/google\/docsy-example/$GITHUBID\/$SITE/" config.toml 
sed -i "s/USERNAME\/REPOSITORY/$GITHUBID\/$SITE/" config.toml 
sed -i "s/https:\/\/policies.google.com//" config.toml
sed -i "s/https:\/\/github.com\/google\/docsy/https:\/\/github.com\/$GITHUBID/" config.toml
sed -i "s/github_branch/#github_branch/" config.toml

If you don’t plan to translate your site into different languages, you can dispense with some of the extra languages as well.

# Delete the 20 or so lines starting at "[languages]" and stopping at the "[markup]" section,
# including the english section.
vi config.toml

# Delete the folders from 'content/' as well, leaving 'en'
rm -rf content/fa content/no

You should also set a default meta description, or the engine will put in the Bootstrap default and Google will summarize all your pages with that.

vi config.toml
[params]
copyright = "some.site.org"
privacy_policy = "/privacy"
description = "My personal website to document what I know and how I did it"

Keep an eye on the site in your browser as you make changes. When you’re ready to start on the main part, adding content, take a look at the next section.

Docsy Operation

Notes

You can’t dispense with the en folder yet, as it breaks some GitHub linking functionality you may want to take advantage of later.

1.3.1.1.4 - Docsy Operate

This is a quick excerpt from the Docsy Content and Customization docs. Definitely spend time with those after reading the overview here.

Directory Layout

Content is, appropriately enough, in the content directory, and its subdirectories line up with the top-level navigation bar of the web site. About, Documentation, etc. correspond to content/about, content/docs and so on.

The directories and files you create determine the URL you get, with one important exception: filenames are converted to a ‘slug’, mimicking how index files work. For example, if you create the file docs/tech/mastadon.md the URL will be /docs/tech/mastadon/. This is for SEO (Search Engine Optimization).

The other thing you’ll see are _index.html files. In the example above, the URL /docs/tech/ has no content, as it’s a folder. But you can add a _index.md or .html to give it some. Avoid creating index.md or tech.md (a file that matches the name of a subdirectory). Either of those will block Hugo from generating content for any subdirectories.

The Landing Page and Top Nav Pages

The landing page itself is the content/_index.html file and the background is featured-background.jpg. The other top-nav pages are in the content folders with _index files. You may notice the special header variable “menu: main: weight: " and that is what flags that specific page as worthy of being in the top menu. Removing that, or adding that (and a linkTitle), will change the top nav.

The Documentation Page and Left Nav Bar

One of the most important features of the Docsy template is the well designed documentation section that features a Section menu, or left nav bar. This menu is built automatically from the files you put in the docs folder, as long as you give them a title. (See Front Matter, below). They are ordered by date but you can add a weight to change that.

It doesn’t collapse by default and if you have a lot of files, you’ll want to enable that.

# Search and set in your config.toml
sidebar_menu_compact = true

Front Matter

The example files have a section at the top like this. It’s not strictly required, but you must have at least the title or they won’t show up in the left nav tree.

---
title: "Examples"
---

Page Content and Short Codes

In addition to normal markdown or html, you’ll see frequent use of ‘shortcodes’ that do things that normal markdown can’t. These are built in to Hugo and can be added by themes, and look like this:

{{% blocks/lead color="dark" %}}
Some Important Text
{{% /blocks/lead %}}

Diagrams

Docsy supports several tools for creating illustrations from code, such as KaTeX, Mermaid, Diagrams.net, PlantUML, and MarkMap. For Mermaid, simply use a code block.

```mermaid
graph LR
 one --> two
```

Generate the Website

Once you’re satisfied with what you’ve got, tell Hugo to generate the static files, and it will populate the public folder.

hugo

Publish the Web Site

Everything you need is in the public folder and all you need do is copy it to a web server. You can even use git, which I advise since we’re already using it to pull in and update the module.

–> Local Git Deployment

Bonus Points

If you have a large directory structure full of markdown files already, you can kick-start the process of adding frontmatter like this;

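find . -type f -name '*.md' | \
while IFS= read -r X
do
  # Skip files that already begin with front matter
  head -n 1 "$X" | grep -q '^---$' && continue
  TITLE=$(basename "${X%.*}")
  # YAML front matter: --- delimiters take "key: value", not TOML's "key = value"
  printf -- '---\ntitle: "%s"\n---\n' "$TITLE" | cat - "$X" > "$X.tmp" && mv "$X.tmp" "$X"
done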

1.3.1.1.5 - Docsy Github

You may have noticed the links on the right, like “Edit this page”, that take you to GitHub. Let’s set those up.

On Github

Go to GitHub and create a new repository. Use the name of your site for the repo name, such as “some.site.org”. If you want to use something else, you can edit your config.toml file to adjust.

Locally

You may have noticed that GitHub suggested some next steps with a remote add using the name “origin”. Docsy is already using that, however, from when you cloned it. So we’ll have to pick a new name.

cd /path/to/my-existing-site
git remote add github https://github.com/yourID/some.site.org

Let’s change our default branch to “main” to match GitHub’s defaults.

git branch -m main

Now we can add, commit and push it up to Github

git add --all
git commit -m "first commit of new site"
git push -u github main

You’ll notice something interesting when you go back to look at GitHub: all the contributors on the right. That’s because you’re dealing with a clone of Docsy, and you can still pull in updates and changes from the original project.

It may have been better to clone it via GitHub.

1.3.2 - Content Deployment

Automating deployment as part of a general continuous integration strategy is best practice these days, and web content should be treated the same way: version-controlled and deployed with git.

1.3.2.1 - Local Git Deployment

Overview

Let’s create a two-tiered system that goes from dev to prod using a post-commit trigger

graph LR
Development --git / rsync---> Production

The Development system is your workstation. git commit will trigger a build and rsync.

The Production system is a web server. Any web server will do as long as you have SSH access and can update a web-root folder.

I use Hugo in this example, but any system that has an output (or build) folder works similarly.

Configuration

The first thing we need to know is where we are going, so let’s prepare production first.

Production System

This server probably uses folders like /var/www/XXXXX for its web root. Use that or create a new folder and make yourself the owner.

sudo mkdir /var/www/some.site.org
sudo chown -R $USER /var/www/some.site.org
echo "Hello" > /var/www/some.site.org/index.html

Edit your web server’s config to make sure you can view that web page. Also check that rsync is available from the command line.

Development System

Hugo builds static html in a public directory. To generate the HTML, simply type hugo

cd /path/to/my-existing-site
hugo
ls public

We don’t actually want this folder in git, and most themes (if you’re using Hugo) already exclude it. Look for a .gitignore file and create or add to it if needed.

# Notice /public is at the top of the git ignore file
cat .gitignore

/public
package-lock.json
.hugo_build.lock
...

Assuming you have some content, let’s add and commit it.

git add --all
git commit -m "Initial Commit"

Note: All of these git commands work because pulling in a theme initialized the directory. If you’re doing something else you’ll need to git init.

The last step is to create a hook that will build and deploy after a commit.

cd /path/to/my-existing-site
touch .git/hooks/post-commit
chmod +x .git/hooks/post-commit
vi .git/hooks/post-commit
#!/bin/sh
hugo --cleanDestinationDir
rsync --recursive --delete public/ user@some.site.org:/var/www/some.site.org
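One caution: with --delete, rsync makes production match whatever is in public/, so a build that fails partway could wipe the live site. A sketch of a guard that refuses to sync a build directory without a top-level index page; the function names, the index.html check, and the user@some.site.org target are assumptions for illustration.

```shell
# Hypothetical guard: only treat a build directory as deployable if it has a
# top-level index.html (an assumption about what a complete Hugo build contains).
safe_to_sync() {
  [ -f "$1/index.html" ]
}

# Sync $1/ to $2 with --delete, but only after the guard passes.
deploy() {
  src="$1"; dest="$2"
  if ! safe_to_sync "$src"; then
    echo "refusing to deploy: $src/index.html is missing" >&2
    return 1
  fi
  rsync --recursive --delete "$src/" "$dest"
}

# In the post-commit hook, also stop if the build itself fails:
# hugo --cleanDestinationDir || exit 1
# deploy public user@some.site.org:/var/www/some.site.org
```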

This script ensures that the remote directory matches your local directory. When you’re ready to update the remote site:

git add --all
git commit --allow-empty -m "trigger update"

If you mess up the production files, you can just run the hook manually.

cd /path/to/my-existing-site
.git/hooks/post-commit

Troubleshooting

bash: line 1: rsync: command not found

Double check that the remote host has rsync.

1.3.3 - Content Delivery

1.3.3.1 - Cloudflare

  • Cloudflare acts as a reverse proxy to hide your server’s IP address
  • Takes over your DNS and directs requests to the closest site
  • Injects JavaScript analytics
    • If the browser’s “do not track” is on, JS isn’t injected.
  • Can use a tunnel and remove encryption overhead

1.3.4 - Servers

1.3.4.1 - Caddy

Caddy is a web server that runs SSL by default by automatically grabbing a cert from Let’s Encrypt. It comes as a stand-alone binary, written in Go, and makes a decent reverse proxy.

1.3.4.1.1 - Installation

Installation

Caddy recommends “using our official package for your distro” and for debian flavors they include the basic instructions you’d expect.

Configuration

The easiest way to configure Caddy is by editing the Caddyfile

sudo vi /etc/caddy/Caddyfile
sudo systemctl reload caddy.service

Sites

You define websites with a block that includes a root and the file_server directive. Once you reload, and assuming you already have the DNS in place, Caddy will reach out to Let’s Encrypt, acquire a certificate, and automatically redirect HTTP on port 80 to HTTPS on port 443.

site.your.org {        
    root * /var/www/site.your.org
    file_server
}
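When you add sites often, it's safer to validate the Caddyfile before reloading so a typo is caught first. A sketch that prints a block in the format above; caddy_site_block is a hypothetical name, and the web root default follows the /var/www/<site> convention used here.

```shell
# Hypothetical helper: print a static-site block like the one above.
# The web root defaults to /var/www/<domain>.
caddy_site_block() {
  domain="$1"; root="${2:-/var/www/$1}"
  printf '%s {\n    root * %s\n    file_server\n}\n' "$domain" "$root"
}

# Usage: append, validate, and only then reload
# caddy_site_block site.your.org | sudo tee -a /etc/caddy/Caddyfile
# caddy validate --config /etc/caddy/Caddyfile && sudo systemctl reload caddy.service
```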

Authentication

You can add basicauth to a site by creating a hash and adding a directive to the site.

caddy hash-password
site.your.org {        
    root * /var/www/site.your.org
    file_server
    basicauth { 
        allen SomeBigLongStringFromTheCaddyHashPasswordCommand
    }
}

Reverse Proxy

Caddy also makes a decent reverse proxy.

site.your.org {        
    reverse_proxy * http://some.server.lan:8080
}

You can also take advantage of path-based reverse proxy. Note the rewrite to handle a potentially missing trailing slash.

site.your.org {
    rewrite /audiobooks /audiobooks/
    # handle_path strips the /audiobooks prefix itself, so no separate uri strip_prefix is needed
    handle_path /audiobooks/* {
        reverse_proxy * http://some.server.lan:8080
    }
}

Include Blocks

You can define common elements at the top and include them on multiple sites. This helps when you have many sites.

(logging) {
    log {
        output file /var/log/caddy/access.log
    }
}
site.your.org {
    import logging     
    reverse_proxy * http://some.server.lan:8080
}

Modules

Caddy is a single binary so when adding a new module (aka feature) you are essentially downloading a new version that has them compiled in. You can find the list of packages at their download page.

Do this at the command line with caddy itself.

sudo caddy add-package github.com/mholt/caddy-webdav
sudo systemctl restart caddy

Security

Drop Unknown Domains

Caddy will accept connections to port 80, announce that it’s a Caddy web server and redirect you to https before realizing it doesn’t have a site or cert for you. Configure this directive at the bottom so it drops immediately.

http:// {
    abort
}

Crowdsec

Caddy runs as its own user and is fairly memory-safe. But installing Crowdsec helps identify some types of intrusion attempts.

[TODO]

Troubleshooting

You can test your config file and look at the logs like so

caddy validate --config /etc/caddy/Caddyfile
journalctl --no-pager -u caddy

1.3.4.1.2 - WebDAV

Caddy can also serve WebDAV requests with the appropriate module. This is important because for many clients, such as Kodi, WebDAV is significantly faster.

sudo caddy add-package github.com/mholt/caddy-webdav
sudo systemctl restart caddy
{   # Custom modules require order of precedence be defined
    order webdav last
}
site.your.org {
    root * /var/www/site.your.org
    webdav * 
}

You can combine WebDAV with directory listing (highly recommended) so you can also browse the directory contents with a normal web browser. Since WebDAV clients list directories with PROPFIND rather than GET, you can use the @get matcher to route GET requests to the file_server module so it can serve up indexes via the browse argument.

site.your.org {
    @get method GET
    root * /var/www/site.your.org
    webdav *
    file_server @get browse        
}

Sources

https://github.com/mholt/caddy-webdav
https://marko.euptera.com/posts/caddy-webdav.html

1.3.4.1.3 - MFA

The package caddy-security offers a suite of auth functions. Among these is MFA and a portal for end-user management of tokens.

Installation

# Install a version of caddy with the security module 
sudo caddy add-package github.com/greenpau/caddy-security
sudo systemctl restart caddy

Configuration

/var/lib/caddy/.local/caddy/users.json

caddy hash-password

Troubleshooting

journalctl --no-pager -u caddy

1.3.4.1.4 - Wildcard DNS

Caddy obtains an individual certificate for every virtual host you create. This is fine, but Let’s Encrypt publishes every certificate it issues to public Certificate Transparency logs, and the bad guys are watching. If you create a new site in Caddy, you’ll see bots probing for weaknesses within 30 minutes, without you even having published the URL. Obscurity isn’t security, but the need-to-know principle suggests we shouldn’t be informing the whole world about sites of limited scope.

One solution is a wildcard cert. It’s published as just ‘*.some.org’, so no individual host names are disclosed. Caddy supports this, but it requires a little extra work.

Installation

In this example we have already installed Caddy and use Cloudflare as a hosted DNS provider. Check https://github.com/caddy-dns to see if your DNS provider is supported.

# Divert the default binary from the repo
sudo dpkg-divert --divert /usr/bin/caddy.default --rename /usr/bin/caddy
sudo cp /usr/bin/caddy.default /usr/bin/caddy.custom
sudo update-alternatives --install /usr/bin/caddy caddy /usr/bin/caddy.default 10
sudo update-alternatives --install /usr/bin/caddy caddy /usr/bin/caddy.custom 50

# Add the package and restart. 
sudo caddy add-package github.com/caddy-dns/cloudflare
sudo systemctl restart caddy.service    

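From here on out, apt upgrade will only update the diverted default binary (/usr/bin/caddy.default), not the custom build that is actually in use. To upgrade the custom build, enter caddy upgrade at the command line. The devs don’t think this should be an issue.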

DNS Provider Configuration

For Cloudflare, a decent walkthrough is linked below. You only need its ‘Getting the Cloudflare API Token’ part.

https://roelofjanelsinga.com/articles/using-caddy-ssl-with-cloudflare/

Caddy Configuration

Use the acme_dns global option. Then create a single site block for the wildcard and bare domain (this determines the certificate), and route each actual virtual host inside it with a host matcher and a handle block.

{
    acme_dns cloudflare alotcharactersandnumbershere
}

*.some.org, some.org {

    @site1 host site1.some.org
    handle @site1 {
        reverse_proxy * http://localhost:3200
    }

    @site2 host site2.some.org
    handle @site2 {
        root * /srv/www/site2
        file_server
    }
}
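With more than a handful of subdomains, the matcher and handle pairs become repetitive. The sketch below is a hypothetical shell helper, not something Caddy provides. It prints one pair for each name=upstream argument. It assumes every site is a reverse proxy like site1 above and that the domain is some.org. Paste its output inside the wildcard site block.

```sh
#!/bin/sh
# Hypothetical helper: print a Caddyfile host matcher and handle block for
# each "name=upstream" argument, under the domain in $DOMAIN.
DOMAIN="${DOMAIN:-some.org}"

caddy_handles() {
    for pair in "$@"; do
        name=${pair%%=*}       # text before the first "="
        upstream=${pair#*=}    # text after the first "="
        printf '    @%s host %s.%s\n' "$name" "$name" "$DOMAIN"
        printf '    handle @%s {\n' "$name"
        printf '        reverse_proxy %s\n' "$upstream"
        printf '    }\n\n'
    done
}
```

For example, caddy_handles site1=http://localhost:3200 site3=http://localhost:8096 prints blocks for site1 and a hypothetical site3. Caddy’s own wildcard example also ends the site block with a bare handle { abort }, so requests for subdomains you haven’t defined are dropped.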

2 - Media

2.1 - Players

2.1.1 - LibreELEC

One of the best systems for handling media is LibreELEC. It’s both a Kodi box and a server appliance that’s resistant to abuse. With the right hardware (like a ROCKPro64 or Waveshare) it also makes an excellent portable server for traveling.

Deployment

Download an image from https://libreelec.tv/downloads and flash as directed. Enable SSH during the initial setup.

Storage

RAID is a useful feature, but only BTRFS RAID works out of the box. This is fine, but with a little extra work you can add MergerFS, a popular option for combining disks.

BTRFS

Create the RAID set on another PC. If your disks are of different sizes you can use the ‘single’ profile, but leave the metadata mirrored.

sudo mkfs.btrfs -f -L pool -d single -m raid1 /dev/sda /dev/sdb /dev/etc...

After attaching to LibreELEC, the array will be automatically mounted at /media/pool, based on the label (pool) you specified above.

MergerFS

This is a good option if you just want to combine disks. Unlike most other RAID technologies, if you lose a disk the rest will keep going. Many people combine this with SnapRAID for off-line parity.

But it’s a bit more work.

Cooling

You may want to manage the fan. The RockPro64 has a PWM fan header and LibreELEC loads the pwm_fan module.

Kodi Manual Start

The kodi process can use a significant amount of CPU even at rest. If you’re using this primarily as a file server you can disable kodi from starting automatically.

cp /usr/lib/systemd/system/kodi.service /storage/.config/system.d/kodi-alt.service
systemctl mask kodi

To start kodi, you can enter systemctl start kodi-alt

Remotes

Plug in a cheap Fm4 style remote and it ‘just works’ with kodi. But if you want to customize some remote buttons, say to start kodi manually, you still can.

Enable SMB

To share your media, simply copy the sample file, remove all the preconfigured shares (unless you want them), and add one for your storage pool. Then just enable Samba and reboot (so the file is picked up).

cp /storage/.config/samba.conf.sample /storage/.config/samba.conf
vi /storage/.config/samba.conf
[media]
  path = /storage/pool
  available = yes
  browseable = yes
  public = yes
  writeable = yes
Config --> LibreELEC --> Services --> Enable Samba

Enable HotSpot

Config --> LibreELEC --> Network --> Wireless Networks

Enable Active and Wireless Access Point and it just works!

Enable Docker

This is a good way to handle things like Jellyfin or Plex if you must. In the GUI, go to add-ons, search for the items below and install.

  • docker
  • LinuxServer.io
  • Docker Image Updater

Then you must make sure Docker starts after the storage is up, or the containers will see an empty folder instead of a mounted one.

vi /storage/.config/system.d/service.system.docker.service
[Unit]
...
...
After=network.target storage-pool.mount

If that fails, you can also tell docker to wait a bit

ExecStartPre=/usr/bin/sleep 120
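A fixed sleep is a guess at timing. As an alternative, here is a sketch of a small script that polls until the pool directory has something in it and gives up after a timeout. The path /storage/bin/wait-for-pool.sh and the 60-second default are assumptions. A pool that is genuinely empty would simply wait out the timeout.

```sh
#!/bin/sh
# Hypothetical /storage/bin/wait-for-pool.sh
# Usage: wait-for-pool.sh DIR [TIMEOUT_SECONDS]
# Succeeds once DIR has at least one entry; fails after the timeout.
wait_for_pool() {
    dir=$1
    timeout=${2:-60}
    waited=0
    # ls -A lists everything except . and .., so empty output means an empty dir
    while [ -z "$(ls -A "$dir" 2>/dev/null)" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "gave up waiting for $dir after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Run only when called with arguments, e.g. from ExecStartPre
if [ "$#" -gt 0 ]; then
    wait_for_pool "$@"
fi
```

Make it executable and use it in place of the sleep, e.g. ExecStartPre=/storage/bin/wait-for-pool.sh /storage/pool 60. systemd applies the unit’s start timeout (TimeoutStartSec, 90 seconds by default) to ExecStartPre as well, so a longer wait also needs a larger TimeoutStartSec.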

Remote Management

You may be called upon to look at something remotely. Sadly, there’s no remote access to the GUI but you can use things like autossh to create a persistent remote tunnel, or wireguard to create a VPN connection. Wireguard is usually better.

2.1.1.1 - Add-ons

You can also use this platform as a server. Using a media player OS as a server seems counter-intuitive at first, but in practice it is rock-solid. I have a mixed fleet of 10 or so devices, and LibreELEC has better uptime stats than TrueNAS.

The device playing content on your TV is also the media server for the rest of the house. I wouldn’t advertise this as an enterprise solution, but I can’t dispute the results.

Installation

Normal Add-ons

Common tools like rsync, as well as server software like Jellyfin, are available. You can browse as described below, or use the search tool if you’re looking for something specific.

  • Select the gear icon and choose Add-ons
  • Choose LibreELEC Add-ons
  • Drill down to browse software.

Docker

If you’re on ARM or want more frequent updates, you may want to add Docker and the LinuxServer.io repository.

  • Select the gear icon and choose Add-ons
  • Search add-ons for “Docker” and install
  • Search add-ons for “LinuxServer.io” and install
  • Select “Install from repository” and choose “LinuxServer.io’s Docker Add-ons”.

Drill down and add Jellyfin, for example.

https://wiki.libreelec.tv/installation/docker

2.1.1.2 - AutoSSH

This allows you to set up and monitor a remote tunnel, since the easiest way to manage remote clients is to let them come to you. To accomplish this, we’ll set up a server, create client keys, test a reverse tunnel, and set up autossh.

The Server

This is simply a server somewhere that everyone can reach via SSH. Create a normal user account with a password and home directory, such as with adduser remote. We will be connecting from our clients for initial setup with this.

The Client

Use SSH to connect to the LibreELEC client, generate an SSH key pair, and copy the public key to the remote server:

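ssh root@LIBREELEC_CLIENT
ssh-keygen  -f ~/.ssh/id_rsa -q -P ""

# ssh-copy-id isn't available so you must use the rather harder command below.
# It creates ~/.ssh on the server first, since a new account won't have one.
cat ~/.ssh/id_rsa.pub | ssh remote@RENDEZVOUS_SERVER "mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys"

ssh remote@RENDEZVOUS_SERVER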

If all went well, you can back out and then test logging in with no password. Make sure to do this and accept the server’s host key, so that later unattended connections don’t stall at the host-key prompt.

The Reverse Tunnel

SSH normally connects your terminal to a remote server. Think of this as an encrypted tunnel where your keystrokes are sent to the server and its responses are sent back to you. You can send more than your keystrokes, however. You can take any port on your system and send it as well. In our case, we’ll take port 22 (where SSH just happens to be listening) and send it to the rendezvous server on port 2222. SSH will continue to accept local connections while also taking connections from the remote port we are tunneling in.

# On the client, issue this command to connect the (-R)remote port 2222 to localhost:22, i.e. the ssh server on the client
ssh -N -R 2222:localhost:22 -o ServerAliveInterval=240 -o ServerAliveCountMax=2 remote@RENDEZVOUS_SERVER

# Leave that running while you log in to the rendezvous server and test whether you can now ssh to the client by connecting to the forwarded port.

ssh remote@RENDEZVOUS_SERVER
ssh root@localhost -p 2222

# Now exit both and set up Autossh below

Autossh

Autossh is a daemon that monitors SSH sessions to make sure they’re up and operational, restarting them as needed. This is exactly what we need to keep the SSH session from the client up. To run it as a service, a systemd service file is needed. For LibreELEC, these go in /storage/.config/system.d.

vi /storage/.config/system.d/autossh.service

[Unit]
Description=autossh
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=root
EnvironmentFile=/storage/.config/autossh
ExecStart=/storage/.kodi/addons/virtual.system-tools/bin/autossh $SSH_OPTIONS
Restart=always
RestartSec=60

[Install]
WantedBy=multi-user.target
vi /storage/.config/autossh

AUTOSSH_POLL=60
AUTOSSH_FIRST_POLL=30
AUTOSSH_GATETIME=0
AUTOSSH_PORT=22034
SSH_OPTIONS="-N -R 2222:localhost:22 remote@RENDEZVOUS_SERVER -i /storage/.ssh/id_rsa"
systemctl enable autossh.service
systemctl start autossh.service
systemctl status autossh.service

At this point, the client has an SSH connection to your server on port 22, and it has asked the server to open port 2222 and forward it back to the client’s own SSH server. You can now connect by:

ssh remote@RENDEZVOUS_SERVER
ssh root@localhost -p 2222

If not, check the logs for errors and try again.

journalctl -b 0 --no-pager | less

Remote Control

Now that you have the client connected, you can use your rendezvous server as a jump host to access things on the remote client, such as its web interface and even the console via VNC. Your connection will generally take the form of:

ssh -L localPort:localhost:clientPort -J user@rendezvousServer root@localhost -p tunnelPort

The actual command is hard to read because localhost appears twice with different meanings. In root@localhost -p 2222, localhost is resolved on the rendezvous server, where the reverse tunnel is listening. The localhost in the -L option is resolved on the LibreELEC client at the far end.

ssh -L 8080:localhost:32400 -J remote@RENDEZVOUS_SERVER root@localhost -p 2222
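The roles of the two ports and the two localhosts are easy to mix up, so a tiny hypothetical helper can assemble the command from named parts. It only prints the command, so you can check it before running it. It assumes root is the login on the client, as above. RENDEZVOUS_SERVER is a placeholder.

```sh
#!/bin/sh
# Hypothetical helper that prints the jump-host command described above.
#   $1  local port on your own machine          (e.g. 8080)
#   $2  service port on the LibreELEC client    (e.g. 32400)
#   $3  login on the rendezvous server          (user@host)
#   $4  reverse-tunnel port from the -R option  (e.g. 2222)
remote_forward_cmd() {
    echo "ssh -L $1:localhost:$2 -J $3 root@localhost -p $4"
}
```

Run the printed command, then browse to http://localhost:8080 on your own machine to reach port 32400 on the client.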

2.1.1.3 - Building

This works best in an Ubuntu container.

LibreELEC Notes

Installed, but no SATA HDD showed up. Found this on the forum:

RPi4 has zero support for PCIe devices so why is it “embarrasing” for LE to omit support for PCIe SATA things in our RPi4 image?

Feel free to send a pull-request to GitHub enabling the kernel config that’s needed.

https://forum.libreelec.tv/thread/27849-sata-controller-error/

Went through their resources:

  • beginner’s guide to git: https://wiki.libreelec.tv/development/git-tutorial#forking-and-cloning
  • building basics: https://wiki.libreelec.tv/development/build-basics
  • specific build commands: https://wiki.libreelec.tv/development/build-commands/build-commands-le-12.0.x

and then failed because jammy wasn’t compatible enough

Created a jammy container and restarted

https://ubuntu.com/server/docs/lxc-containers

sudo lxc-create --template download --name u1 -- --dist ubuntu --release jammy --arch amd64
sudo lxc-start --name u1 --daemon
sudo lxc-attach u1

Used some of the notes from

https://www.artembutusov.com/libreelec-raid-support/

Did a fork, a clone and a

git fetch --all

but couldn’t get all the downloads, as the alsa.org site was down

On a side note, these lines are needed in config.txt so that USB works on a Compute Module 4:

otg_mode=1
dtoverlay=dwc2,dr_mode=host

https://www.jeffgeerling.com/blog/2020/usb-20-ports-not-working-on-compute-module-4-check-your-overlays

I tried a menuconfig and selected ..sata? and got

CONFIG_ATA=m
CONFIG_ATA_VERBOSE_ERROR=y
CONFIG_ATA_FORCE=y
CONFIG_ATA_SFF=y
CONFIG_ATA_BMDMA=y

Better compare the .config file again
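You can compare the ATA and SATA options between two kernel .config files (say, the stock one and one saved after menuconfig) with grep, sort and comm. The helper name below is hypothetical. For whole-file comparisons, the kernel tree’s own scripts/diffconfig is an alternative.

```sh
#!/bin/sh
# Hypothetical helper: show CONFIG_ATA*/CONFIG_SATA* lines that differ between
# two kernel .config files. Lines only in the first file print flush left;
# lines only in the second are indented by a tab (the comm -3 layout).
ata_config_diff() {
    old=$(mktemp) new=$(mktemp)
    grep -E '^(# )?CONFIG_S?ATA' "$1" | LC_ALL=C sort > "$old"
    grep -E '^(# )?CONFIG_S?ATA' "$2" | LC_ALL=C sort > "$new"
    LC_ALL=C comm -3 "$old" "$new"
    rm -f "$old" "$new"
}
```

For example, ata_config_diff old.config .config lists what menuconfig changed in that area.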

Edited and committed a config.txt, but it didn’t show up in the image. Possibly the wrong file, or there’s another way to make that change.

Enabled the SPI interface

https://raspberrypi.stackexchange.com/questions/48228/how-to-enable-spi-on-raspberry-pi-3
https://wiki.libreelec.tv/configuration/config_txt

sudo apt install lxc

# Template options must follow a bare --, otherwise lxc-create rejects them as its own
sudo lxc-create --template download --name u1 -- --dist ubuntu --release jammy --arch amd64

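# Without template options, the download template prompts for distribution, release and architecture
sudo lxc-create --template download --name u1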

sudo lxc-start --name u1 --daemon

sudo lxc-attach  u1

# Now inside, build 
apt update
apt upgrade
apt-get install gcc make git wget
apt-get install bc patchutils bzip2 gawk gperf zip unzip lzop g++ default-jre u-boot-tools texinfo xfonts-utils xsltproc libncurses5-dev xz-utils


# login and fork so you can clone more easily. Some problem with the creds

cd
git clone https://github.com/agattis/LibreELEC.tv
cd LibreELEC.tv/
git fetch --all
git tag
git remote add upstream https://github.com/LibreELEC/LibreELEC.tv.git
git fetch --all
git checkout libreelec-12.0
git checkout -b CM4-AHCI-Add
PROJECT=RPi ARCH=aarch64 DEVICE=RPi4   tools/download-tool
ls
cat /etc/passwd 
pwd
ls /home/
ls /home/ubuntu/
ls
cd ..
mv LibreELEC.tv/ /home/ubuntu/
cd /home/ubuntu/
ls -lah
chown -R ubuntu:ubuntu LibreELEC.tv/
ls -lah
cd LibreELEC.tv/
ls
ls -lah
cd
sudo -i -u ubuntu
ip a
cat /etc/resolv.conf 
ip route
sudo -i -u ubuntu


apt install tmux
sudo -i -u ubuntu tmux a




# And back on the host, the build output is inside the container's root filesystem
sudo ls -lah /var/lib/lxc/u1/rootfs/home/ubuntu/LibreELEC.tv/target/

2.1.1.4 - Fancontrol

Save the script below in /storage/bin, make it executable, and create a service unit for it.
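The unit’s contents aren’t shown here. The sketch below writes a minimal unit that restarts the script if it exits. It assumes the script is saved as /storage/bin/fancontrol.sh, which is a hypothetical name, and has been made executable with chmod +x.

```sh
#!/bin/sh
# Hypothetical helper: write a minimal unit for the fan script.
# Assumes the script lives at /storage/bin/fancontrol.sh and is executable.
UNIT_DIR="${UNIT_DIR:-/storage/.config/system.d}"

write_fancontrol_unit() {
    mkdir -p "$UNIT_DIR"
    cat > "$UNIT_DIR/fancontrol.service" <<'EOF'
[Unit]
Description=PWM fan control

[Service]
Type=simple
ExecStart=/storage/bin/fancontrol.sh
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF
}
```

After running write_fancontrol_unit, enable the service with systemctl enable --now fancontrol.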

vi /storage/.config/system.d/fancontrol.service

systemctl enable fancontrol
#!/bin/sh

# Summary
#
# Adjust fan speed by percentage when CPU/GPU is between user set
# Min and Max temperatures.
#
# Notes
#
# Temp can be gleaned from the sysfs thermal_zone files and is in
# millidegrees, meaning a reading of 30000 is equal to 30.000 C
#
# Fan speed is controlled by the pwm_fan module and can be read and
# set from a sysfs file as well. The value can be set from 0 (off)
# to 255 (max). It defaults to 255 at start


## Set Points

# CPU Temp set points
MIN_TEMP=40 # Min desired CPU temp
MAX_TEMP=60 # Max desired CPU temp


# Fan Speeds set points
FAN_OFF=0       # Fan is off
FAN_MIN=38      # Some fans need a minimum of 15% to start from a dead stop.
FAN_MAX=255     # Max cycle for fan

# Frequency
CYCLE_FREQ=6            # How often should we check, in seconds
SHORT_CYCLE_PERCENT=20  # If the fan switches on or off in more than this percent of
                        # cycles, just run at min rather than shutting off

## Sensor and Control files

# CPU and GPU sysfs locations
CPU=/sys/class/thermal/thermal_zone0/temp
GPU=/sys/class/thermal/thermal_zone1/temp

# Fan Control files
FAN2=/sys/devices/platform/pwm-fan/hwmon/hwmon2/pwm1
FAN3=/sys/devices/platform/pwm-fan/hwmon/hwmon3/pwm1



## Logic

# The fan control file isn't available until the module loads and
# is unpredictable in path. Wait until it comes up

FAN=""
while [[ -z $FAN ]];do
        [[ -f $FAN2 ]] && FAN=$FAN2
        [[ -f $FAN3 ]] && FAN=$FAN3
        [[ -z $FAN ]] && sleep 1
done

# The sensors are in millidegrees so adjust the user
# set points to the same units

MIN_TEMP=$(( $MIN_TEMP * 1000 ))
MAX_TEMP=$(( $MAX_TEMP * 1000 ))


# Short cycle detection requires us to track the number
# of on/off flips relative to the number of cycles

CYCLES=0
FLIPS=0

while true; do

        # Set TEMP to the highest GPU/CPU Temp
        TEMP=""
        read TEMP_CPU < $CPU
        read TEMP_GPU < $GPU
        [[ $TEMP_CPU -gt $TEMP_GPU ]] && TEMP=$TEMP_CPU || TEMP=$TEMP_GPU

        # How many degrees above or below our min threshold are we?
        DEGREES=$(( $TEMP-$MIN_TEMP ))

        # What percent of the range between min and max is that?
        RANGE=$(( $MAX_TEMP-$MIN_TEMP ))
        PERCENT=$(( (100*$DEGREES/$RANGE) ))

        # What number between 0 and 255 is that percent?
        FAN_SPEED=$(( (255*$PERCENT)/100 ))

        # Override the calculated speed for some special cases
        if [[ $FAN_SPEED -le $FAN_OFF ]]; then                  # Set anything 0 or less to 0
                FAN_SPEED=$FAN_OFF
        elif [[ $FAN_SPEED -lt $FAN_MIN ]]; then                # Set anything below the min to min
                FAN_SPEED=$FAN_MIN
        elif [[ $FAN_SPEED -ge $FAN_MAX ]]; then                # Set anything above the max to max
                FAN_SPEED=$FAN_MAX
        fi

        # Did we just flip on or off?
        read -r OLD_FAN_SPEED < $FAN
        if (    ( [[ $OLD_FAN_SPEED -eq 0 ]] && [[ $FAN_SPEED -ne 0 ]] ) || \
                ( [[ $OLD_FAN_SPEED -ne 0 ]] && [[ $FAN_SPEED -eq 0 ]] ) ); then
                FLIPS=$((FLIPS+1))
        fi

        # Every 10 cycles, check to see if we are short-cycling
        CYCLES=$((CYCLES+1))
        if [[ $CYCLES -ge 10 ]] && [[ ! $SHORT_CYCLING ]]; then
                FLIP_PERCENT=$(( 100*$FLIPS/$CYCLES ))
                if [[ $FLIP_PERCENT -gt $SHORT_CYCLE_PERCENT ]]; then
                        SHORT_CYCLING=1
                        echo "Short-cycling detected. Fan will run at min speed rather than shutting off."
                else
                        CYCLES=0;FLIPS=0
                fi
        fi

        # If we are short-cycling and would turn the fan off, just set to min
        if [[ $SHORT_CYCLING ]] && [[ $FAN_SPEED -le $FAN_MIN ]]; then
                FAN_SPEED=$FAN_MIN
        fi

        # Every so often, exit short-cycle mode to see if conditions have changed
        if [[ $SHORT_CYCLING ]] && [[ $CYCLES -gt 10000 ]]; then  # ~17 hours at 6 s per cycle
                echo "Exiting short-cycling"
                SHORT_CYCLING=""
                CYCLES=0;FLIPS=0        # start a fresh measurement window
        fi

        # Write that to the fan speed control file
        echo $FAN_SPEED > $FAN

        # Log the stats every once in a while
#       if [[ $LOG_CYCLES ]] && [[ $LOG_CYCLES -ge 10 ]]; then
#               echo "Temp was $TEMP fan set to $FAN_SPEED"
#               LOG_CYCLES=""
#       else
#               LOG_CYCLES=$(($LOG_CYCLES+1))
#       fi

        sleep $CYCLE_FREQ

done

# Also look at drive temps. The sysfs filesystem isn't useful for
# all drives on RockPro64 so use smartctl instead

# Attribute 194's RAW_VALUE (10th column) holds the temperature in C
#ls -1 /dev/sd? | xargs -n1 smartctl -A | grep -E '^194' | awk '{print $10}'
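To check the set points without touching sysfs, you can pull the speed calculation from the loop above out into a standalone function. This sketch reuses the same set points and integer math but leaves out the short-cycle logic. The function name is hypothetical.

```sh
#!/bin/sh
# Sketch: the script's temperature-to-PWM mapping as a standalone function,
# using the same set points. Input is in millidegrees, like the sysfs files.
MIN_TEMP=40000 MAX_TEMP=60000
FAN_OFF=0 FAN_MIN=38 FAN_MAX=255

fan_speed() {
    percent=$(( 100 * ($1 - MIN_TEMP) / (MAX_TEMP - MIN_TEMP) ))
    speed=$(( 255 * percent / 100 ))
    if [ "$speed" -le "$FAN_OFF" ]; then
        speed=$FAN_OFF          # at or below MIN_TEMP (after rounding): off
    elif [ "$speed" -lt "$FAN_MIN" ]; then
        speed=$FAN_MIN          # low but nonzero: raise to the start-up minimum
    elif [ "$speed" -ge "$FAN_MAX" ]; then
        speed=$FAN_MAX          # at or above MAX_TEMP: full speed
    fi
    echo "$speed"
}
```

To see the curve, try for t in 40000 45000 50000 55000 60000; do echo "$t $(fan_speed $t)"; done.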

2.1.1.5 - MergerFS

This is a good option if you just want to combine disks. Unlike most other RAID technologies, if you lose a disk the rest will keep going. Many people combine this with SnapRAID for off-line parity.

Prepare and Exempt Disks

Prepare and exempt the file systems from auto-mounting so you can supply your own mount options and make sure they are up before you start MergerFS.

Make sure to wipe the disks before use, as wipefs and fdisk are not available in LibreELEC.

# Assuming the disks are wiped, format and label each disk the same
mkfs.ext4 /dev/sda 
e2label /dev/sda pool-member

# Copy the udev rule for editing 
cp /usr/lib/udev/rules.d/95-udevil-mount.rules /storage/.config/udev.rules.d
vi /storage/.config/udev.rules.d/95-udevil-mount.rules

Edit this section by adding the pool-member label from above

# check for special partitions we dont want mount
IMPORT{builtin}="blkid"
ENV{ID_FS_LABEL}=="EFI|BOOT|Recovery|RECOVERY|SETTINGS|boot|root0|share0|pool-member", GOTO="exit"

Test this by rebooting and making sure the drives are not mounted.

Add Systemd Mount Units

Each filesystem requires a mount unit like the one below. Create one for each drive, named disk1, disk2, etc. Note: the file name is important. To mount /storage/disk1, the file must be named storage-disk1.mount.

vi /storage/.config/system.d/storage-disk1.mount
[Unit]
Description=Mount sda
Requires=dev-sda.device
After=dev-sda.device

[Mount]
What=/dev/sda
Where=/storage/disk1
Type=ext4
Options=rw,noatime,nofail

[Install]
WantedBy=multi-user.target
systemctl enable --now storage-disk1.mount
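Writing one unit per disk by hand is repetitive. Below is a sketch of a hypothetical helper that prints the unit for disk N on device DEV, using the same options as above. Save its output under the matching name. To double-check a name, systemd-escape -p --suffix=mount /storage/disk2 prints storage-disk2.mount.

```sh
#!/bin/sh
# Hypothetical helper: print a mount unit that mounts /dev/DEV at /storage/diskN.
# Save the output as storage-diskN.mount; systemd requires the unit file
# name to match the mount point.
mount_unit() {
    n=$1 dev=$2
    cat <<EOF
[Unit]
Description=Mount $dev
Requires=dev-$dev.device
After=dev-$dev.device

[Mount]
What=/dev/$dev
Where=/storage/disk$n
Type=ext4
Options=rw,noatime,nofail

[Install]
WantedBy=multi-user.target
EOF
}
```

For example: mount_unit 2 sdb > /storage/.config/system.d/storage-disk2.mount, then systemctl enable --now storage-disk2.mount. sdX names can change if disks are detected in a different order. For a sturdier setup, What= could point at a /dev/disk/by-uuid/ path instead, with Requires= and After= adjusted to match.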

Download and Test MergerFS

MergerFS isn’t available as an add-on, but you can get it directly from the developer. LibreELEC (or CoreELEC) on ARM has a 32-bit user space, so you’ll need the armhf version.

wget https://github.com/trapexit/mergerfs/releases/latest/download/mergerfs-static-linux_armhf.tar.gz

tar --extract --file=./mergerfs-static-linux_armhf.tar.gz --strip-components=3 usr/local/bin/mergerfs

mkdir bin
mv mergerfs bin/

Mount the drives and run a test like the one below. Notice the escaped *. At the command line, the backslash stops the shell from expanding the glob, so mergerfs receives the * and expands it itself.

mkdir /storage/pool
/storage/bin/mergerfs /storage/disk\* /storage/pool/

Create the MergerFS Service

vi /storage/.config/system.d/mergerfs.service
[Unit]
Description = MergerFS Service
After=storage-disk1.mount storage-disk2.mount storage-disk3.mount storage-disk4.mount
Requires=storage-disk1.mount storage-disk2.mount storage-disk3.mount storage-disk4.mount

[Service]
Type=forking
ExecStart=/storage/bin/mergerfs -o category.create=mfs,noatime /storage/disk* /storage/pool/
ExecStop=umount /storage/pool

[Install]
WantedBy=default.target
systemctl enable --now mergerfs.service

Your content should now be available in /storage/pool after boot.
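The After= and Requires= lines in the service above assume four disks. For a different count, a hypothetical helper can print the list. It assumes the storage-diskN.mount naming used above.

```sh
#!/bin/sh
# Hypothetical helper: list storage-disk1.mount ... storage-diskN.mount on one
# line, for the After= and Requires= lines of mergerfs.service.
disk_mounts() {
    i=1 list=""
    while [ "$i" -le "$1" ]; do
        list="$list storage-disk$i.mount"
        i=$((i + 1))
    done
    echo "${list# }"    # drop the leading space
}
```

For example, disk_mounts 3 prints storage-disk1.mount storage-disk2.mount storage-disk3.mount.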

2.1.1.6 - Remotes

Most remotes just work. Newer ones emulate a keyboard and send well-known multimedia keys like ‘play’ and ‘volume up’. If you want to change what a button does, you can tell Kodi what to do pretty easily. In addition, LibreELEC also supports older remotes using eventlircd and popular ones are already configured. You can add unusual ones as well as get normal remotes to perform arbitrary actions when kodi isn’t running (like telling the computer to start kodi or shutdown cleanly).

Modern Remotes

If you plug in a remote receiver and the kernel makes reference to a keyboard you have a modern remote and Kodi will talk to it directly.

dmesg

input: BESCO KSL81P304 Keyboard as ...
hid-generic 0003:2571:4101.0001: input,hidraw0: USB HID v1.11 Keyboard ...

If you want to change a button action, turn on Kodi’s debug logging, tail the log file, and press the button in question to see what event is detected.

# Turn on debug
kodi-send -a toggledebug

# Tail the logfile
tail -f /storage/.kodi/temp/kodi.log

   debug <general>: Keyboard: scancode: 0xac, sym: 0xac, unicode: 0x00, modifier: 0x0
   debug <general>: HandleKey: browser_home (0xf0b6) pressed, window 10000, action is ActivateWindow(Home)

In this example, we pressed the ‘home’ button on the remote. That was detected as a keyboard press of the browser_home key. This is just one of many defined keys like ‘email’ and ‘calculator’ that can be present on a keyboard. Kodi has a default action for that key, which you can see in the system keymap.

# View the system keyboard map to see what's happening by default
cat /usr/share/kodi/system/keymaps/keyboard.xml

To change what happens, create a user keymap. Any entries in it will override the default.

# Create a user keymap that takes you to 'Videos' instead of 'Home'
vi /storage/.kodi/userdata/keymaps/keyboard.xml
<keymap>
  <global>
    <keyboard>
      <browser_home>ActivateWindow(Videos)</browser_home>
    </keyboard>
  </global>
</keymap>
kodi-send -a reloadkeymaps

Legacy Remotes

How They Work

Some receivers don’t send well-known keys. For these, there’s eventlircd. LibreELEC has a list of popular remotes that fall into this category and will dynamically use it as needed. For instance, pair an Amazon Fire TV remote and udev will fire, match a rule in /usr/lib/udev/rules.d/98-eventlircd.rules, and launch eventlircd with the buttons mapped in /etc/eventlircd.d/aftvsremote.evmap.

These will interface with Kodi using its “LIRC” (Linux Infrared Remote Control) interface. And just like with keyboards, there’s a set of well-known remote keys Kodi will accept. Some remotes don’t know about these, so eventlircd does some pre-translation before relaying to Kodi. If you look in the aftvsremote.evmap file, for example, you’ll see that KEY_HOMEPAGE = KEY_HOME.

To find out if your remote falls into this category, enable logging, tail the log, and if your remote has been picked up for handling by eventlircd you’ll see some entries like this.

    debug <general>: LIRC: - NEW 66 0 KEY_HOME devinput (KEY_HOME)
    debug <general>: HandleKey: percent (0x25) pressed, window 10000, action is PreviousMenu

In the first line, Kodi notes that its LIRC interface received a KEY_HOME button press. (Eventlircd actually translated it, but that happened before Kodi saw anything.) In the second line, Kodi says it received the key ‘percent’ and performed the action PreviousMenu. The part where Kodi says ‘percent (0x25)’ was pressed seems resistant to documentation, but the action of PreviousMenu is the end result. The main question is why.

It turns out that Kodi has a pre-mapping file for events relayed to it from LIRC systems. There’s a mapping for ‘KEY_HOME’ that Kodi translates to the well-known key ‘start’. Then Kodi checks the normal keymap file, where ‘start’ translates to the Kodi action PreviousMenu.

Take a look at the system LIRC mapping file to see for yourself.

# The Lircmap file has the Kodi well-known button (start) surrounding the original remote command (KEY_HOME)
grep KEY_HOME /usr/share/kodi/system/Lircmap.xml

      <start>KEY_HOME</start>

Then take a look at the normal mapping file to see how start gets handled:

# The keymap file has the well-known Kodi button surrounding the Kodi action, 
grep start /usr/share/kodi/system/keymaps/remote.xml 

      <start>PreviousMenu</start>

You’ll actually see quite a few things are mapped to ‘start’ as it does different things depending on what part of Kodi you are accessing at the time.
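You can trace the two-step lookup with a rough shell sketch. It pulls matches out of the XML with sed rather than parsing it, so it assumes one mapping per line, as in the files above. It also reports only the first match, which means it ignores the window-specific sections. The function name is hypothetical.

```sh
#!/bin/sh
# Rough sketch: follow a LIRC key name through Lircmap.xml to a Kodi button,
# then through a keymap file to the action. Reports the first match only.
lirc_trace() {
    key=$1 lircmap=$2 keymap=$3
    # Find the tag wrapping the key, e.g. <start>KEY_HOME</start> -> start
    button=$(sed -n "s:.*<\([A-Za-z0-9_]*\)>$key</.*:\1:p" "$lircmap" | head -n 1)
    [ -n "$button" ] || { echo "$key: no Lircmap entry" >&2; return 1; }
    # Find what that button does, e.g. <start>PreviousMenu</start> -> PreviousMenu
    action=$(sed -n "s:.*<$button>\(.*\)</$button>.*:\1:p" "$keymap" | head -n 1)
    echo "$key -> $button -> ${action:-no action}"
}
```

For example: lirc_trace KEY_HOME /usr/share/kodi/system/Lircmap.xml /usr/share/kodi/system/keymaps/remote.xml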

Changing Button Mappings

You have a few options, and they are listed here in increasing complexity. Specifically, you can

  • Edit the keymap
  • Edit the Lircmap and keymap
  • Edit the eventlircd evmap

Edit the Keymap

To change what the KEY_HOME button does, you can create a user keymap like before and override it. The only difference is that the section is remote instead of keyboard, since the button arrives through the LIRC interface. In this example, we’ve set it to actually take you home via the Kodi function ActivateWindow(Home).

vi /storage/.kodi/userdata/keymaps/remote.xml
<keymap>
  <global>
    <remote>
      <start>ActivateWindow(Home)</start>
    </remote>
  </global>
</keymap>

Edit the Lircmap and Keymap

This can occasionally cause problems though - such as when you have another button that already gets translated to start and you want it to keep working the same. In this case, you make an edit at the Lircmap level to translate KEY_HOME to some other button first, then map that button to the action you want. (You can’t put the Kodi function above in the Lircmap file so you have to do a double hop.)

First, let’s determine what the device name should be with the irw command.

irw

# Hit a button and the device name will be at the end
66 0 KEY_HOME devinput

Now let’s pick a key. My remote doesn’t have a ‘red’ key, so let’s hijack that one. Note the device name devinput from above.

vi /storage/.kodi/userdata/Lircmap.xml
<lircmap>
   <remote device="devinput">
      <red>KEY_HOME</red>
   </remote>
</lircmap>

Then map the key and restart Kodi (the keymap reload command doesn’t reload Lircmap.xml).

vi /storage/.kodi/userdata/keymaps/remote.xml
<keymap>
  <global>
    <remote>
      <red>ActivateWindow(Home)</red>
    </remote>
  </global>
</keymap>
systemctl restart kodi

Edit the Eventlircd Evmap

You can also change what eventlircd does. If LibreELEC weren’t a read-only filesystem, you’d have done this first. But you can do it with a bit more work than the above if you prefer.

# Copy the evmap files
cp -r /etc/eventlircd.d /storage/.config/

# Override where the daemon looks for its configs
systemctl edit --full eventlircd

# change the ExecStart line to refer to the new location - add -vvv to the end for more log info
ExecStart=/usr/sbin/eventlircd -f --evmap=/storage/.config/eventlircd.d --socket=/run/lirc/lircd -vvv

# Restart, replug the device and grep the logs to see what evmap is in use
systemctl restart eventlircd
journalctl | grep evmap

# Edit that map to change how home is mapped (yours may not use the default map)
vi /storage/.config/eventlircd.d/default.evmap
KEY_HOMEPAGE     = KEY_HOME

Dealing With Unknown Buttons

Sometimes, you’ll have a button that does nothing at all.

    debug <general>: LIRC: - NEW ac 0 KEY_HOMEPAGE devinput (KEY_HOMEPAGE)
    debug <general>: HandleKey: 0 (0x0, obc255) pressed, window 10016, action is 

In this example, Kodi received the KEY_HOMEPAGE button, consulted its Lircmap.xml and didn’t find anything. This is because eventlircd didn’t recognize the remote and translate it to KEY_HOME like before. That’s OK, we can just add a user LIRC mapping. If you look through the system file, you’ll see things like ‘KEY_HOME’ are mapped to the ‘start’ button. So let’s do the same.

vi /storage/.kodi/userdata/Lircmap.xml
<lircmap>
   <remote device="devinput">
      <start>KEY_HOMEPAGE</start>
   </remote>
</lircmap>
systemctl restart kodi

Check the log and you’ll see that you now get

    debug <general>: LIRC: - NEW ac 0 KEY_HOMEPAGE devinput (KEY_HOMEPAGE)
    debug <general>: HandleKey: 251 (0xfb, obc4) pressed, window 10025, action is ActivateWindow(Home)

Remotes Outside Kodi

You may want a remote to work outside of kodi too - say because you want to start kodi with a remote button. If you have a modern remote that eventlircd didn’t capture, you must first add your remote to the list of udev rules.

Capture The Remote

First you must identify the remote with lsusb. It’s probably the only non-hub device listed.

lsusb
...
...
Bus 006 Device 002: ID 2571:4101 BESCO KSL81P304
                        ^     ^
Vendor ID -------------/       \--------- Model ID
...

Then, copy the udev rule file and add a custom rule for your remote.

cp /usr/lib/udev/rules.d/98-eventlircd.rules /storage/.config/udev.rules.d/
vi /storage/.config/udev.rules.d/98-eventlircd.rules
...
...
...
ENV{ID_USB_INTERFACES}=="", IMPORT{builtin}="usb_id"

# Add the rule under the above line so the USB IDs are available. 
# change the numbers to match the ID from lsusb

ENV{ID_VENDOR_ID}=="2571", ENV{ID_MODEL_ID}=="4101", \
  ENV{eventlircd_enable}="true", \
  ENV{eventlircd_evmap}="default.evmap"
...
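Reading the IDs off lsusb by eye is easy to get wrong. Here is a sketch of a hypothetical helper that turns one lsusb line into the rule above. It assumes lsusb’s usual "ID vvvv:mmmm" format and the default.evmap used above.

```sh
#!/bin/sh
# Hypothetical helper: turn one lsusb line into the eventlircd udev rule above.
udev_rule_from_lsusb() {
    # Pull "vvvv mmmm" out of "... ID vvvv:mmmm ..."
    id=$(echo "$1" | sed -n 's/.* ID \([0-9a-fA-F]\{4\}\):\([0-9a-fA-F]\{4\}\).*/\1 \2/p')
    [ -n "$id" ] || { echo "no USB ID found" >&2; return 1; }
    vendor=${id% *} model=${id#* }
    printf 'ENV{ID_VENDOR_ID}=="%s", ENV{ID_MODEL_ID}=="%s", \\\n' "$vendor" "$model"
    printf '  ENV{eventlircd_enable}="true", \\\n'
    printf '  ENV{eventlircd_evmap}="default.evmap"\n'
}
```

For example, pass it the BESCO line from lsusb above in quotes and paste the output under the IMPORT{builtin}="usb_id" line.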

Now, reboot, turn on logging and see what the buttons show up as. You can also install the system tools add-on in kodi, and at the command line, stop kodi and the eventlircd service, then run evtest and press some buttons. You should see something like

Testing ... (interrupt to exit)
Event: time 1710468265.112925, type 4 (EV_MSC), code 4 (MSC_SCAN), value c0223
Event: time 1710468265.112925, type 1 (EV_KEY), code 172 (KEY_HOMEPAGE), value 1
Event: time 1710468265.112925, -------------- SYN_REPORT ------------
Event: time 1710468265.200987, type 4 (EV_MSC), code 4 (MSC_SCAN), value c0223
Event: time 1710468265.200987, type 1 (EV_KEY), code 172 (KEY_HOMEPAGE), value 0
Event: time 1710468265.200987, -------------- SYN_REPORT ------------

Configure and Enable irexec

Now that you have seen the event, you must have the irexec process watching for it to take action. Luckily, LibreELEC already includes it.

vi /storage/.config/system.d/irexec.service
[Unit]
Description=IR Remote irexec config
After=eventlircd.service
Wants=eventlircd.service

[Service]
ExecStart=/usr/bin/irexec --daemon /storage/.lircrc
Type=forking

[Install]
WantedBy=multi-user.target

We’ll create the config file next. Its config entry is the command or script to run: systemctl start kodi in our case.

vi /storage/.lircrc
begin
    prog   = irexec
    button = KEY_HOMEPAGE
    config = systemctl start kodi
end

Let’s enable and start it up

systemctl enable --now irexec

Go ahead and stop kodi, then press the KEY_HOMEPAGE button on your remote. Try config entries like echo start kodi > /storage/test-results if you have issues and wonder if it’s running.

Notes

You may notice that eventlircd is always running, even if it has no remotes. That’s because of a unit file in /usr/lib/systemd/system/multi-user.target.wants/. I’m not sure why this is the case when there is no remote in play.

https://discourse.osmc.tv/t/cant-create-a-keymap-for-a-remote-control-button-which-is-connected-by-lircd/88819/6

2.2 - Signage

2.2.1 - Anthias (Screenly)

Overview

Anthias (AKA Screenly) is a simple, open-source digital signage system that runs well on a Raspberry Pi. When plugged into a monitor, it displays images, video or web sites in slideshow fashion. It’s managed directly through a web interface on the device, and there are fleet and support options.

Preparation

Use the Raspberry Pi Imager to create a 64-bit Raspberry Pi OS Lite image. Select the gear icon at the bottom right to enable SSH, create a user, configure networking, and set the locale. Use SSH to continue configuration.

setterm --cursor on

sudo raspi-config nonint do_change_locale en_US.UTF-8
sudo raspi-config nonint do_configure_keyboard us
sudo raspi-config nonint do_wifi_country US
sudo timedatectl set-timezone America/New_York
  
sudo raspi-config nonint do_hostname SOMENAME

sudo apt update;sudo apt upgrade -y

sudo reboot

Enable automatic updates and enable reboots

sudo apt -y install unattended-upgrades

# Uncomment the Debian origins and the cleanup and reboot options, setting the options to "true"
sudo sed -i 's/^\/\/\(.*origin=Debian.*\)/  \1/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/^\/\/\(Unattended-Upgrade::Remove-Unused-Kernel-Packages \).*/  \1"true";/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/^\/\/\(Unattended-Upgrade::Remove-New-Unused-Dependencies \).*/  \1"true";/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/^\/\/\(Unattended-Upgrade::Remove-Unused-Dependencies \).*/  \1"true";/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/^\/\/\(Unattended-Upgrade::Automatic-Reboot \).*/  \1"true";/' /etc/apt/apt.conf.d/50unattended-upgrades
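Before writing to the real file, it can help to preview the result. A minimal sketch, assuming bash: the five expressions above are collected into one array so sed can print the edited file to stdout without -i. The names UU_FILE, UU_EXPRS, and uu_preview are hypothetical.

```shell
#!/bin/bash
# Sketch: the five sed expressions above as one array, so the result can be
# previewed on stdout before committing it with sed -i.
UU_FILE=/etc/apt/apt.conf.d/50unattended-upgrades
UU_EXPRS=(
  -e 's/^\/\/\(.*origin=Debian.*\)/  \1/'
  -e 's/^\/\/\(Unattended-Upgrade::Remove-Unused-Kernel-Packages \).*/  \1"true";/'
  -e 's/^\/\/\(Unattended-Upgrade::Remove-New-Unused-Dependencies \).*/  \1"true";/'
  -e 's/^\/\/\(Unattended-Upgrade::Remove-Unused-Dependencies \).*/  \1"true";/'
  -e 's/^\/\/\(Unattended-Upgrade::Automatic-Reboot \).*/  \1"true";/'
)

# Print the edited file (default: UU_FILE) without changing it
uu_preview() {
  sed "${UU_EXPRS[@]}" "${1:-$UU_FILE}"
}

# Usage:
#   diff "$UU_FILE" <(uu_preview)              # review the changes
#   sudo sed -i "${UU_EXPRS[@]}" "$UU_FILE"    # then apply them
```

The space after Automatic-Reboot in the expression means related options such as Automatic-Reboot-Time are left alone.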

Installation

bash <(curl -sL https://www.screenly.io/install-ose.sh)

Operation

Adding Content

Navigate to the Web UI at the IP address of the device. You may wish to enter the settings and add authentication and change the device name.

You can add common image types, MP4 video, web pages, and YouTube links. It will let you know if it fails to download a YouTube video. Some heavy web pages fail to render correctly, but most render fine.

Images must be sized for the screen, which in most cases is 1080p (1920x1080). Larger images are scaled down, but smaller images are not scaled up. For example, PowerPoint is often used to create slides, but by default it exports at 720p (1280x720), which creates black borders on a 1080p screen. You can change the resolution on the Pi with raspi-config, or add a registry key in Windows to change PowerPoint’s export size.

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Microsoft\Office\16.0\PowerPoint\Options]
"ExportBitmapResolution"=dword:00000096
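The registry value is a DPI setting written in hex, so dword:00000096 is 150 DPI. A small sketch of the arithmetic, assuming a standard 16:9 slide of 13.333 x 7.5 inches (40/3 x 15/2 inches); the function names are illustrative.

```shell
#!/bin/bash
# Sketch: relate ExportBitmapResolution (DPI) to the exported image size.
# Assumes a standard 16:9 slide of 13.333 x 7.5 inches (40/3 x 15/2 in).

# Print WIDTHxHEIGHT in pixels for a given DPI
slide_px() {
  local dpi=$1
  echo "$(( dpi * 40 / 3 ))x$(( dpi * 15 / 2 ))"
}

# Print a DPI value in the form used in the .reg file
dword() {
  printf 'dword:%08x\n' "$1"
}

for dpi in 96 144 150; do
  echo "$dpi DPI -> $(slide_px "$dpi") ($(dword "$dpi"))"
done
```

Under that assumption, 96 DPI gives the 1280x720 export described above, 150 DPI gives 2000x1125 (which Anthias will scale down), and 144 DPI (dword:00000090) would land exactly on 1920x1080. Custom slide sizes will give different numbers.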

Schedule the Screen

You may want to turn off the display outside of business hours. The vcgencmd command can turn off video output, and some displays will then enter power-saving mode. Some displays misbehave or ignore the command, so test yours.

sudo tee /etc/cron.d/screenpower << EOF

# m h dom mon dow usercommand

# Turn monitor on
30 7  * * 1-5 root /usr/bin/vcgencmd display_power 1

# Turn monitor off
30 19 * * 1-5 root /usr/bin/vcgencmd display_power 0

# Weekly Reboot just in case
0 7 * * 1 root /sbin/shutdown -r +10 "Monday reboot in 10 minutes"
EOF

Troubleshooting

YouTube Fail

You may find you must download the video manually and then upload it to Anthias. Use the yt-dlp utility to list the available formats, then download an MP4 version of the video by its format ID (22 below; the IDs offered vary by video).

yt-dlp --list-formats https://www.youtube.com/watch?v=YE7VzlLtp-4
yt-dlp --format 22 https://www.youtube.com/watch?v=YE7VzlLtp-4

WiFi Disconnect

WiFi can go up and down, and some variants of the OS do not automatically reconnect. You may want to add the following script to stay connected.

sudo touch /usr/local/bin/checkwifi
sudo chmod +x /usr/local/bin/checkwifi
sudo vim.tiny /usr/local/bin/checkwifi
#!/bin/bash

# Exit if WiFi isn't configured
grep -q ssid /etc/wpa_supplicant/wpa_supplicant.conf || exit 0

# In the case of multiple gateways (when connected to wired and wireless)
# the `grep -m 1` will exit on the first match, presumably the lowest metric
GATEWAY=$(ip route list | grep -m 1 default | awk '{print $3}')

# If the gateway doesn't answer (or there is no default route), restart networking
if ! ping -c4 "$GATEWAY" > /dev/null 2>&1
then
  logger "checkwifi fail $(date)"
  service wpa_supplicant restart
  service dhcpcd restart
else
  logger "checkwifi success $(date)"
fi

Then schedule the check every five minutes. Entries in /etc/cron.d need a user field, and since the job runs as root, sudo isn’t needed.

sudo tee /etc/cron.d/checkwifi << EOF
# Check WiFi connection
*/5 * * * * root /usr/local/bin/checkwifi > /dev/null 2>&1
EOF
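The script takes the first default route and presumes it has the lowest metric. If you’d rather not rely on listing order, here is a sketch that picks the lowest metric explicitly. The function name is hypothetical, and it assumes `default via <gateway>` lines as printed by ip route.

```shell
#!/bin/bash
# Sketch: pick the default gateway with the lowest metric explicitly,
# instead of presuming `ip route` lists it first. Reads `ip route list`
# output on stdin; routes without a metric are treated as metric 0.
lowest_metric_gateway() {
  awk '$1 == "default" {
         metric = 0
         for (i = 1; i < NF; i++) if ($i == "metric") metric = $(i + 1)
         print metric, $3
       }' | sort -n | head -n 1 | awk '{ print $2 }'
}

# Usage inside checkwifi, replacing the grep/awk pipeline:
#   GATEWAY=$(ip route list | lowest_metric_gateway)
```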

Hidden WiFi

If you didn’t set up WiFi during imaging, you can use raspi-config after boot. For a hidden network, you must also add scan_ssid=1 to the network block, then reboot.

sudo sed -i '/psk/a\        scan_ssid=1' /etc/wpa_supplicant/wpa_supplicant.conf

Wrong IP on Splash Screen

The address shown seems to be captured during installation and then stored statically. Either turn off the splash screen or correct the address in the Docker Compose file.

# You can turn off the splash screen in the GUI or in the .conf
sed -i 's/show_splash =.*/show_splash = off/' /home/pi/.screenly/screenly.conf

# Or you can correct it in the docker file
vi ./screenly/docker-compose.yml

White Screen or Hung

Anthias works best when the graphics are the correct size. It will attempt to display images that are too large, but this flashes a white screen and eventually hangs the box (at least in the current version). Not all users get the hang of sizing things correctly, so if you have issues, try this script.

#!/bin/bash

# If this device isn't running signage, exit
[ -d /home/pi/screenly_assets ] || { echo "No screenly image asset directory, exiting"; exit 1; }

# Check that mediainfo and imagemagick convert are available
command -v mediainfo > /dev/null || { echo "mediainfo command not available, exiting"; exit 1; }
command -v convert > /dev/null || { echo "imagemagick convert not available, exiting"; exit 1; }

cd /home/pi/screenly_assets || exit 1

for FILE in *.png *.jpg *.jpeg *.jpe *.gif
do
        # if the glob didn't match a file, skip this iteration
        [ -f "$FILE" ] || continue

        # Use mediainfo to get the dimensions as it's much faster than imagemagick
        read -r WIDTH HEIGHT <<< "$(mediainfo --Inform="Image;%Width% %Height%" "$FILE")"

        # Skip anything mediainfo couldn't measure
        [ -n "$WIDTH" ] && [ -n "$HEIGHT" ] || continue

        # if it's too big, use imagemagick's convert. (the mogify command doesn't resize reliably)
        # -resize 1920x1080 fits the image inside that box, keeping its aspect ratio
        if [ "$WIDTH" -gt 1920 ] || [ "$HEIGHT" -gt 1080 ]
        then
                echo "$FILE $WIDTH x $HEIGHT"
                convert "$FILE" -resize 1920x1080 "$FILE"
        fi
done

No Video After Power Outage

If the display is off when you boot the Pi, it may decide there is no monitor, and when someone later turns on the display there is no output. Enable hdmi_force_hotplug in /boot/config.txt to avoid this problem, and set the group and mode for 1080p at 30 Hz (CEA group 1, mode 34).

sudo sed -i 's/.*hdmi_force_hotplug.*/hdmi_force_hotplug=1/' /boot/config.txt
sudo sed -i 's/.*hdmi_group=.*/hdmi_group=1/' /boot/config.txt
sudo sed -i 's/.*hdmi_mode=.*/hdmi_mode=34/' /boot/config.txt
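These sed commands only work when the setting is already present in config.txt, commented out or not. A hedged sketch of an idempotent alternative: set_config is a hypothetical helper that uncomments or replaces the line when it exists and appends it otherwise. It relies on GNU grep and sed for the `\?` operator; run it as root against /boot/config.txt.

```shell
#!/bin/bash
# Sketch: set KEY=VALUE in a config.txt-style file.
# Replaces an existing (possibly commented-out) KEY= line, else appends one.
set_config() {
  local key=$1 value=$2 file=$3
  if grep -q "^#\?${key}=" "$file"; then
    sed -i "s/^#\?${key}=.*/${key}=${value}/" "$file"
  else
    echo "${key}=${value}" >> "$file"
  fi
}

# Usage, as root:
#   set_config hdmi_force_hotplug 1 /boot/config.txt
```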

2.2.2 - Anthias Deployment

If you do regular deployments you can create an image. A reasonable approach is to:

  • Shrink the last partition
  • Zero fill the remaining free space
  • Find the end of the last partition
  • Use dd to copy up to that point into a file
  • Use raspi-config to resize after deploying

Or you can use PiShrink to script all that.

Installation

wget https://raw.githubusercontent.com/Drewsif/PiShrink/master/pishrink.sh
chmod +x pishrink.sh
sudo mv pishrink.sh /usr/local/bin

Operation

# Capture and shrink the image
sudo dd if=/dev/mmcblk0 of=anthias-raw.img bs=1M
sudo pishrink.sh anthias-raw.img anthias.img

# Copy to a new card
sudo dd if=anthias.img of=/dev/mmcblk0 bs=1M

If you need to modify the image after creating it you can mount it via loop-back.

sudo losetup --find --partscan anthias.img
sudo mount /dev/loop0p2 /mnt/

# After you've made changes

sudo umount /mnt
sudo losetup --detach-all

Manual Steps

If you have access to a graphical desktop environment, use GParted. It will resize the filesystem and partitions for you quite easily.

# Mount the image via loopback and open it with GParted
sudo losetup --find --partscan anthias-raw.img

# Grab the right side of the last partition with your mouse and 
# drag it as far to the left as you can, apply and exit
sudo gparted /dev/loop0

Now you need to find the last sector and truncate the file after that location. Since the truncate utility operates on bytes, you convert sectors to bytes with multiplication.

# Find the End of the last partition. In the below example, it's Sector *9812664*
$ sudo fdisk -lu /dev/loop0

Units: sectors of 1 * 512 = 512 bytes

Device       Boot  Start     End Sectors  Size Id Type
/dev/loop0p1        8192  532479  524288  256M  c W95 FAT32 (LBA)
/dev/loop0p2      532480 9812664 9280185  4.4G 83 Linux


sudo losetup --detach-all

sudo truncate --size=$(( (9812664 + 1) * 512 )) anthias-raw.img
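Copying the End sector by hand is easy to get wrong. A sketch that reads it from the fdisk output instead, assuming 512-byte sectors as in the Units line above and that the last /dev/ line is the last partition. The function names are illustrative.

```shell
#!/bin/bash
# Sketch: derive the truncate size from `fdisk -lu` output.

# Print the End sector of the last partition line read on stdin.
# A boot flag (*) adds a column, so shift over by one when it is present.
last_partition_end() {
  awk '/^\/dev\// { end = ($2 == "*") ? $4 : $3 } END { print end }'
}

# Bytes to keep: everything up to and including the End sector.
# Bash arithmetic is 64-bit, so multi-gigabyte sizes are safe here.
keep_bytes() {
  echo $(( ($1 + 1) * 512 ))
}

# Usage (read the partition table before detaching the loop device):
#   END=$(sudo fdisk -lu /dev/loop0 | last_partition_end)
#   sudo losetup --detach-all
#   sudo truncate --size="$(keep_bytes "$END")" anthias-raw.img
```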

Very Manual Steps

If you don’t have a GUI, you can do it with a combination of commands.

# Mount the image via loopback
sudo losetup --find --partscan anthias-raw.img

# Check and resize the file system
sudo e2fsck -f /dev/loop0p2
sudo resize2fs -M /dev/loop0p2

... The filesystem on /dev/loop0p2 is now 1149741 (4k) blocks long

# Now you can find the end of the resized filesystem by:

# Finding the number of sectors.
#     Bytes = Num of blocks * block size
#     Number of sectors = Bytes / sector size
echo $(( (1149741 * 4096) / 512 ))

# Finding the start sector (532480 in the example below)
sudo fdisk -lu /dev/loop0

Device       Boot  Start      End  Sectors  Size Id Type
/dev/loop0p1        8192   532479   524288  256M  c W95 FAT32 (LBA)
/dev/loop0p2      532480 31116287 30583808 14.6G 83 Linux

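# Adding the number of sectors to the start sector. Add 1 as a small margin so the partition ends after the filesystem does
echo $(( 532480 + 9197928 + 1 ))

# And resize the partition to that end sector. The trailing "s" makes parted read it as sectors rather than its default MB (ignore the warnings)
sudo parted /dev/loop0 resizepart 2 9730409s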
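The same arithmetic can be scripted. A sketch, assuming 512-byte sectors and the resize2fs message format shown above, that turns the block count and block size into the end sector. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch: compute the resizepart end sector from the resize2fs message.

# Print "<blocks> <block_size_bytes>" from a line such as
# "The filesystem on /dev/loop0p2 is now 1149741 (4k) blocks long."
parse_resize2fs() {
  sed -n 's/.* is now \([0-9]*\) (\([0-9]*\)k) blocks long.*/\1 \2/p' |
    while read -r blocks kib; do echo "$blocks $(( kib * 1024 ))"; done
}

# End sector = start sector + filesystem size in sectors + 1 (margin)
new_end_sector() {
  local blocks=$1 block_size=$2 start=$3
  echo $(( start + blocks * block_size / 512 + 1 ))
}

# Usage, capturing the resize2fs output and using the start sector from fdisk:
#   read -r BLOCKS BSIZE < <(sudo resize2fs -M /dev/loop0p2 2>&1 | parse_resize2fs)
#   sudo parted /dev/loop0 resizepart 2 "$(new_end_sector "$BLOCKS" "$BSIZE" 532480)s"
```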

Great! Now you can follow the remainder of the GParted steps to find the new last sector and truncate the file.

Extra Credit

It’s handy to compress the image, and xz is pretty good for this. By default, xz replaces the file with a compressed anthias-raw.img.xz.

xz anthias-raw.img

xzcat anthias-raw.img.xz | sudo dd of=/dev/mmcblk0 bs=1M

In these procedures, we make a copy of the SD card before we do anything. Another strategy is to resize the partition on the SD card directly, and then use dd to read in only the sectors up to the end of the last partition, rather than reading it all in and then truncating. It’s a bit faster, if a bit less recoverable in the event of a mistake.
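That faster strategy can be sketched as a dd call that stops at the last sector you need. This assumes 512-byte sectors and GNU dd, whose iflag=count_bytes makes count a byte count; copy_through_sector is a hypothetical name. Run it as root when reading from the card.

```shell
#!/bin/bash
# Sketch: copy a device (or image) only up to and including an end sector.
# Assumes 512-byte sectors and GNU dd (iflag=count_bytes, status=none).
copy_through_sector() {
  local src=$1 dst=$2 end=$3
  dd if="$src" of="$dst" bs=1M iflag=count_bytes \
     count=$(( (end + 1) * 512 )) status=none
}

# Usage, as root, with the End sector of the resized last partition:
#   copy_through_sector /dev/mmcblk0 anthias.img 9730409
```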

2.2.3 - API

The API docs on the web refer to Screenly, while Anthias uses an older API. However, you can access the API docs for the version you’re working with at

http://sign.your.domain/api/docs/

You’ll have to correct the URL in the Swagger form, but after that you can see what you’re working with.

3 - Monitoring

Time series vs event data.

3.1 - Metrics

3.1.1 - Prometheus

Overview

Prometheus is a time series database, meaning it’s optimized to store and work with data organized in time order. Its single binary includes:

  • Database engine
  • Collector
  • Simple web-based user interface

This allows you to collect and manage data with fewer tools and less complexity than other solutions.

Data Collection

End-points normally expose metrics to Prometheus by making a web page available that it can poll. This is done by including an instrumentation library (provided by Prometheus) or simply adding a listener on a high-numbered port that returns some text when asked.
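To show what that text looks like, here is a minimal sketch of the text exposition format an end-point returns: a HELP line, a TYPE line, and the sample itself. The metric name demo_load1 is hypothetical, and actually serving this over HTTP (the listener part) is left out.

```shell
#!/bin/bash
# Sketch: print one gauge in the Prometheus text exposition format.
emit_gauge() {
  local name=$1 help=$2 value=$3
  printf '# HELP %s %s\n' "$name" "$help"
  printf '# TYPE %s gauge\n' "$name"
  printf '%s %s\n' "$name" "$value"
}

# Usage on Linux, reporting the 1-minute load average:
#   emit_gauge demo_load1 "1-minute load average" "$(cut -d ' ' -f 1 /proc/loadavg)"
```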

For systems that don’t support Prometheus natively, there are a few add-on services to translate. These are called ’exporters’ and translate things such as SNMP into a web format Prometheus can ingest.

Alerting

You can also alert on the data collected. This is through the Alert Manager, a second package that works closely with Prometheus.

Visualization

You still need a dashboard tool like Grafana to handle visualizations, but you can get started quite quickly with just Prometheus.

3.1.1.1 - Installation

Install from the Debian Testing repo, as stable can be up to a year behind.

# Testing
echo 'deb http://deb.debian.org/debian testing main' | sudo tee -a /etc/apt/sources.list.d/testing.list

# Pin testing down to a low level so the rest of your packages don't get upgraded
sudo tee -a /etc/apt/preferences.d/not-testing << EOF
Package: *
Pin: release a=testing
Pin-Priority: 50
EOF

# Living Dangerously with test
sudo apt update
sudo apt install -t testing prometheus

Configuration

Use this for your starting config.

cat /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

This says every 15 seconds, run down the job list. And there is one job - to check out the system at ’localhost:9090’ which happens to be itself.

For every target listed, the scraper makes a web request for /metrics and stores the results. It ingests all the data presented and keeps it for 15 days. You can choose to drop certain elements by adding config, but by default it takes everything given.

You can see this yourself by asking like Prometheus would: hit it up directly in your browser. For example, Prometheus makes its own metrics available at /metrics.

http://some.server:9090/metrics

Operation

User Interface

You can access the Web UI at:

http://some.server:9090

At the top, select Graph (you should be there already) and in the Console tab click the dropdown labeled “insert metric at cursor”. There you will see all the data being exposed. This is mostly about the Go language it’s written in, and not super interesting. A simple Graph tab is available as well.

Data Composition

Data can be simple, like:

go_gc_duration_seconds_sum 3

Or it can be dimensional which is accomplished by adding labels. In the example below, the value of go_gc_duration_seconds has 5 labeled sub-sets.

go_gc_duration_seconds{quantile="0"} 4.5697e-05
go_gc_duration_seconds{quantile="0.25"} 7.814e-05
go_gc_duration_seconds{quantile="0.5"} 0.000103396
go_gc_duration_seconds{quantile="0.75"} 0.000143687
go_gc_duration_seconds{quantile="1"} 0.001030941

In this example, the value of net_conntrack_dialer_conn_failed_total has several.

net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="refused"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="unknown"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="refused"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="unknown"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="refused"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="unknown"} 0

How is this useful? It allows you to do aggregations, such as looking at all the net_conntrack failures, and also at just the failures that were refused. All with the same data.
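In Prometheus itself you would write these as PromQL, for example sum(net_conntrack_dialer_conn_failed_total) for all failures and sum(net_conntrack_dialer_conn_failed_total{reason="refused"}) for just the refused ones. As a rough illustration of what that aggregation does, here is an awk sketch over the raw /metrics text. It assumes label values contain no spaces, and sum_metric is a hypothetical name.

```shell
#!/bin/bash
# Sketch: sum every sample of a metric from /metrics text on stdin,
# optionally only those whose labels contain a given label="value" pair.
sum_metric() {
  local name=$1 want=${2:-}
  awk -v name="$name" -v want="$want" '
    index($0, name "{") == 1 || $1 == name {
      if (want == "" || index($0, want) > 0) total += $NF
    }
    END { print total + 0 }'
}

# Usage against a live server:
#   curl -s http://some.server:9090/metrics |
#     sum_metric net_conntrack_dialer_conn_failed_total 'reason="refused"'
```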

Removing Data

You may have a target you want to remove, such as a mistyped hostname that is now causing a large red bar on a dashboard. You can remove that mistake by enabling the admin API and issuing a delete.

sudo sed -i 's/^ARGS.*/ARGS="--web.enable-admin-api"/' /etc/default/prometheus

sudo systemctl restart prometheus

curl -s -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={instance="badhost.some.org:9100"}'

The default retention is 15 days. You may want less than that and you can configure --storage.tsdb.retention.time=1d similar to above. ALL data has the same retention, however. If you want historical data you must have a separate instance or use VictoriaMetrics.

Next Steps

Let’s get something interesting to see by adding some OS metrics

Troubleshooting

If you can’t start the prometheus server, it may be an issue with the defaults file. The testing and stable packages use different defaults, so add some values explicitly to get newer versions running.

sudo vi /etc/default/prometheus

ARGS="--config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus/metrics2/"

3.1.1.2 - Node Exporter

This is a service you install on your end-points that makes CPU, memory, and other metrics available to Prometheus.

Installation

On each device you want to monitor, install the node exporter with this command.

sudo apt install prometheus-node-exporter

Do a quick test to make sure it’s responding to scrapes.

curl localhost:9100/metrics

Configuration

Back on your Prometheus server, add these new nodes as a job in the prometheus.yml file. Feel free to drop the initial job where Prometheus was scraping itself.

global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'servers'
    static_configs:
    - targets:
      - some.server:9100
      - some.other.server:9100
      - and.so.on:9100

sudo systemctl reload prometheus.service

Operation

You can check the status of your new targets at:

http://some.server:9090/classic/targets

A lot of data is collected by default. On some low-power systems you may want less. For just the basics, customize the config to disable the defaults and enable only specific collectors.

In the case below we reduce collection to just CPU, memory, and hardware metrics. When scraping a Pi 3B, this reduces the scrape duration from 500 to 50 ms.

sudo sed -i 's/^ARGS.*/ARGS="--collector.disable-defaults --collector.hwmon --collector.cpu --collector.meminfo"/' /etc/default/prometheus-node-exporter
sudo systemctl restart prometheus-node-exporter

The available collectors are listed on the page:

https://github.com/prometheus/node_exporter

3.1.1.3 - SNMP Exporter

SNMP is one of the most prevalent (and clunky) protocols still widely used on network-attached devices. But it’s a good general-purpose way to get data from lots of different makes of products in a similar way.

But Prometheus doesn’t understand SNMP. The solution is a translation service that acts as a middle-man and ’exports’ data from those devices in a form Prometheus can ingest.

Installation

Assuming you’ve already installed Prometheus, install some SNMP tools and the exporter. If you have an error installing the mibs-downloader, check troubleshooting at the bottom.

sudo apt install snmp snmp-mibs-downloader
sudo apt install -t testing prometheus-snmp-exporter

Change the SNMP tools config file to allow use of installed MIBs.

sudo sed -i 's/^mibs/# &/' /etc/snmp/snmp.conf

Preparation

We need a target, so assuming you have a switch somewhere and can enable SNMP on it, let’s query the switch for its name, AKA sysName. Here we’re using version “2c” of the protocol with the read-only password “public”. Pretty standard.

# Note: app
snmpwalk -v 2c -c public some.switch.address sysName

SNMPv2-MIB::sysName.0 = STRING: Some-Switch

Note: If you get back an error or just the ‘iso’ prefixed value, double check your MIBs are installed.

Configuration

To add this switch to the Prometheus scraper, add a new job to the prometheus.yml file. This job includes the targets as normal, but also the path (since it’s different from the default) and an optional parameter called module that is specific to the SNMP exporter. It also does something confusing: a relabel_config.

This is because Prometheus isn’t actually talking to the switch; it’s talking to the local SNMP exporter service. So we list all the targets normally, and then at the bottom say ‘oh, by the way, do a switcheroo’. This allows Prometheus to display all the data normally with no one the wiser.

...
...
scrape_configs:
  - job_name: 'snmp'
    static_configs:
      - targets:
        - some.switch.address    
    metrics_path: /snmp
    params:
      module: [if_mib]
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9116  # The SNMP exporter's real hostname:port.
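To make the switcheroo concrete, here is a sketch of the URL that ends up being scraped for one target: the exporter’s host:port from the rewritten __address__, the /snmp path, and the original target passed as a parameter. Prometheus does this internally; the function only mimics it, and its name and defaults are assumptions taken from the job above.

```shell
#!/bin/bash
# Sketch: what the relabel_configs above amount to for one target.
#   __address__ (the switch)  -> __param_target -> ?target=... in the URL
#   __param_target            -> instance label (so graphs show the switch)
#   __address__ is replaced   -> the exporter's host:port
snmp_scrape_url() {
  local target=$1 module=${2:-if_mib} exporter=${3:-127.0.0.1:9116}
  printf 'http://%s/snmp?module=%s&target=%s\n' "$exporter" "$module" "$target"
}

# Usage: preview the exporter's output for the switch directly
#   curl -s "$(snmp_scrape_url some.switch.address)" | head
```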

Operation

No configuration on the exporter side is needed. Reload the config and check the target list. Then examine data in the graph section. Add additional targets as needed and the exporter will pull in the data.

http://some.server:9090/classic/targets

These metrics are considered well known, so they appear in the database under names such as sysUpTime and upsBasicBatteryStatus rather than being prefixed with snmp_ as you might expect.

Next Steps

If you have something non-standard, or you simply don’t want that huge amount of data in your system, look at the link below to customize the SNMP collection with the Generator.

SNMP Exporter Generator Customization

Troubleshooting

The snmp-mibs-downloader is just a handy way to download a bunch of default MIBs so when you use the tools, all the cryptic numbers, like “1.3.6.1.2.1.17.4.3.1” are translated into pleasant names.

If you can’t find the mibs-downloader, it’s probably because it’s in the non-free repo, which isn’t enabled by default. Change your apt sources file like so:

sudo vi /etc/apt/sources.list

deb http://deb.debian.org/debian/ bullseye main contrib non-free
deb-src http://deb.debian.org/debian/ bullseye main contrib non-free

deb http://security.debian.org/debian-security bullseye-security main contrib non-free
deb-src http://security.debian.org/debian-security bullseye-security main contrib non-free

deb http://deb.debian.org/debian/ bullseye-updates main contrib non-free
deb-src http://deb.debian.org/debian/ bullseye-updates main contrib non-free

It may be that you only need to change one line.

3.1.1.4 - SNMP Generator

Installation

There is no need to install the Generator as it comes with the SNMP exporter. But if you have a device that supplies its own MIB (and many do), you should add that to the default location.

# Mibs are often named SOMETHING-MIB.txt
sudo cp -n *MIB.txt /usr/share/snmp/mibs/

Preparation

You must identify the values you want to capture. Using snmpwalk is a good way to see what’s available, but it helps to have a little context.

The data is arranged like a folder structure that you drill down through. The folder names are all numeric, with ‘.’ instead of slashes. So if you wanted to get a device’s sysName you’d click down through 1.3.6.1.2.1.1.5 and look in the file 0.

When you use snmpwalk it starts wherever you tell it and then starts drilling-down, printing out everything it finds.

How do you know that’s where sysName is at? A bunch of folks got together (the ISO folks) and decided everything in advance. Then they made some handy files (MIBs) and passed them out so you didn’t have to remember all the numbers.

They allow vendors to create their own sections as well, for things that might not fit anywhere else.

A good place to start is looking at what the vendor made available. You see this by walking their section and including their MIB so you get descriptive names - only the ISO System MIB is included by default.

# The sysObjectID identifies the vendor section
# Note use of the MIB name without the .txt
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c public some.address sysObjectID

SNMPv2-MIB::sysObjectID.0 = OID: SOMEVENDOR-MIB::somevendoramerica

# Then walk the vendor section using the name from above
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c public some.address somevendoramerica

SOMEVENDOR-MIB::model.0 = STRING: SOME-MODEL
SOMEVENDOR-MIB::power.0 = INTEGER: 0
...
...

# Also check out the general System section
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c public some.address system

# You can also walk the whole ISO tree. In some cases,
# there are thousands of entries and it's indecipherable
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c public some.address iso

This can be a lot of information and you’ll need to do some homework to see what data you want to collect.

Configuration

The exporter’s default configuration file is snmp.yml, and it contains about 57 thousand lines of config. It’s designed to pull data from whatever you point it at. Basically, it doesn’t know what device it’s talking to, so it tries to cover all the bases.

This isn’t a file you should edit by hand. Instead, you create instructions for the generator, and it looks through the MIBs and creates one for you. Here’s an example for a Samlex inverter.

vim ~/generator.yml
modules:
  samlex:
    walk:
      - sysLocation
      - inverterMode
      - power
      - vin
      - tempDD
      - tempDA

prometheus-snmp-generator generate
sudo cp /etc/prometheus/snmp.yml /etc/prometheus/snmp.yml.orig
sudo cp ~/snmp.yml /etc/prometheus
sudo systemctl reload prometheus-snmp-exporter.service

Configuration in Prometheus remains the same - but since we picked a new module name we need to adjust that.

    ...
    ...
    params:
      module: [samlex]
    ...
    ...

sudo systemctl reload prometheus.service

Adding Data Prefixes

By default, the names are all over the place. The SNMP Exporter devs leave it this way because there are a lot of pre-built dashboards on downstream systems that expect the existing names.

If you are building your own downstream systems, you can add a prefix (as is best practice) with a post-generation step. This example causes all the metric names to be prefixed with samlex_. Matching on “- name: ” leaves other keys that end in name, such as labelname, untouched.

prometheus-snmp-generator generate
sed -i 's/- name: /- name: samlex_/' snmp.yml

Combining MIBs

You can combine multiple systems in the generator file to create one snmp.yml file, and refer to them by the module name in the Prometheus file.

modules:
  samlex:
    walk:
      - sysLocation
      - inverterMode
      - power
      - vin
      - tempDD
      - tempDA
  ubiquiti:
    walk:
      - something
      - somethingElse  

Operation

As before, you can get a preview directly from the exporter (using a link like below). This data should show up in the Web UI too.

http://some.server:9116/snmp?module=samlex&target=some.device

Sources

https://github.com/prometheus/snmp_exporter/tree/main/generator

3.2 - Logs

3.3 - Visualization

3.3.1 - Grafana

4 - Network

4.1 - Routing

4.1.1 - Linux Router

Creating a Linux router is fairly simple. Some distros like Alpine Linux are well suited for it but any will do. I used Debian in this example.

Install the base OS without a desktop system. Assuming you have two network interfaces, pick one to be the LAN interface (traditionally the first one, eth0 or such) and set the address statically.

Basic Routing

To route, all you really need to do is enable forwarding.

# as root

# enable
sysctl -w net.ipv4.ip_forward=1

# and persist
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf

Private Range

If one side is a private network, such as 192.168.x.x, you probably need to masquerade. This assumes you already have nftables installed with its default rules in /etc/nftables.conf.

# As root

# Add the firewall rules to masquerade
nft flush ruleset
nft add table nat
nft add chain nat postrouting { type nat hook postrouting priority 100\; }
nft add rule nat postrouting masquerade

# Persist the rules and enable the firewall
nft list ruleset >> /etc/nftables.conf
systemctl enable --now  nftables.service 

DNS and DHCP

If you want to provide network services such as DHCP and DNS, you can add dnsmasq.

sudo apt install dnsmasq

Assuming the LAN interface is named eth0 and set to 192.168.0.1.

vi  /etc/dnsmasq.d/netboot.conf 

interface=eth0
dhcp-range=192.168.0.100,192.168.0.200,12h
dhcp-option=option:router,192.168.0.1
dhcp-authoritative

systemctl enable dnsmasq.service
systemctl restart dnsmasq.service
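Because it’s easy to mistype the range, a small pure-bash sketch can confirm that the dhcp-range addresses are on the LAN subnet before you restart dnsmasq. The helper names are hypothetical.

```shell
#!/bin/bash
# Sketch: check that an address is inside an IPv4 network (pure bash).

# Convert a dotted-quad address to a 32-bit integer
ip_to_int() {
  local IFS=.
  local a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if $1 is inside the network $2 (CIDR notation, e.g. 192.168.0.0/24)
in_subnet() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}

# Usage:
#   for ip in 192.168.0.100 192.168.0.200; do
#     in_subnet "$ip" 192.168.0.0/24 || echo "$ip is not on the LAN"
#   done
```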

Firewall

You may want to add some firewall rules too.

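# allow SSH from the lan interface
sudo nft add rule inet filter input iifname eth0 tcp dport ssh accept

# allow DNS and DHCP from the lan interface
sudo nft add rule inet filter input iifname eth0 tcp dport domain accept
sudo nft add rule inet filter input iifname eth0 udp dport '{ domain, bootps }' accept

# allow loopback and replies to the router's own connections (e.g. dnsmasq's upstream lookups)
sudo nft add rule inet filter input iifname lo accept
sudo nft add rule inet filter input ct state established,related accept

# Change the default input policy to drop
sudo nft add chain inet filter input '{ type filter hook input priority 0; policy drop; }'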

You can fine-tune these a bit more with the nft example.

4.1.2 - OPNsense

10G Speeds

When you set up an OPNsense system with supported 10G cards, say the Intel X540-AT2, you can move 6 to 8 Gb a second. This is better than in the past, but not line speed.

# iperf between two systems routed through a dual NIC on OPNsense

[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0040 sec  8.04 GBytes  6.90 Gbits/sec

This is because the packet filter is getting involved. If you disable it, you’ll get closer to line speed.

Firewall –> Settings –> Advanced –> Disable Firewall

[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0067 sec  11.0 GBytes  9.40 Gbits/sec
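To check the iperf figures yourself, note that iperf reports Transfer in binary units (1 GByte = 2^30 bytes) but Bandwidth in decimal bits (1 Gbit = 10^9 bits). A sketch of the conversion, assuming that convention; gbits_per_sec is a hypothetical name.

```shell
#!/bin/bash
# Sketch: recompute iperf's Bandwidth column from its Transfer column.
# Assumes binary GBytes (2^30 bytes) and decimal Gbits (10^9 bits).
gbits_per_sec() {
  local gbytes=$1 seconds=$2
  awk -v gb="$gbytes" -v s="$seconds" \
    'BEGIN { printf "%.2f\n", gb * 1073741824 * 8 / s / 1e9 }'
}

# 8.04 GBytes in 10.004 seconds, from the first run above
gbits_per_sec 8.04 10.004
```

This prints 6.90, matching the first run, which is how you can tell the two columns use different unit bases.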

4.2 - VPN

4.2.1 - Wireguard

Wireguard is a new, light-weight VPN that is both faster and simpler than its predecessors. With a small code-base and modern cryptography, it’s the future of VPNs.

Concepts

Wireguard is a layer 3 VPN and as such, only works with IPv4/6. It doesn’t provide DHCP, bridging, or other low-level features.

Participants authenticate using public-key cryptography, use UDP as a transport and do not respond to unauthenticated connection attempts.

Every participant is considered a peer. Each defines its own IP address and routing rules, and decides from whom it will accept traffic. Every pair of peers that talk directly must exchange public keys. There is no central authority.

Traffic is sent directly between configured peers but can also be relayed through central nodes if so configured by routing rules on the participants.

Scenarios

The way you deploy depends on what you’re doing, but in general you’ll either connect directly point-to-point or create a central server for remote access or management.

Central Server and Remote Access

This is the classic setup where remote systems connect to the network through one central point. Configure a wireguard server as that central point, and then configure your clients (remote peers) to connect to it.

Central Server and Remote Management

Another common use is to have a fleet of devices ‘phone-home’ so you can reach them easily.

Point to Point

You can also have peers talk directly to each other. This is often used with routers to connect networks across the internet.

4.2.1.1 - Central Server

A central server gives remote devices a reachable target, allowing them to traverse firewalls and NAT and connect. Let’s create a server and generate and add your first remote peer.

Preparation

You’ll need:

  • Public Domain Name or Static IP
  • Linux Server
  • Ability to port-forward UDP 51820

A dynamic domain name will work and it’s reasonably priced (usually free). You just need something for the peers to connect to, though a static IP is best. You can possibly break connectivity if your IP changes while your peers are connected or have the old IP cached.

We use Debian in this example and derivatives should be similar. UDP 51820 is the standard port but you can choose another if desired.

You must also choose a VPN network that doesn’t overlap with your existing networks. We use 192.168.100.0/24 in this example.

Installation

sudo apt install wireguard-tools

Configuration

All the server needs is a single config file and it will look something like this:

[Interface]
Address = 192.168.100.1/24
ListenPort = 51820
PrivateKey = sGp9lWqfBx+uOZO8V5NPUlHQ4pwbvebg8xnfOgR00Gw=

We picked .1 as our server address (pretty standard), created a private key with the wg tool, and put that in the file /etc/wireguard/wg0.conf. Here are the commands to do that.

# As root
cd /etc/wireguard/
umask 077

wg genkey > server_privatekey
wg pubkey < server_privatekey > server_publickey

read PRIV < server_privatekey

cat << EOF > wg0.conf
[Interface]
Address = 192.168.100.1/24
ListenPort = 51820
PrivateKey = $PRIV
EOF

Operation

The VPN operates by creating a network interface and loading a kernel module. You can use the Linux ip command to add a network interface of type wireguard (which automatically loads the kernel module), or use the wg-quick command to do it for you. Name the interface wg0 and wg-quick will pull in the config wg0.conf.

Test the Interface

wg-quick up wg0

ping 192.168.100.1

wg-quick down wg0

Enable The Service

For normal use, use systemctl to enable the wg-quick@ service unit that comes with wireguard-tools, so the interface comes up at boot.

systemctl enable --now wg-quick@wg0

Administration

The most common procedure is adding new clients. Each must have a unique key and IP, because WireGuard’s cryptokey routing maps each allowed IP to exactly one peer’s public key.

Create a Client

Let’s create a client config file by generating a key and assigning an IP. Generating the client’s private key on the server isn’t ideal, since it then has to be sent to the client, but it is pragmatic.

wg genkey > client_privatekey # Generates and saves the client private key
wg pubkey < client_privatekey # Displays the client's public key

Add the client’s public key and IP to your server’s wg0.conf and reload. For the IP, it’s fine to just increment. Note the /32, meaning we will only accept that IP from this peer.

[Interface]
Address = 192.168.100.1/24
ListenPort = 51820
PrivateKey = XXXXXX

##  Some Client  ##
[Peer]
PublicKey = XXXXXX
AllowedIPs = 192.168.100.2/32

wg-quick down wg0 && wg-quick up wg0
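Incrementing by hand works, but once you have more than a few peers a small helper is handy. This POSIX sh sketch assumes two things: every peer has a single-address `AllowedIPs = 192.168.100.N/32` line, as above, and the VPN is the /24 from this example. `next_peer_ip` and `append_peer` are hypothetical names, not wireguard-tools commands.

```shell
#!/bin/sh
# Hypothetical helpers (not part of wireguard-tools). Assumes /32 peer
# entries inside a /24 VPN, with .1 reserved for the server.

# Print the address after the highest /32 peer found in conf $1.
# $2 is the first three octets of the VPN, e.g. 192.168.100
next_peer_ip() {
  local conf=$1 prefix=$2
  awk -v p="$prefix." '
    /^[[:space:]]*AllowedIPs[[:space:]]*=/ {
      sub(/^[^=]*=/, "")                # keep only the value part
      n = split($0, ips, ",")
      for (i = 1; i <= n; i++) {
        gsub(/[[:space:]]/, "", ips[i])
        if (index(ips[i], p) == 1 && ips[i] ~ /\/32$/) {
          host = substr(ips[i], length(p) + 1)
          sub(/\/32$/, "", host)
          if (host + 0 > max) max = host + 0
        }
      }
    }
    END {
      if (max == 0) max = 1             # no peers yet: .1 is the server
      if (max >= 254) exit 1            # no addresses left in the /24
      print p (max + 1)
    }' "$conf"
}

# Append a [Peer] block in the same style as the example above.
append_peer() {
  local conf=$1 name=$2 pubkey=$3 ip=$4
  printf '\n##  %s  ##\n[Peer]\nPublicKey = %s\nAllowedIPs = %s/32\n' \
    "$name" "$pubkey" "$ip" >> "$conf"
}

# Usage, as root in /etc/wireguard:
#   ip=$(next_peer_ip wg0.conf 192.168.100)
#   append_peer wg0.conf "Some Client" "$(wg pubkey < client_privatekey)" "$ip"
#   wg-quick down wg0 && wg-quick up wg0
```

Reloading with down/up briefly drops every connected peer. The wg-quick(8) man page also shows `wg syncconf wg0 <(wg-quick strip wg0)`, which applies the changed peer list to the running interface. The `<( )` syntax needs bash.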

Send The Client Config

A client config file should look similar to this. The [Interface] is about the client and the [Peer] is about the server.

[Interface]
PrivateKey = THE-CLIENT-PRIVATE-KEY
Address = 192.168.100.2/32

[Peer]
PublicKey = YOUR-SERVERS-PUBLIC-KEY
AllowedIPs = 192.168.100.0/24
Endpoint = your.server.org:51820

Put in the keys and domain name, zip it up and send it on to your client as securely as possible. One neat trick is to display a QR code right in the shell. Devices that have a camera can import from that.

qrencode -t ANSIUTF8 < client-wg0.conf
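Filling in the template by hand invites copy-paste mistakes with keys. This is a small sketch that prints the client config in the same layout. `make_client_conf` and its argument order are hypothetical, and the default AllowedIPs is the VPN range from this example.

```shell
#!/bin/sh
# Hypothetical helper (not a wireguard-tools command): print a client config
# in the same layout as the template above.
#   $1 client private key     $2 client VPN address (without /32)
#   $3 server public key      $4 endpoint, host:port
#   $5 AllowedIPs to route through the tunnel (default: the VPN range)
make_client_conf() {
  cat <<EOF
[Interface]
PrivateKey = $1
Address = $2/32

[Peer]
PublicKey = $3
AllowedIPs = ${5:-192.168.100.0/24}
Endpoint = $4
EOF
}

# Usage, as root in /etc/wireguard with umask 077 still in effect
# (the output contains the client's private key):
#   make_client_conf "$(cat client_privatekey)" 192.168.100.2 \
#     "$(cat server_publickey)" your.server.org:51820 > client-wg0.conf
#   qrencode -t ANSIUTF8 < client-wg0.conf
```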

Test The Client

You should be able to ping the server from the client. If not, take a look at the troubleshooting steps.

Next Steps

We haven’t enabled forwarding yet or set up firewall rules as those depend on what role your central peer will play. Proceed on to Remote Access or Remote Management as desired.

Troubleshooting

When something is wrong, you don’t get an error message; you just get nothing. You bring up the client interface but can’t ping the server at 192.168.100.1. You can, however, turn on log messages on the server (as root) with this command.

echo module wireguard +p > /sys/kernel/debug/dynamic_debug/control
dmesg

# When done, turn logging off again by writing 'module wireguard -p' to the same file

Key Errors

wg0: Invalid handshake initiation from 205.133.134.15:18595

In this case, you should check your keys and possibly take the server interface down and up.
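A quick first check is whether each key even has the right shape. A WireGuard key is 32 bytes encoded as 44 base64 characters. This is a sketch under that assumption. `valid_wg_key` is a hypothetical helper, and it can’t tell you whether a well-formed key is the right one.

```shell
#!/bin/sh
# Hypothetical helper (not part of wireguard-tools): check that a string has
# the shape of a WireGuard key, i.e. 44 base64 characters that decode to
# exactly 32 bytes. It cannot tell you whether it is the *right* key.
valid_wg_key() {
  case $1 in
    *[!A-Za-z0-9+/=]*) return 1 ;;   # spaces, quotes or other stray characters
  esac
  [ "${#1}" -eq 44 ] || return 1
  nbytes=$(printf '%s' "$1" | base64 -d 2>/dev/null | wc -c | tr -d ' ')
  [ "$nbytes" -eq 32 ]
}

# Usage: the public key derived from the client's private key must match the
# PublicKey of that [Peer] in the server's wg0.conf (and vice versa).
#   valid_wg_key "$(cat client_privatekey)" && wg pubkey < client_privatekey
```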

Typos

ifconfig: ioctl 0x8913 failed: No such device

Check that your conf is named /etc/wireguard/wg0.conf and look for any typos.

Firewall Issues

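If you see no wireguard error messages, suspect your firewall. Since it’s UDP, you can’t probe the port the way you would a TCP one, but you can use netcat. Take the wg0 interface down on the server first so the port is free for netcat to listen on.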

nc -ulp 51820  # On the server

nc -u some.server 51820 # On the client. Type and see if it shows up on the server

4.2.1.2 - Remote Access

This is the classic setup where remote peers initiate a connection to the central peer through the internet. That central system forwards their traffic onward to the corporate network.

Traffic Handling

The main choice is whether to route or masquerade.

Routing

If you route, the client’s VPN IP address is what other devices see. This is generally preferred as it allows you to log who was doing what at the individual servers. But you must update your network equipment to treat the central server as a router.

Masquerading

Masquerading causes the server to translate all the traffic, so everything looks like it’s coming from the server. It’s less secure, but less complicated and much quicker to implement.

For this example, we will masquerade traffic from the server.

Central Server Config

Enable Masquerade

Use sysctl to enable forwarding on the server and nft to add masquerade.

# as root
sysctl -w net.ipv4.ip_forward=1

nft flush ruleset
nft add table nat
nft add chain nat postrouting { type nat hook postrouting priority 100\; }
nft add rule nat postrouting masquerade

Persist Changes

It’s best if we add our new rules onto the defaults and enable the nftables service.

# as root
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf

nft list ruleset >> /etc/nftables.conf

systemctl enable --now  nftables.service 

Client Config

Your remote peer - the one you created when setting up the server - needs its AllowedIPs adjusted so it knows to send more traffic through the tunnel.

Full Tunnel

This sends all traffic from the client over the VPN.

AllowedIPs = 0.0.0.0/0

Split Tunnel

The most common config is to send specific networks through the tunnel. This keeps Netflix and such off the VPN.

AllowedIPs = 192.168.100.0/24, 192.168.XXX.XXX, 192.168.XXX.YYY

DNS

In some cases, you’ll need the client to use your internal DNS server to resolve private domain names. Make sure this server is in the AllowedIPs above.

[Interface]
PrivateKey = ....
Address = ...
DNS = 192.168.100.1
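It’s easy to add a DNS line and forget the AllowedIPs side. This sketch checks a client config for that mistake. It assumes IPv4, looks only at the first DNS address, and skips IPv6 entries. `ip_in_cidr` and `dns_in_allowed_ips` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helpers (not part of wireguard-tools). IPv4 only; IPv6
# entries in AllowedIPs are skipped.

ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1                     # split into four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Exit 0 if address $1 lies inside CIDR $2 (a bare address counts as /32).
ip_in_cidr() {
  local net=${2%/*} len=32 mask
  case $2 in */*) len=${2#*/} ;; esac
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Exit 0 if the first DNS address in client conf $1 is covered by AllowedIPs.
dns_in_allowed_ips() {
  local dns allowed cidr
  dns=$(sed -n 's/^[[:space:]]*DNS[[:space:]]*=[[:space:]]*//p' "$1" |
        head -n 1 | cut -d, -f1 | tr -d ' ')
  allowed=$(sed -n 's/^[[:space:]]*AllowedIPs[[:space:]]*=//p' "$1" | tr ',' ' ')
  [ -n "$dns" ] || return 1
  for cidr in $allowed; do
    case $cidr in *:*) continue ;; esac
    ip_in_cidr "$dns" "$cidr" && return 0
  done
  return 1
}

# Usage: dns_in_allowed_ips client-wg0.conf || echo "DNS server is not routed via the VPN"
```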

Access Control

Limit Peer Access

By default, everything is open and all the peers can talk to each other and the internet at large, even Netflix! (They can edit their side of the connection at will.) So let’s add some rules to the default filter table.

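This example prevents peers from talking to each other but lets them ping the central server. It also rejects forwarded traffic to anything outside 192.168.100.0/24, so for peers to reach the corporate network, use (or add) that network’s range in the first rule.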

# Load the base config in case you haven't already. This includes the filter table
sudo nft -f /etc/nftables.conf

# Reject any traffic being sent outside the 192.168.100.0/24
sudo nft add rule inet filter forward iifname "wg0" ip daddr != 192.168.100.0/24 reject with icmp type admin-prohibited

# Reject any traffic between peers
sudo nft add rule inet filter forward iifname "wg0" oifname "wg0" reject with icmp type admin-prohibited

Grant Admin Access

You may want to add an exception for one of the addresses so that an administrator can interact with the remote peers. Order matters, so add it before the other rules above.

sudo nft -f /etc/nftables.conf

# Allow a special 'admin' peer full access and others to reply
sudo nft add rule inet filter forward iifname "wg0" ip saddr 192.168.100.2 accept
sudo nft add rule inet filter forward ct state {established, related} accept

# As above
...
...

Save Changes

Since this change is a little more complex, we’ll replace the existing config file and add notes.

sudo vi /etc/nftables.conf
#!/usr/sbin/nft -f

flush ruleset

table inet filter {
        chain input {
                type filter hook input priority 0
        }
        chain forward {
                type filter hook forward priority 0

                # Accept admin traffic and responses
                iifname "wg0" ip saddr 192.168.100.2 accept
                iifname "wg0" ct state {established, related} accept

                # Reject other traffic between peers
                iifname "wg0" oifname "wg0" reject with icmp type admin-prohibited

                # Reject traffic outside the desired network
                iifname "wg0" ip daddr != 192.168.100.0/24 reject with icmp admin-prohibited
        }
        chain output {
                type filter hook output priority 0
        }
}
table ip nat {
        chain postrouting {
                type nat hook postrouting priority srcnat
                masquerade
        }
}

Note: The syntax of the file is slightly different from the commands. You can use nft list ruleset to see how nft config and commands translate into running rules. For example, policy accept is appended to each base chain. You may want to experiment with explicitly adding policy drop.

The forwarding chain is where routing type rules go (the input chain is traffic sent to the host itself). Prerouting might work as well, though it’s less common and not present by default.

Notes

The default nftable config file in Debian is:

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
	chain input {
		type filter hook input priority filter;
	}
	chain forward {
		type filter hook forward priority filter;
	}
	chain output {
		type filter hook output priority filter;
	}
}

If you have old iptables rules you want to translate to nft, you can install iptables, add them (they get translated into nft on the fly), and run nft list ruleset to see how they turn out.

4.2.1.3 - Remote Mgmt

In this scenario peers initiate connections to the central server, making their way through NAT and firewalls, but you don’t want to forward their traffic.

Central Server Config

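No masquerade is desired, so there is no NAT configuration on the central server. If an admin peer is to reach the other peers through it, as in the firewall rules below, you still need to enable IP forwarding with sysctl as in Remote Access.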

Client Config

The remote peer - the one you created when setting up the server - is already set up with one exception: a keep-alive.

When the remote peer establishes its connection to the central server, intervening firewalls allow you to talk back because they assume your traffic is a response. However, a firewall will eventually ‘close’ this window unless the client keeps sending traffic occasionally to ‘keep alive’ the connection.

# Add this to the bottom of your client's conf file
PersistentKeepalive = 20
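If you script client setup, appending blindly can leave duplicate lines. This sketch adds the keep-alive only when it’s missing. It assumes a single [Peer] section at the end of the file, as in the client config above, so the appended line lands inside that section. `add_keepalive` is a hypothetical helper. The wg(8) man page suggests 25 seconds as an interval that works with a wide variety of firewalls. This page uses 20.

```shell
#!/bin/sh
# Hypothetical helper: add PersistentKeepalive to a client conf unless one is
# already present. Assumes the only [Peer] section is the last one in the
# file, so appending puts the line in that section.
add_keepalive() {
  local conf=$1 seconds=${2:-20}
  if grep -q '^[[:space:]]*PersistentKeepalive' "$conf"; then
    return 0                    # already set; leave it alone
  fi
  # Make sure the file ends with a newline before appending.
  [ -z "$(tail -c 1 "$conf")" ] || echo >> "$conf"
  echo "PersistentKeepalive = $seconds" >> "$conf"
}

# Usage: add_keepalive client-wg0.conf 20
```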

Firewall Rules

You should apply some controls to your clients to prevent them from talking to each other (and possibly to the server), and you also need a rule for the admin station. You can do this by adding rules to the forward chain.

# Allow an 'admin' peer at .2 full access to others and accept their replies
sudo nft add rule inet filter forward iifname "wg0" ip saddr 192.168.100.2 accept
sudo nft add rule inet filter forward ct state {established, related} accept
# Reject any other traffic between peers
sudo nft add rule inet filter forward iifname "wg0" oifname "wg0" reject with icmp type admin-prohibited

You can persist this change by editing your /etc/nftables.conf file to look like this.

sudo vi /etc/nftables.conf
#!/usr/sbin/nft -f

flush ruleset

table inet filter {
        chain input {
                type filter hook input priority 0;
        }
        chain forward {
                type filter hook forward priority 0;

                # Accept admin traffic
                iifname "wg0" ip saddr 192.168.100.2 accept
                iifname "wg0" ct state {established, related} accept

                # Reject other traffic between peers
                iifname "wg0" oifname "wg0" reject with icmp type admin-prohibited
        }
        chain output {
                type filter hook output priority 0;
        }
}

4.2.1.4 - Routing

Rather than masquerade, your wireguard server can forward traffic with the VPN addresses intact. You must handle that on your network in one of the following ways.

Symmetric Routing

Classically, you’d treat the wireguard server like any other router. You’d create a management interface and/or a routing interface and advertise routes appropriately.

On a small network, you could simply overlay an additional IP range on top of the existing one: add a second IP address to your router and put your wireguard server on that network. Your local servers will see the VPN-addressed clients and send their replies to the router, which passes them to the wireguard server.

Asymmetric Routing

In a small network you might have the central peer on the same network as the other servers. In this case, it will be acting like a router and forwarding traffic, but the other servers won’t know about it and so will send replies back to their default gateway.

To remedy this, add a static route at the gateway for the VPN range that sends traffic back to the central peer. Asymmetry is generally frowned upon, but it gets the job done with one less hop.

Host Static Routing

You can also configure the servers in question with a static route for VPN traffic so they know to send it directly back to the Wireguard server. This is fastest, but you have to visit every host, though in some cases you can use DHCP to distribute the route.
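If your LAN’s DHCP server is dnsmasq, one way to push that route is the classless static route option (121). This is a hedged sketch with assumed addresses: 192.168.0.10 stands in for the Wireguard server’s LAN address and 192.168.0.1 for your normal gateway. RFC 3442 requires clients that honor option 121 to ignore the ordinary router option, so the default route has to be repeated in the list.

```
# /etc/dnsmasq.d/vpn-route.conf (hypothetical file name)
# VPN range via the Wireguard server, everything else via the usual gateway
dhcp-option=121,192.168.100.0/24,192.168.0.10,0.0.0.0/0,192.168.0.1
```

Clients only pick this up when they renew their lease.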

4.2.1.5 - LibreELEC

LibreELEC and CoreELEC are Linux-based open source software appliances for running the Kodi media player. These can be used as kiosk displays and you can remotely manage them with wireguard.

Create a Wireguard Service

These systems have wireguard support, but use connman, which lacks split-tunnel ability [1]. This forces all traffic through the VPN, making it unsuitable for remote management. To enable split tunneling, create a wireguard service instead.

Create a service unit file

vi /storage/.config/system.d/wg0.service
[Unit]
Description=start wireguard interface

# The network-online service isn't guaranteed to work on *ELEC
#Requires=network-online.service

After=time-sync.target
Before=kodi.service

[Service]
Type=oneshot
RemainAfterExit=true
StandardOutput=journal

# Need to check DNS is responding before we proceed
ExecStartPre=/bin/bash -c 'until nslookup google.com; do sleep 1; done'

ExecStart=ip link add dev wg0 type wireguard
ExecStart=ip address add dev wg0 10.1.1.3/24
ExecStart=wg setconf wg0 /storage/.config/wireguard/wg0.conf
ExecStart=ip link set up dev wg0
# On the newest version, a manual route addition is needed too.
# Any network routed into wg0 must also be in the peer's AllowedIPs.
ExecStart=ip route add 10.2.2.0/24 dev wg0 scope link src 10.1.1.3

# Deleting the device seems to remove the address and routes
ExecStop=ip link del dev wg0

[Install]
WantedBy=multi-user.target

Create a Wireguard Config File

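Note: This isn’t exactly the same format wg-quick uses, just close enough to confuse. wg setconf accepts only WireGuard settings, so wg-quick keys such as Address and DNS don’t belong here; the address is set in the unit file instead.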

vi /storage/.config/wireguard/wg0.conf
[Interface]
PrivateKey = XXXXXXXXXXXXXXX

[Peer]
PublicKey = XXXXXXXXXXXXXXX
AllowedIPs = 10.1.1.0/24
Endpoint = endpoint.hostname:31194
PersistentKeepalive = 25

Enable and Test

systemctl enable --now wg0.service
ping 10.1.1.1

Create a Cron Check

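When using a DNS name for the endpoint you may become disconnected, because wg resolves the name only when the config is loaded and won’t notice if its address changes. To catch this, use a cron job: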

# Use the internal wireguard IP address of the peer you are connecting to. .1 in this case
crontab -e
*/5 * * * * ping -c1 -W5 10.1.1.1 || ( systemctl stop wg0; sleep 5; systemctl start wg0 )

4.2.1.6 - TrueNAS Scale

You can directly bring up a Wireguard interface in TrueNAS Scale, and use that to remotely manage it.

Wireguard isn’t exposed in the GUI, so use the command line to create a config file and enable the service. To make it persistent between upgrades, add a cronjob to restore the config.

Configuration

Add a basic peer as when setting up a Central Server and save the file on the client as /etc/wireguard/wg1.conf. It’s rumored that wg0 is reserved for the TrueNAS cloud service. Once the config is in place, use wg-quick up wg1 command to test and enable as below.

nano /etc/wireguard/wg1.conf

systemctl enable --now wg-quick@wg1

If you use a domain name in this conf for the other side, this service will fail at boot because DNS isn’t up and it’s not easy to get it to wait. So add a pre-start to the service file to specifically test name resolution.

vi /lib/systemd/system/wg-quick@.service

[Service] 
...
...
ExecStartPre=/bin/bash -c 'until host google.com; do sleep 1; done'

Note: Don’t include a DNS server in your wireguard settings or everything on the NAS will attempt to use your remote DNS and fail if the link goes down.

Accessing Apps

When deploying an app, tick the “Host Network” or “Configure Host Network” box in the app’s config, and you should be able to reach it via the VPN address (on Cobia, 23.10, at least). If that fails, you can add a command like this as a PostUp line in the wireguard config file.

iptables -t nat -A PREROUTING --dst 192.168.100.2 -p tcp --dport 20910 -j DNAT --to-destination ACTUAL.LAN.IP:20910

Detecting IP Changes

The other side of your connection may have a dynamic address, and wireguard won’t notice when it changes. A simple solution is a cron job that pings the other side periodically and, if that fails, restarts the interface. This looks up the domain name again and hopefully finds the new address.

touch /etc/cron.hourly/wg_test
chmod +x /etc/cron.hourly/wg_test
vi /etc/cron.hourly/wg_test

#!/bin/sh
ping -c1 -W5 192.168.100.1 || ( wg-quick down wg1 ; wg-quick up wg1 )
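Another signal is the age of the last handshake. While traffic or a keepalive is flowing, WireGuard renews the handshake about every two minutes, so a much older timestamp suggests the tunnel is stuck, possibly on an old address. The reresolve-dns.sh script in the wireguard-tools contrib directory is built on the same idea. This sketch makes a few assumptions:

- PersistentKeepalive is set.
- The input is the output of `wg show wg1 latest-handshakes`: one peer per line, public key, tab, Unix time, with 0 meaning no handshake yet.
- `handshake_stale` and its 135-second default are hypothetical names and values, not part of this setup.

```shell
#!/bin/sh
# Hypothetical helper: read "wg show <iface> latest-handshakes" output on
# stdin and exit 0 if any peer's last handshake is older than the limit
# (a timestamp of 0, meaning "never", always counts as stale).
#   $1 current Unix time   $2 limit in seconds (default 135)
handshake_stale() {
  local now=$1 limit=${2:-135}
  awk -v now="$now" -v limit="$limit" '
    { if (now - $2 > limit) stale = 1 }
    END { exit (stale ? 0 : 1) }'
}

# Usage, e.g. in /etc/cron.hourly/wg_test:
#   wg show wg1 latest-handshakes | handshake_stale "$(date +%s)" &&
#     { wg-quick down wg1; wg-quick up wg1; }
```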

Troubleshooting

Cronjob Fails

cronjob kills interface when it can’t ping

or

/usr/local/bin/wg-quick: line 32: resolvconf: command not found

Calling wg-quick via cron causes a resolvconf issue, even though it works at the command line. One solution is to remove any DNS config from your wg conf file so it doesn’t try to register the remote DNS server.

Nov 08 08:23:59 truenas wg-quick[2668]: Name or service not known: `some.server.org:port'
Nov 08 08:23:59 truenas wg-quick[2668]: Configuration parsing error
…
Nov 08 08:23:59 truenas systemd[1]: Failed to start WireGuard via wg-quick(8) for wg1.

The DNS service isn’t available (yet), despite Requires=network-online.target nss-lookup.target already being in the service unit file. One way to solve this is an ExecStartPre in the Service section of the unit file [1]. This is hacky, but none of the normal directives work.

The cron job above will bring the service up eventually, but it’s nice to have it at boot.

Upgrade Kills Connection

An upgrade comes with a new OS image that replaces anything you’ve added, such as wireguard config and cronjobs. The only way to persist your Wireguard connection is to put a script on the pool and add a cronjob via the official interface [2].

Add this script, adjusting the pool location for your system. It’s set to run every 5 minutes, as you probably don’t want to wait long after an upgrade to see if it’s working. You can also use it instead of the cron.hourly job above to detect IP changes.

# Create the location and prepare the files
mkdir /mnt/pool02/bin/
cp /etc/wireguard/wg1.conf /mnt/pool02/bin/
touch /mnt/pool02/bin/wg_test
chmod +x /mnt/pool02/bin/wg_test

# Edit the script 
vi /mnt/pool02/bin/wg_test

#!/bin/sh
ping -c1 -W5 192.168.100.1 || ( cp /mnt/pool02/bin/wg1.conf /etc/wireguard/ ; wg-quick down wg1 ; wg-quick up wg1 )


# Invoke the TrueNAS CLI and add the job
cli
task cron_job create command="/mnt/pool02/bin/wg_test" enabled=true description="test" user=root schedule={"minute": "*/5", "hour": "*", "dom": "*", "month": "*", "dow": "*"}

Notes

https://www.truenas.com/docs/core/coretutorials/network/wireguard/
https://www.truenas.com/community/threads/no-internet-connection-with-wireguard-on-truenas-scale-21-06-beta-1.94843/#post-693601

4.2.1.7 - Proxmox

Proxmox is frequently used in smaller environments for its ability to mix Linux Containers and Virtual Machines at very low cost. LXC (Linux Containers) is especially valuable, as it gives the benefits of virtualization with minimal overhead.

Using wireguard in a container simply requires adding the host’s kernel module interface.

Edit the container’s config

On the pve host, for lxc id 101:

echo "lxc.mount.entry = /dev/net/tun /dev/net/tun none bind create=file" >> /etc/pve/lxc/101.conf

Older Proxmox

In the past you had to install the module, or use the DKMS method. That’s no longer needed, as the Wireguard kernel module is now available on Proxmox with the standard install. You don’t even need to install the wireguard tools. But if you run into trouble, you can go through these steps:

apt install wireguard
modprobe wireguard

# The module will load dynamically when a container starts, but you can also have it loaded at boot
echo "wireguard" >> /etc/modules-load.d/modules.conf

5 - Operating Systems

5.1 - NetBoot

Most computers come with ‘firmware’. This is a built-in mini OS, embedded in the chips, that’s just smart enough to start things up and hand-off to something more capable.

That more-capable thing is usually an Operating System on a disk, but it can also be something over the network. This lets you:

  • Run an OS installer, such as when you don’t have one installed yet.
  • Run the whole OS remotely without having a local disk at all.

PXE

The original way was Intel’s PXE (Preboot eXecution Environment) Option ROM on their network cards. The IBM PC firmware (BIOS) would hand over execution to it, and PXE would use basic network drivers to get on the network.

HTTP Boot

Modern machines have newer firmware (UEFI) that includes logic for using HTTP/S without the need for add-ons. This simplifies things and, when HTTPS is used, also guards against man-in-the-middle attacks. Both methods are still generally called PXE booting, though.

Building a NetBoot Environment

Start by setting up a HTTP Boot system, then add PXE Booting and netboot.xyz to it. This gets you an installation system. Then proceed to diskless stations.

5.1.1 - HTTP Boot

We’ll set up a PXE Proxy server that runs DHCP and HTTP. This server can be used alongside your existing DHCP/DNS servers. We use Debian in this example, but anything that runs dnsmasq should work.

Installation

sudo apt install dnsmasq lighttpd

Configuration

Server

Static IPs are best practice, though we’ll use a hostname in this config, so the main thing is that the server name netboot resolves correctly.

HTTP

Lighttpd serves files from /var/www/html, so just drop an ISO there. For example, take a look at the current Debian ISO (the numbering changes) at https://www.debian.org/CD/netinst and copy the link in like so:

sudo wget https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-12.6.0-amd64-netinst.iso -O /var/www/html/debian.iso
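The download is renamed to debian.iso, so `sha256sum -c` can’t match it by name against Debian’s SHA256SUMS file, which is published alongside the ISO. This sketch checks it anyway, assuming GNU coreutils `sha256sum`. `verify_iso` is a hypothetical helper, and the ISO name in the usage line is just the example from above.

```shell
#!/bin/sh
# Hypothetical helper: verify a renamed ISO against a SHA256SUMS file.
#   $1 local file   $2 original name as listed in SHA256SUMS   $3 SHA256SUMS path
verify_iso() {
  local expected actual
  # SHA256SUMS lines look like "<hash>  <file name>".
  expected=$(awk -v name="$2" '$2 == name { print $1 }' "$3")
  [ -n "$expected" ] || { echo "no entry for $2" >&2; return 1; }
  actual=$(sha256sum "$1" | cut -d' ' -f1)
  [ "$actual" = "$expected" ]
}

# Usage:
#   wget https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/SHA256SUMS -O /tmp/SHA256SUMS
#   verify_iso /var/www/html/debian.iso debian-12.6.0-amd64-netinst.iso /tmp/SHA256SUMS && echo OK
```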

DHCP

When configured in proxy DHCP mode, “…dnsmasq simply provides the information given in --pxe-prompt and --pxe-service to allow netbooting”, so only certain settings are available. This is a bit vague, but testing reveals that you must set the boot file name with the dhcp-boot directive rather than, for example, with the more general DHCP option 67.

sudo vi /etc/dnsmasq.d/netboot.conf 
# Disable DNS
port=0

# Set for DHCP PXE Proxy mode
dhcp-range=192.168.0.0,proxy

# Respond to clients that use 'HTTPClient' to identify themselves.
dhcp-pxe-vendor=HTTPClient

# Set the boot file name to the web server URL
dhcp-boot="http://netboot/debian.iso"

# PXE-service isn't actually used, but dnsmasq seems to need at least one entry to send the boot file name when in proxy mode.
pxe-service=x86-64_EFI,"Network Boot"

Client

Simply booting the client and selecting UEFI HTTP should be enough. The debian boot loader is signed and works with secure boot.

In addition to ISOs, you can also specify .efi binaries like grubx64.efi. Some distributions support this, though Debian itself may have issues.

Next Steps

You may want to support older clients by adding PXE Boot support.

Troubleshooting

dnsmasq

A good way to see what’s going on is to enable dnsmasq logging.

# Add these to the dnsmasq config file
log-queries
log-dhcp

# Restart and follow to see what's happening
sudo systemctl restart dnsmasq.service
sudo journalctl -u dnsmasq -f

If you’ve enabled logging in dnsmasq and it’s not seeing any requests, you may need to look at your networking. Some virtual environments suppress DHCP broadcasts when they are managing the IP range.

lighttpd

You can also see what’s being requested from the web server if you enable access logs.

cd /etc/lighttpd/conf-enabled
sudo ln -s ../conf-available/10-accesslog.conf
sudo systemctl restart lighttpd.service
sudo cat /var/log/lighttpd/access.log

5.1.2 - PXE Boot

Many older systems can’t HTTP Boot so let’s add PXE support with some dnsmasq options.

Installation

Dnsmasq

Install as in the httpboot page.

The Debian Installer

Older clients don’t handle ISOs well, so grab and extract the Debian netboot files.

sudo wget http://ftp.debian.org/debian/dists/bookworm/main/installer-amd64/current/images/netboot/netboot.tar.gz -O - | sudo tar -xzvf - -C /var/www/html

Grub is famous for ignoring proxy DHCP settings, so let’s start the boot with something else: iPXE. It can do a lot, but it isn’t signed, so you must disable secure boot on your clients.

sudo wget https://boot.ipxe.org/ipxe.efi -P /var/www/html

Configuration

iPXE

Debian is ready to go, but you’ll want to create an auto-execute file for iPXE so you don’t have to type in the commands manually.

sudo vi /var/www/html/autoexec.ipxe
#!ipxe

set base http://netboot/debian-installer/amd64

dhcp
kernel ${base}/linux
initrd ${base}/initrd.gz
boot

Dnsmasq

HTTP and PXE clients need different information to boot. We handle this by adding a filename to the PXE service option. This will override the dhcp-boot directive for PXE clients.

sudo vi /etc/dnsmasq.d/netboot.conf 
# Disable DNS
port=0 
 
# Use in DHCP PXE Proxy mode
dhcp-range=192.168.0.0,proxy 
 
# Respond to both PXE and HTTP clients
dhcp-pxe-vendor=PXEClient,HTTPClient 
 
# Send the BOOTP information for the clients using HTTP
dhcp-boot="http://netboot/debian.iso" 

# Specify a boot menu option for PXE clients. If there is only one, it's booted immediately.
pxe-service=x86-64_EFI,"iPXE (UEFI)", "ipxe.efi"

# We also need to enable TFTP for the PXE clients
enable-tftp 
tftp-root=/var/www/html

Client

Both types of client should now work. The debian installer will pull the rest of what it needs from the web.

Next Steps

You can create a boot-menu by adding multiple pxe-service entries in dnsmasq, or by customizing the iPXE autoexec.ipxe files. Take a look at that in the menu page.

Troubleshooting

Text Flashes by, disappears, and client reboots

This is most often a symptom of secure boot still being enabled.

Legacy Clients

These configs are aimed at UEFI clients. If you have old BIOS clients, you can try the pxe-service tag for those.

pxe-service=x86-64_EFI,"iPXE (UEFI)", "ipxe.efi"
pxe-service=x86PC,"iPXE (BIOS)", "ipxe.kpxe"

This may not work, and there are a few client flavors, so enable the dnsmasq logs to see how they identify themselves. You can also try booting pxelinux as in the Debian docs.
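With log-dhcp enabled, clients identify themselves in a vendor class such as `PXEClient:Arch:00007:UNDI:003016`. The Arch field is the client system architecture number from the IANA processor architecture registry (RFC 4578), the same value dhcp-match tests with option:client-arch. This is a small lookup sketch. `describe_client_arch` is a hypothetical helper, and the list covers only common codes.

```shell
#!/bin/sh
# Hypothetical helper: translate the Arch field of a PXE/HTTP vendor class
# (e.g. "PXEClient:Arch:00007:UNDI:003016") into a readable description.
# Codes follow the IANA processor architecture registry; not exhaustive.
describe_client_arch() {
  local arch
  # Strip the leading zeros so the code can be matched as plain text.
  arch=$(printf '%s\n' "$1" | sed -n 's/.*Arch:0*\([0-9][0-9]*\).*/\1/p')
  [ -n "$arch" ] || { echo "no Arch field"; return 1; }
  case $arch in
    0)  echo "x86 BIOS (legacy PXE)" ;;
    6)  echo "x86 (32-bit) UEFI" ;;
    7)  echo "x86-64 UEFI (EFI BC)" ;;
    9)  echo "x86-64 UEFI" ;;
    10) echo "ARM 32-bit UEFI" ;;
    11) echo "ARM 64-bit UEFI" ;;
    15) echo "x86 UEFI HTTP Boot" ;;
    16) echo "x86-64 UEFI HTTP Boot" ;;
    *)  echo "architecture $arch (see the IANA registry)" ;;
  esac
}

# Usage with log-dhcp enabled:
#   journalctl -u dnsmasq | grep -o '[A-Za-z]*Client:Arch:[0-9]*' | sort -u |
#     while read -r vc; do echo "$vc: $(describe_client_arch "$vc")"; done
```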

DHCP Options

Dnsmasq also has a whole tag system that you can set and use similar to this:

dhcp-match=set:PXE-BOOT,option:client-arch,7
dhcp-option=tag:PXE-BOOT,option:bootfile-name,"netboot.xyz.efi"

However, dnsmasq in proxy mode limits what you can send to the clients, so we’ve avoided DHCP options and focused on PXE service directives.

Debian Error

*ERROR* CPU pipe B FIFO underrun

You probably need to use the non-free firmware.

No Boot option

Try entering the computer’s firmware (BIOS) setup and adding a UEFI boot option for the OS you just installed. You may need to browse for the file \EFI\debian\grubx64.efi

Sources

https://documentation.suse.com/sles/15-SP2/html/SLES-all/cha-deployment-prep-uefi-httpboot.html
https://github.com/ipxe/ipxe/discussions/569
https://linuxhint.com/pxe_boot_ubuntu_server/#8

It’s possible to use secure boot if you’re willing to implement a chain of trust. Here’s an example used by FOG to boot devices.

https://forums.fogproject.org/topic/13832/secureboot-issues/3

5.1.3 - menu

It would be useful to have some choices when you netboot. You can use the pxe-service built into dnsmasq but a more flexible option is the menu system provided by the iPXE project.

Installation

Set up a http/pxe net-boot server if you haven’t already.

Configuration

dnsmasq

Configure dnsmasq to serve up the ipxe.efi binary for both types of clients.

# Disable DNS
port=0 
 
# Use in DHCP PXE Proxy mode
dhcp-range=192.168.0.0,proxy 
 
# Tell dnsmasq to provide proxy PXE service to both PXE and HTTP clients
dhcp-pxe-vendor=PXEClient,HTTPClient 
 
# Send the BOOTP information for the clients using HTTP
dhcp-boot="http://netboot/ipxe.efi" 

# Specify a boot menu option for PXE clients. If there is only one, it's booted immediately.
pxe-service=x86-64_EFI,"iPXE (UEFI)", "ipxe.efi"
  
# We also need to enable TFTP for the PXE clients  
enable-tftp 
tftp-root=/var/www/html

Custom Menu

Change the autoexec.ipxe to display a menu.

sudo vi /var/www/html/autoexec.ipxe
#!ipxe

echo ${cls}

:MAIN
menu Local Netboot Menu
item --gap Local Network Installation
item WINDOWS ${space} Windows 11 LTSC Installation
item DEBIAN ${space} Debian Installation
choose selection && goto ${selection} || goto ERROR

:WINDOWS
echo Some windows things here
sleep 3
goto MAIN

:DEBIAN
dhcp
imgfree
set base http://netboot/debian-installer/amd64
kernel ${base}/linux 
initrd ${base}/initrd.gz
boot || goto ERROR


:ERROR
echo There was a problem with the selection. Exiting...
sleep 3
exit

Operation

You’ll doubtless find additional options to add. You may want to add the netboot.xyz project to your local menu too.

5.1.4 - netboot.xyz

You can add netboot.xyz to your iPXE menu to run Live CDs, OS installers and utilities they provide. This can save a lot of time and their list is always improving.

Installation

You’re going to connect to the web for this, so there’s nothing to install. You can download their efi bootloader manually if you’d like to keep things HTTPS, but they update it regularly so you may fall behind.

Configuration

Autoexec.ipxe

Add a menu item to your autoexec.ipxe. When you select it, iPXE will chainload (in their parlance) the netboot.xyz bootloader.

#!ipxe

echo ${cls}

:MAIN
menu Local Netboot Menu
item --gap Local Network Installation
item WINDOWS ${space} Windows 11 LTSC Installation
item DEBIAN ${space} Debian Installation
item --gap Connect to Internet Sources
item NETBOOT ${space} Netboot.xyz
choose selection && goto ${selection} || goto ERROR

:WINDOWS
echo Some windows things here
sleep 3
goto MAIN

:DEBIAN
dhcp
imgfree
set base http://netboot/debian-installer/amd64
kernel ${base}/linux 
initrd ${base}/initrd.gz
boot || goto ERROR

:NETBOOT
dhcp
chain --autofree http://boot.netboot.xyz || goto ERROR

:ERROR
echo There was a problem with the selection. Exiting...
sleep 3
exit

Local-vars

Netboot.xyz detects that it’s working with a Proxy PXE server and behaves a little differently. For example, you can’t insert your own local menu.ipxe. One helpful addition is a local settings file to speed up boot.

sudo vi /var/www/html/local-vars.ipxe
#!ipxe
set use_proxydhcp_settings true

Operation

You can choose the new menu item and load netboot.xyz. It will take you out to the web for more selections. Not everything will load on every client, of course, but it gives you a lot of options.

Next Steps

We glossed over how to install Windows. That’s a useful item.

Troubleshooting

Wrong TFTP Server

tftp://192.168.0.1/local-vars.ipxe....Connection timed out
Local vars file not found... attempting TFTP boot...
DHCP proxy detected, press p to boot from 192.168.0.2...

If your boot client is attempting to connect to the main DHCP server, that server is probably setting next server: 192.168.0.1 in its packets. This isn’t a DHCP option per se (it’s a field in the BOOTP header), but it affects netboot. Dnsmasq does this, though Kea doesn’t.

sudo journalctl -u dnsmasq -f

...
...
next server: 192.168.0.1
...
...

The boot still works, it’s just annoying. You can usually ignore the message and don’t have to hit ‘p’.

Exec Format Error

Could not boot: Exec format error (https://ipxe.org/2e008081)

You may see this flash by. Check your menus and local variables file to make sure you’ve included the #!ipxe shebang.

No Internet

You can also host your own local instance.

5.1.5 - windows

To install Windows, have iPXE load wimboot and then WinPE. From there you can connect to a Samba share and start the Windows installer, just like back in the good old administrative installation point days.

Getting a copy of WinPE the official way is a bit of a hurdle, but definitely less work than setting up a full Windows imaging solution.

Installation

Samba and Wimboot

On the netboot server, install wimboot and Samba.

sudo wget https://github.com/ipxe/wimboot/releases/latest/download/wimboot -P /var/www/html
sudo apt install samba

Windows ADK

On a Windows workstation, download the ADK and PE Add-on and install as per Microsoft’s ADK Install Doc.

Configuration

Samba

Prepare the netboot server to receive the Windows files.

sudo vi /etc/samba/smb.conf
[global]
  map to guest = bad user
  log file = /var/log/samba/%m.log

[install]
  path = /var/www/html
  browseable = yes
  read only = no
  guest ok = yes
  guest only = yes

sudo mkdir /var/www/html/winpe
sudo mkdir /var/www/html/win11
sudo chmod o+w /var/www/html/win*
sudo systemctl restart smbd.service

Windows ADK

On the Windows workstation, start the deployment environment as an admin and create the working files as below. More info is in Microsoft’s Create Working Files document.

  • Start -> All Apps -> Windows Kits -> Deployment and Imaging Tools Environment (Right Click, More, Run As Admin)
copype amd64 c:\winpe\amd64

Add the optional components required for Windows 11, WinPE-WMI and WinPE-SecureStartup, with the commands below. More info is in Microsoft’s Customization Section.

mkdir c:\winpe\offline

dism /mount-Image /Imagefile:c:\winpe\amd64\media\sources\boot.wim /index:1 /mountdir:c:\winpe\offline

dism /image:c:\winpe\offline /add-package /packagepath:"..\Windows Preinstallation Environment\amd64\WinPE_OCs\WinPE-WMI.cab" /packagepath:"..\Windows Preinstallation Environment\amd64\WinPE_OCs\WinPE-SecureStartup.cab"

dism /unmount-image /mountdir:c:\winpe\offline /commit

Make the ISO in case you want to HTTP boot from it later, and keep the shell open for the next step.

MakeWinPEMedia /ISO C:\winpe\amd64 C:\winpe\winpe_amd64.iso

WinPE

Now that you’ve got a copy of WinPE, copy it to the netboot server.

net use q: \\netboot\install
xcopy /s c:\winpe\* q:\winpe

Also create some auto-start files for setup. The first is part of the WinPE system and tells it (generically) what to do after it starts up.

notepad q:\winpe\amd64\winpeshl.ini
[LaunchApps]
"install.bat"

The second is more specific and associated with the thing you are installing. We’ll mix and match these in the PXE menu later so we can install different things.

notepad q:\win11\install.bat
wpeinit
net use \\netboot
\\netboot\install\win11\setup.exe
pause

Win 11

You also need to obtain the latest Windows 11 ISO and extract its contents into the win11 folder, so that setup.exe ends up at \\netboot\install\win11\setup.exe.

Wimboot

Back on the netboot server, customize the WINDOWS section of your autoexec.ipxe like this.

:WINDOWS
dhcp
imgfree
set winpe http://netboot/winpe/amd64
set source http://netboot/win11
kernel wimboot
initrd ${winpe}/media/sources/boot.wim boot.wim
initrd ${winpe}/media/Boot/BCD         BCD
initrd ${winpe}/media/Boot/boot.sdi    boot.sdi
initrd ${winpe}/winpeshl.ini           winpeshl.ini
initrd ${source}/install.bat           install.bat
boot || goto MAIN

You can add other installs by copying this block and changing the :WINDOWS header and source variable.
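If you add several installs, generating the block can save some copy-and-paste mistakes. This is a sketch that assumes the same http://netboot layout as above. The function name ipxe_windows_entry and the WIN10/win10 names in the example are hypothetical. You still need a menu item that jumps to the new label.

```shell
#!/bin/sh
# ipxe_windows_entry LABEL SOURCE_DIR
# Print a copy of the :WINDOWS block with a new label and install source.
ipxe_windows_entry() {
  label=$1 src=$2
  # The escaped \${...} stay literal so iPXE expands them at boot time.
  cat <<EOF
:$label
dhcp
imgfree
set winpe http://netboot/winpe/amd64
set source http://netboot/$src
kernel wimboot
initrd \${winpe}/media/sources/boot.wim boot.wim
initrd \${winpe}/media/Boot/BCD         BCD
initrd \${winpe}/media/Boot/boot.sdi    boot.sdi
initrd \${winpe}/winpeshl.ini           winpeshl.ini
initrd \${source}/install.bat           install.bat
boot || goto MAIN
EOF
}

# Example: ipxe_windows_entry WIN10 win10   # paste the output into autoexec.ipxe
```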

Next Steps

Add some more installation sources and take a look at the Windows zero touch install.

Troubleshooting

System error 53 has occurred. The network path was not found

A given client may be unable to connect to the SMB service at all, or it may fail after connecting once. It’s possible that the client doesn’t have an IP yet, or the problem may lie with the server. It seems to have something to do with timing. I haven’t found the cause, but I suspect it’s security-related. If you wait, it resolves itself.

You can also comment out the winpeshl.ini initrd line in the iPXE menu, and you’ll boot to a command prompt that lets you troubleshoot. Sometimes you just don’t have an IP from the DHCP server yet, and you can edit the install.bat file to add a sleep or other things. See the Zero Touch Install page for some more ideas.

Access is denied

This may be related to the executable bit. If you copied the files from the ISO, the bits should be set. But if you’ve changed anything since then, setup.exe could have lost its x bit. It’s hard to know what’s supposed to be set once it’s gone, so you may want to recopy the files.
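Before recopying everything, you can see which executables have actually lost the bit. This is a minimal sketch that assumes the install files live under /var/www/html/win11, as on this page. The function name is made up, and it only checks the owner’s execute bit.

```shell
#!/bin/sh
# list_nonexec_exes DIR
# Print every .exe under DIR whose owner execute bit (0100) is not set.
list_nonexec_exes() {
  find "$1" -type f -iname '*.exe' ! -perm -100 -print
}

# Example: list_nonexec_exes /var/www/html/win11
```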

5.2 - Server Core

Installation Notes

If you’re deploying Windows servers, Server Core is best practice [1]. Install from USB and it will offer that as a choice; it’s fairly painless. But these instances are designed to be managed remotely, so you’ll need to perform a few post-install tasks to help with that.

Server Post-Installation Tasks

Set a Manual IP Address

The IP is DHCP by default and that’s fine if you create a reservation at the DHCP server or just use DNS. If you require a manual address, however:

# Access the PowerShell interface (you can use the server console if desired)

# Identify the desired interface's index number. You'll see multiple entries per adapter for IPv4 and IPv6, but the interface index will repeat.
Get-NetIPInterface

# Set a manual address, netmask and gateway using that index (12 in this example)
New-NetIPAddress -InterfaceIndex 12 -IPAddress 192.168.0.2 -PrefixLength 24 -DefaultGateway 192.168.0.1

# Set DNS
Set-DnsClientServerAddress -InterfaceIndex 12 -ServerAddresses 192.168.0.1

Allow Pings

This is normally a useful feature, though it depends on your security needs.

Set-NetFirewallRule -Name FPS-ICMP4-ERQ-In -Enabled True

Allow Computer Management

Server Core allows ‘Remote Management’ by default [2]. That specifically means the Server Manager application that ships with Windows Server versions and is included with the Remote Server Administration Tools on Windows 10 Professional [3] or better. For more detailed work you’ll need to use the Computer Management feature as well. If you’re all part of AD, this is reported to Just Work(TM). If not, you’ll need to allow several ports for SMB and RPC.

# Port 445
Set-NetFirewallRule -Name FPS-SMB-In-TCP -Enabled True

# Port 135
Set-NetFirewallRule -Name WMI-RPCSS-In-TCP -Enabled True


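# Maybe also needed:
# NetBIOS name service, port 137
Set-NetFirewallRule -Name FPS-NB_Name-In-UDP -Enabled True

# LLMNR, port 5355
Set-NetFirewallRule -Name NETDIS-LLMNR-In-UDP -Enabled True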

Configuration

Remote Management Client

If you’re using Windows 10/11, install it on a workstation by going to System -> Optional features -> View features, entering Server Manager in the search box, and selecting it to install.

With AD

When you’re all in the same domain, everything just works (TM). Or so I’ve read.

Without AD

If you’re not using Active Directory, you’ll have to do a few extra steps before using the app.

Trust The Server

Tell your workstation you trust the remote server you are about to manage [4] (yes, it seems backwards). Use either the hostname or the IP address, depending on how you’re planning to connect; i.e., if you didn’t set up DNS, use IPs. Start an admin PowerShell and enter:

Set-Item wsman:\localhost\Client\TrustedHosts 192.168.5.1 -Concatenate -Force

Add The Server

Start up Server Manager and select Manage -> Add Servers -> DNS and search for the IP or DNS name. Pay attention to the server name it detects. If DNS happens to resolve the IP address you entered to a name, such as server-1.local, you’ll need to repeat the above TrustedHosts command with that specific name.

Manage As…

You may notice that after adding the server, the app tries to connect and fails. You’ll need to right-click it and select Manage As… and enter credentials in the form of server-1\Administrator and select Remember me to have this persist. Here you’ll need to use the actual server name and not the IP. If unsure, you can get this on the server with the hostname command.

Starting Performance Counters

The server you added should now say that its performance counters are not started. Right-click it and you can choose to start them. The server should now show up as Online, and you can perform some basic tasks.

server-1.local\Administrator

Server Manager is the default management tool and newer servers allow remote management by default. The client needs a few things, however.

  • Set DNS so you can resolve by names
  • Configure Trusted Hosts

On the system where you start the Server Manager app (usually where you are sitting), ensure you can resolve the remote host via DNS. If not, you may want to edit your hosts file.

notepad c:\Windows\System32\drivers\etc\hosts

You can now add the remote server.

Manage -> Add Servers -> DNS -> Search Box (enter the other server’s hostname) -> Magnifying Glass -> Select the server -> Right Arrow Icon -> OK

(You may need to select Manage As on it.)

Allow Computer Management

After doing this, you can right-click a remote server and select Computer Management.

MISC

# Turns the firewall off entirely on all profiles; use with care
Set-NetFirewallProfile -Profile Domain, Public, Private -Enabled False

5.3 - Virtualization

In the beginning, users time-shared CPUs and virtualization was without form and void. And IBM said “Let there be System/370”. This was in the 70’s and involved men with crew-cuts, horn-rimmed glasses and pocket protectors. And ties.

Today, you can still do full virtualization. Everything is emulated down to the hardware, and every system has its own kernel and device drivers. Most of the public cloud started out this way at the dawn of the new millennium. It was the way. VMware was the early player in this area and popularized it on x86 hardware, back when everyone was using 5% of their pizza-box servers.

The newer way is containerization. There is just one kernel, and it keeps groups of processes separate from each other. This is possible because Linux implemented kernel namespaces around 2008, mostly work by IBM, suitably enough. The program used to work with this is named LXC, and you’d use commands like sudo lxc-create --template download --name u1 -- --dist ubuntu --release jammy --arch amd64. Other systems such as LXD and Docker (originally) are layered on top to provide more management.

Twenty-some years later, what used to be a hot market is now a commodity that’s essentially given away for free. VMware was acquired by Broadcom, which is focused on the value-extraction phase of its lifecycle, and the cloud seems decidedly headed toward containers because of their better efficiency and agility.

5.3.1 - Incus

Incus is a container manager, forked from Canonical’s LXD manager. It combines all the virtues of upstream LXD (containers + VMs) with the advantages of community-driven additions. You can run OCI (Open Container Initiative) container images as well as create VMs. It is used at the command line and includes a web interface.

Installation

Simply install a base OS on your server and run a few commands. You can install from your distro’s repo, but the packages from Zabbly (the sponsor) are a bit newer.

As per https://github.com/zabbly/incus

sudo mkdir -p /etc/apt/keyrings/
sudo wget -O /etc/apt/keyrings/zabbly.asc https://pkgs.zabbly.com/key.asc

sudo sh -c 'cat <<EOF > /etc/apt/sources.list.d/zabbly-incus-stable.sources
Enabled: yes
Types: deb
URIs: https://pkgs.zabbly.com/incus/stable
Suites: $(. /etc/os-release && echo ${VERSION_CODENAME})
Components: main
Architectures: $(dpkg --print-architecture)
Signed-By: /etc/apt/keyrings/zabbly.asc

EOF'

sudo apt update
sudo apt install -y incus incus-ui-canonical

Configuration

sudo adduser YOUR-USERNAME incus-admin
# Log out and back in (or run newgrp incus-admin) so the new group takes effect
incus admin init

You’re fine to accept the defaults, though if you’re planning on a cluster, consult:

https://linuxcontainers.org/incus/docs/main/howto/cluster_form/#cluster-form

Managing Networks

Incus uses managed networks. By default it creates a private bridged network with DHCP, DNS and NAT services. You can create others, and it adds services to them similarly. You don’t plug instances in directly; rather, you create a new profile with no network and configure the instance with that profile.

If you’re testing DHCP, though, such as when working with netboot, you must create a network without those services. That must be done at the command line with the IP address settings set to none. You can then use that network in a profile:

incus network create test ipv4.address=none ipv6.address=none
incus profile copy default isolated

You can proceed to the GUI for the rest.

Operation

Windows 11 VM Creation

This requires access to a TPM. A command-line walkthrough is available at https://discussion.scottibyte.com/t/windows-11-incus-virtual-machine/362.

After repacking the installation ISO, you can also create the VM through the GUI and then add the TPM:

incus config device add win11vm vtpm tpm path=/dev/tpm0

Agent

sudo apt install lxd-agent

Notes

LXD is widely admired, but Canonical’s decision to move it to in-house-only led the lead developer and elements of the community to fork.

5.4 - Zero Touch Install

The simplest way to zero-touch install Windows is with a web-generated answer file. Go to a site like schneegans and just create it. For normal deployments, this removes the need for complex systems like MDT, WDS, SCCM, etc.

Create An Answer File

Visit schneegans, select the behavior you’d like, and download the file. Use it in one of the following ways:

USB

After creating the USB installer, copy the file (autounattend.xml) to the root of the USB drive (or one of these locations) and setup will automatically detect it.

Netboot

For a netboot install, copy the file to the sources folder of the Windows files.

scp autounattend.xml netboot:/var/www/html/win11/sources

Additionally, some scripting elements of the install don’t support UNC paths, so we must map a drive. Back on the Windows netboot page, we created an install.bat to start the installation. Let’s modify it like so:

vi /var/www/html/win11/install.bat
wpeinit

SET SERVER=netboot

:NET
net use q: \\%SERVER%\install

REM If there was a problem with the net use command, 
REM ping, pause and loop back to try again

IF %ERRORLEVEL% NEQ 0 (
  ping %SERVER%
  pause
  GOTO NET
) ELSE (
  q:
  cd win11
  setup.exe
)

Add Packages

The installer can also add 3rd-party software packages via commands in the Run custom scripts section. The system needs to be online to pull from the network, so we’ll run them at the initial log-in. And since some versions of Windows block anonymous SMB, we’ll use HTTP.

Add Package Sources

On the netboot server, create an apps folder for your files and download packages there.

mkdir /var/www/html/apps; cd /var/www/html/apps
wget https://get.videolan.org/vlc/3.0.9.2/win64/vlc-3.0.9.2-win64.msi 
wget https://statics.teams.cdn.office.net/production-windows-x64/enterprise/webview2/lkg/MSTeams-x64.msix

Add to Autounattend.xml

It’s easiest to add this in the web form rather than try and edit the XML file. Go to this section and add a line like this one to the third block of custom scripts. It must run at initial user login as the network isn’t available before that.

Navigate to the block that says:

Scripts to run when the first user logs on after Windows has been installed

For MSI Files

These are handled as .cmd files, as in field 1.

msiexec /package http://netboot/apps/GoogleChromeStandaloneEnterprise64.msi /quiet
msiexec /package http://netboot/apps/vlc-3.0.9.2-win64.msi /quiet
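If your apps folder grows, you can generate these lines rather than typing them. This is a sketch that assumes the folder is served as http://netboot/apps, as above. The function name msi_commands is hypothetical. File names containing spaces would need URL-encoding, which it doesn’t do.

```shell
#!/bin/sh
# msi_commands DIR BASE_URL
# Print a quiet msiexec line for every .msi file in DIR, where BASE_URL is
# the HTTP address that DIR is served from.
msi_commands() {
  for f in "$1"/*.msi; do
    [ -e "$f" ] || continue   # no .msi files: the glob is left unexpanded
    echo "msiexec /package $2/$(basename "$f") /quiet"
  done
}

# Example: msi_commands /var/www/html/apps http://netboot/apps
```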

For MSIX Files

These are handled as .ps1 files as in field 2.

Add-AppPackage -path http://netboot/apps/MSTeams-x64.msix

Notes

Windows Product Keys https://gist.github.com/rvrsh3ll/0810c6ed60e44cf7932e4fbae25880df

6 - Security

6.1 - CrowdSec

6.1.1 - Installation

Overview

CrowdSec has two main parts: detection and interdiction.

Detection is handled by the main CrowdSec binary. You tell it what files to keep an eye on, how to parse those files, and what something ‘bad’ looks like. It then keeps a list of IPs that have done bad things.

Interdiction is handled by any number of plugins called ‘bouncers’, so named because they block access or kick out bad IPs. They run independently and keep an eye on the list, to do things like edit the firewall to block access for a bad IP.

There is also the ‘crowd’ part. The CrowdSec binary downloads IPs of known bad-actors from the cloud for your bouncers to keep out and submits alerts from your systems.

Installation

With Debian, you can simply add the repo via their script and install with a couple lines.

curl -s https://packagecloud.io/install/repositories/crowdsec/crowdsec/script.deb.sh | sudo bash
sudo apt install crowdsec
sudo apt install crowdsec-firewall-bouncer-nftables

This installs both the detection (crowdsec) and the interdiction (crowdsec-firewall-bouncer) parts. Assuming everything went well, crowdsec will check in with the cloud and download a baseline list of known bad actors, the firewall bouncer will set up a basic drop list in the firewall, and crowdsec will start watching your syslog for intrusion attempts.

# Check out the very long drop list
sudo nft list ruleset | less

Configuration

CrowdSec comes pre-configured to watch for ssh brute-force attacks. If you have specific services to watch you can add those as described below.

Add a Service

You probably want to watch a specific service, like a web server. Take a look at https://hub.crowdsec.net/ to see all the available components. For example, browse the collections and search for caddy. The more info link will show you how to install the collection:

sudo cscli collections list -a
sudo cscli collections install crowdsecurity/caddy

Tell CrowdSec where Caddy’s log files are.

sudo tee -a /etc/crowdsec/acquis.yaml << EOF

---
filenames:
 - /var/log/caddy/*.log
labels:
  type: caddy
---
EOF

Reload crowdsec for these changes to take effect:

sudo systemctl reload crowdsec
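Appending with tee -a adds a duplicate entry every time you re-run it. Here’s a sketch of an append that checks first. The function name add_acquisition is hypothetical. The duplicate check is a plain text match on the path rather than real YAML parsing. The entry it writes is indented slightly differently from the one above but is equivalent YAML.

```shell
#!/bin/sh
# add_acquisition FILE TYPE GLOB
# Append a CrowdSec acquisition entry for GLOB, labelled TYPE, to FILE,
# unless FILE already lists GLOB.
add_acquisition() {
  file=$1 type=$2 glob=$3
  if grep -qF -- "- $glob" "$file" 2>/dev/null; then
    echo "already present: $glob"
    return 0
  fi
  cat >> "$file" <<EOF
---
filenames:
  - $glob
labels:
  type: $type
EOF
}

# Example, as root:
#   add_acquisition /etc/crowdsec/acquis.yaml caddy '/var/log/caddy/*.log'
#   systemctl reload crowdsec
```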

Operation

DataFlow

CrowdSec works by pulling in data from the Acquisition files, Parsing the events, comparing to Scenarios, and then Deciding if action should be taken.

Acquisition of data from log files is based on entries in the acquis.yaml file, and each event is given the label defined in that file.

Those events feed the Parsers. There are a handful by default, but only the ones specifically interested in a given label will see it. They look for keywords like ‘FAILED LOGIN’ and then extract the IP.

Successfully parsed lines are fed to the Scenarios to see if what happened matters. The scenarios look for things like 10 FAILED LOGINs in 1 min. This separates the accidental bad password entry from a brute-force attempt.

Matching a scenario gets the IP added to the Decision List, i.e. the list of bad IPs. These have a configurable expiration, so that if you really do guess wrong 10 times in a row, you’re not banned forever.

The bouncers use this list to take action, like a firewall block, and will unblock you after the expiration.
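To make the flow concrete, here is a toy version of that pipeline for sshd lines. It’s an illustration only, not how CrowdSec works internally. It assumes OpenSSH’s Failed password for USER from IP wording. It also replaces the time-based scenario with a plain per-IP count over the whole input. The function name and threshold are hypothetical.

```shell
#!/bin/sh
# toy_decisions THRESHOLD < auth.log
# 1. Parse: keep failed-password lines and extract the source IP.
# 2. Scenario (simplified): count failures per IP over the whole input.
# 3. Decision: emit a ban for any IP at or above the threshold.
toy_decisions() {
  threshold=$1
  sed -n 's/.*Failed password for .* from \([0-9.]*\) port .*/\1/p' |
    LC_ALL=C sort | uniq -c |
    awk -v t="$threshold" '$1 >= t { print "ban " $2 " (" $1 " failures)" }'
}

# Example: toy_decisions 10 < /var/log/auth.log
```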

Collections

Parsers and Scenarios work best when they work together so they are usually distributed together as a Collection. You can have collections of collections as well. For example, the base installation comes with the linux collection that includes a few parsers and the sshd collection.

To see what Collections, Parsers and Scenarios are running, use the cscli command line interface.

sudo cscli collections list
sudo cscli collections inspect crowdsecurity/linux
sudo cscli collections inspect crowdsecurity/sshd

Inspecting the collection will tell you what parsers and scenarios it contains, as well as some metrics. To learn more about a collection and its components, you can check out its page:

https://hub.crowdsec.net/author/crowdsecurity/collections/linux

The metrics are a bit confusing until you learn that the ‘Unparsed’ column doesn’t mean unparsed so much as it means a non-event. These are just normal logfile lines that don’t have one of the keywords the parser was looking for, like ‘LOGIN FAIL’.

Status

Is anyone currently attacking you? The decisions list shows you any current bad actors and the alerts list shows you a summary of past decisions. If you are just getting started this is probably none, but if you’re open to the internet this will grow quickly.

sudo cscli decisions list
sudo cscli alerts list

But you are getting events from the cloud and you can check those with the -a option. You’ll notice that every 2 hours the community-blocklist is updated.

sudo cscli alerts list -a

After this collection has been running for a while, you’ll start to see alerts like these:

sudo cscli alerts list
╭────┬───────────────────┬───────────────────────────────────────────┬─────────┬────────────────────────┬───────────┬─────────────────────────────────────────╮
│ ID │       value       │                  reason                   │ country │           as           │ decisions │               created_at                │
├────┼───────────────────┼───────────────────────────────────────────┼─────────┼────────────────────────┼───────────┼─────────────────────────────────────────┤
│ 27 │ Ip:18.220.128.229 │ crowdsecurity/http-bad-user-agent         │ US      │ 16509 AMAZON-02        │ ban:1     │ 2023-03-02 13:12:27.948429492 +0000 UTC │
│ 26 │ Ip:18.220.128.229 │ crowdsecurity/http-path-traversal-probing │ US      │ 16509 AMAZON-02        │ ban:1     │ 2023-03-02 13:12:27.979479713 +0000 UTC │
│ 25 │ Ip:18.220.128.229 │ crowdsecurity/http-probing                │ US      │ 16509 AMAZON-02        │ ban:1     │ 2023-03-02 13:12:27.9460075 +0000 UTC   │
│ 24 │ Ip:18.220.128.229 │ crowdsecurity/http-sensitive-files        │ US      │ 16509 AMAZON-02        │ ban:1     │ 2023-03-02 13:12:27.945759433 +0000 UTC │
│ 16 │ Ip:159.223.78.147 │ crowdsecurity/http-probing                │ SG      │ 14061 DIGITALOCEAN-ASN │ ban:1     │ 2023-03-01 23:03:06.818512212 +0000 UTC │
│ 15 │ Ip:159.223.78.147 │ crowdsecurity/http-sensitive-files        │ SG      │ 14061 DIGITALOCEAN-ASN │ ban:1     │ 2023-03-01 23:03:05.814690037 +0000 UTC │
╰────┴───────────────────┴───────────────────────────────────────────┴─────────┴────────────────────────┴───────────┴─────────────────────────────────────────╯

You may even need to unblock yourself:

sudo cscli decisions list
sudo cscli decisions delete --id XXXXXXX

Next Steps

You’re now taking advantage of the crowd part of CrowdSec and have added your own service. If you don’t have any alerts, though, you may be wondering how well it’s actually working.

Take a look at the detailed activity if you want to look more closely at what’s going on.

6.1.2 - Detailed Activity

Inspecting Metrics

Data comes in through the parsers. To see what they are doing, let’s take a look at the Acquisition and Parser metrics.

sudo cscli metrics

Most of the ‘Acquisition Metrics’ lines will be read and unparsed. This is because normal events are dropped. It only considers lines parsed if they were passed on to a scenario. The ‘bucket’ column refers to event scenarios and is also blank as there were no parsed lines to hand off.

Acquisition Metrics:
╭────────────────────────┬────────────┬──────────────┬────────────────┬────────────────────────╮
│         Source         │ Lines read │ Lines parsed │ Lines unparsed │ Lines poured to bucket │
├────────────────────────┼────────────┼──────────────┼────────────────┼────────────────────────┤
│ file:/var/log/auth.log │ 216        │ -            │ 216            │ -                      │
│ file:/var/log/syslog   │ 143        │ -            │ 143            │ -                      │
╰────────────────────────┴────────────┴──────────────┴────────────────┴────────────────────────╯

The ‘Parser Metrics’ will show the individual parsers - but not all of them. Only parsers that have at least one ‘hit’ are shown. In this example, only the syslog parser shows up. It’s a low-level parser that doesn’t look for matches, so every line is a hit.

Parser Metrics:
╭─────────────────────────────────┬──────┬────────┬──────────╮
│             Parsers             │ Hits │ Parsed │ Unparsed │
├─────────────────────────────────┼──────┼────────┼──────────┤
│ child-crowdsecurity/syslog-logs │ 359  │ 359    │ -        │
│ crowdsecurity/syslog-logs       │ 359  │ 359    │ -        │
╰─────────────────────────────────┴──────┴────────┴──────────╯

However, try a couple of failed SSH login attempts and you’ll see them, and how they feed up to the Acquisition Metrics.


Acquisition Metrics:
╭────────────────────────┬────────────┬──────────────┬────────────────┬────────────────────────╮
│         Source         │ Lines read │ Lines parsed │ Lines unparsed │ Lines poured to bucket │
├────────────────────────┼────────────┼──────────────┼────────────────┼────────────────────────┤
│ file:/var/log/auth.log │ 242        │ 3            │ 239            │ -                      │
│ file:/var/log/syslog   │ 195        │ -            │ 195            │ -                      │
╰────────────────────────┴────────────┴──────────────┴────────────────┴────────────────────────╯

Parser Metrics:
╭─────────────────────────────────┬──────┬────────┬──────────╮
│             Parsers             │ Hits │ Parsed │ Unparsed │
├─────────────────────────────────┼──────┼────────┼──────────┤
│ child-crowdsecurity/sshd-logs   │ 61   │ 3      │ 58       │
│ child-crowdsecurity/syslog-logs │ 442  │ 442    │ -        │
│ crowdsecurity/dateparse-enrich  │ 3    │ 3      │ -        │
│ crowdsecurity/geoip-enrich      │ 3    │ 3      │ -        │
│ crowdsecurity/sshd-logs         │ 8    │ 3      │ 5        │
│ crowdsecurity/syslog-logs       │ 442  │ 442    │ -        │
│ crowdsecurity/whitelists        │ 3    │ 3      │ -        │
╰─────────────────────────────────┴──────┴────────┴──────────╯

‘Lines poured to bucket’, however, is still empty. That means the scenarios decided it wasn’t a hack attempt; with SSH timeouts, that’s actually hard to trigger without a tool. Plus, you may notice the ‘whitelists’ parser was triggered. Private IP ranges are whitelisted by default so you can’t lock yourself out from inside.
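If you check these numbers often, you can boil the Acquisition Metrics table down to one line per source. This is a sketch that assumes the five-column table layout shown above; other cscli versions may lay the table out differently. It treats ‘-’ as zero. It also flags sources where more lines were poured to buckets than were parsed, which means some lines fed more than one scenario.

```shell
#!/bin/sh
# acquisition_summary: read `cscli metrics` output on stdin and print one
# line per file source with lines read, parsed, and poured to bucket.
acquisition_summary() {
  awk -F '│' '
    # Table cells are padded with spaces and use "-" for zero.
    function num(s) { gsub(/[ -]/, "", s); return (s == "") ? 0 : s + 0 }
    $2 ~ /^ *file:/ {
      src = $2
      gsub(/^ +| +$/, "", src)
      nread = num($3); parsed = num($4); poured = num($6)
      flag = (poured > parsed) ? "  <- poured > parsed" : ""
      printf "%s read=%d parsed=%d poured=%d%s\n", src, nread, parsed, poured, flag
    }'
}

# Example: sudo cscli metrics | acquisition_summary
```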

Let’s ask crowdsec to explain what’s going on.

Detailed Parsing

To see which parsers got involved and what they did, you can ask.

sudo cscli explain --file /var/log/auth.log --type syslog

Here’s an SSH example of a failed login. The numbers, such as (+9 ~1), mean that the parser added 9 elements it parsed from the raw event and updated 1. Notice the whitelists parser at the end. It’s catching this event and dropping it, hence the ‘parser failure’.

line: Mar  1 14:08:11 www sshd[199701]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=192.168.1.16  user=allen
        ├ s00-raw
        |       └ 🟢 crowdsecurity/syslog-logs (first_parser)
        ├ s01-parse
        |       └ 🟢 crowdsecurity/sshd-logs (+9 ~1)
        ├ s02-enrich
        |       ├ 🟢 crowdsecurity/dateparse-enrich (+2 ~1)
        |       ├ 🟢 crowdsecurity/geoip-enrich (+9)
        |       └ 🟢 crowdsecurity/whitelists (~2 [whitelisted])
        └-------- parser failure 🔴

Why exactly did it get whitelisted? Let’s ask for a verbose report.

sudo cscli explain -v --file /var/log/auth.log --type syslog
line: Mar  1 14:08:11 www sshd[199701]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=192.168.1.16  user=someGuy
        ├ s00-raw
        |       └ 🟢 crowdsecurity/syslog-logs (first_parser)
        ├ s01-parse
        |       └ 🟢 crowdsecurity/sshd-logs (+9 ~1)
        |               └ update evt.Stage : s01-parse -> s02-enrich
        |               └ create evt.Parsed.sshd_client_ip : 192.168.1.16
        |               └ create evt.Parsed.uid : 0
        |               └ create evt.Parsed.euid : 0
        |               └ create evt.Parsed.pam_type : unix
        |               └ create evt.Parsed.sshd_invalid_user : someGuy
        |               └ create evt.Meta.service : ssh
        |               └ create evt.Meta.source_ip : 192.168.1.16
        |               └ create evt.Meta.target_user : someGuy
        |               └ create evt.Meta.log_type : ssh_failed-auth
        ├ s02-enrich
        |       ├ 🟢 crowdsecurity/dateparse-enrich (+2 ~1)
        |               ├ create evt.Enriched.MarshaledTime : 2023-03-01T14:08:11Z
        |               ├ update evt.MarshaledTime :  -> 2023-03-01T14:08:11Z
        |               ├ create evt.Meta.timestamp : 2023-03-01T14:08:11Z
        |       ├ 🟢 crowdsecurity/geoip-enrich (+9)
        |               ├ create evt.Enriched.Longitude : 0.000000
        |               ├ create evt.Enriched.ASNNumber : 0
        |               ├ create evt.Enriched.ASNOrg : 
        |               ├ create evt.Enriched.ASNumber : 0
        |               ├ create evt.Enriched.IsInEU : false
        |               ├ create evt.Enriched.IsoCode : 
        |               ├ create evt.Enriched.Latitude : 0.000000
        |               ├ create evt.Meta.IsInEU : false
        |               ├ create evt.Meta.ASNNumber : 0
        |       └ 🟢 crowdsecurity/whitelists (~2 [whitelisted])
        |               └ update evt.Whitelisted : %!s(bool=false) -> true
        |               └ update evt.WhitelistReason :  -> private ipv4/ipv6 ip/ranges
        └-------- parser failure 🔴

This shows the actual data and at the bottom, parser crowdsecurity/whitelists has updated the property ’evt.Whitelisted’ to true and gave it a reason. That property appears to be a built-in that flags events to be dropped.

If you want to change the ranges, you can edit the logic in the parser’s yaml file; sudo cscli hub list will show you which file that is. Add or remove entries from the lists it checks the ‘ip’ and ‘cidr’ values against. Any match causes the whitelist flag to become true.
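The ‘cidr’ entries boil down to a network-membership test. This POSIX shell sketch shows that arithmetic for IPv4 only. It illustrates what a range like 192.168.0.0/16 covers and is not CrowdSec’s implementation. The function names are made up.

```shell
#!/bin/sh
# ip_to_int A.B.C.D: print the address as a 32-bit integer.
ip_to_int() {
  IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
  echo $(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
}

# ip_in_cidr IP NETWORK/PREFIX: succeed if IP lies inside the range.
ip_in_cidr() {
  net=${2%/*}
  bits=${2#*/}
  # Build a netmask with the top "bits" bits set (a /0 matches everything).
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Example: ip_in_cidr 192.168.1.16 192.168.0.0/16 && echo whitelisted
```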

False Positives

You may see a high percentage of ‘Lines poured to bucket’ relative to ‘Lines read’, like in this example where almost all are. When ‘poured to bucket’ is greater than ‘parsed’, some lines triggered two scenarios.

Acquisition Metrics:
╭────────────────────────────────┬────────────┬──────────────┬────────────────┬────────────────────────╮
│             Source             │ Lines read │ Lines parsed │ Lines unparsed │ Lines poured to bucket │
├────────────────────────────────┼────────────┼──────────────┼────────────────┼────────────────────────┤
│ file:/var/log/auth.log         │ 69         │ -            │ 69             │ -                      │
│ file:/var/log/caddy/access.log │ 21         │ 21           │ -              │ 32                     │
│ file:/var/log/syslog           │ 2          │ -            │ 2              │ -                      │
╰────────────────────────────────┴────────────┴──────────────┴────────────────┴────────────────────────╯

Sometimes, that’s OK as not all scenarios are designed to take instant action. The ‘http-crawl-non_statics’ had 17 events and was considering action against 2 IPs, but never ‘Overflowed’ aka took action.

The http-probing scenario did, however, and one of the two IPs had action taken against it.

Bucket Metrics:
╭──────────────────────────────────────┬───────────────┬───────────┬──────────────┬────────┬─────────╮
│                Bucket                │ Current Count │ Overflows │ Instantiated │ Poured │ Expired │
├──────────────────────────────────────┼───────────────┼───────────┼──────────────┼────────┼─────────┤
│ crowdsecurity/http-crawl-non_statics │ -             │ -         │ 2            │ 17     │ 2       │
│ crowdsecurity/http-probing           │ -             │ 1         │ 2            │ 15     │ 1       │
╰──────────────────────────────────────┴───────────────┴───────────┴──────────────┴────────┴─────────╯

You can ask crowdsec to explain what’s going on with a -v and see that clients are asking for things that don’t exist.

  ├ s00-raw
  | ├ 🟢 crowdsecurity/non-syslog (first_parser)
  | └ 🔴 crowdsecurity/syslog-logs
  ├ s01-parse
  | └ 🟢 crowdsecurity/caddy-logs (+19 ~2)
  |   └ update evt.Stage : s01-parse -> s02-enrich
  |   └ create evt.Parsed.request : /0/icon/Forman,%20M.L.%20
  |   ...
  |   └ create evt.Meta.http_status : 404
  |   ...
  ├-------- parser success 🟢
  ├ Scenarios
    ├ 🟢 crowdsecurity/http-crawl-non_statics
    └ 🟢 crowdsecurity/http-probing

If you look at the rules (sudo cscli hub list) for http-probing, you’ll see it looks for 404s (file not found). If you get more than 10 in 10 seconds, it ‘overflows’ and the IP gets banned.
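The ‘overflow’ language comes from the leaky-bucket model. Each matching event adds one to a bucket, and the bucket drains at a fixed rate. It overflows when an event arrives while it is already full. This is a minimal sketch with integer timestamps in seconds. The example uses a capacity of 10, with one event leaking out every 10 seconds. Those values are assumptions that mirror the description above, so check the scenario file for the real ones.

```shell
#!/bin/sh
# bucket_check CAPACITY LEAK_SECONDS < timestamps
# Read one integer timestamp (seconds) per line, in ascending order.
# Each event adds 1 to the bucket; one unit drains every LEAK_SECONDS.
# Prints "overflow at t=N" for the first event that exceeds CAPACITY,
# otherwise "no overflow".
bucket_check() {
  capacity=$1 leak=$2
  level=0 last=""
  while read -r t; do
    if [ -z "$last" ]; then
      last=$t
    else
      drained=$(( (t - last) / leak ))
      last=$(( last + drained * leak ))   # keep partial progress toward the next leak
      level=$(( level - drained ))
      if [ "$level" -lt 0 ]; then level=0; fi
    fi
    level=$(( level + 1 ))
    if [ "$level" -gt "$capacity" ]; then
      echo "overflow at t=$t"
      return 0
    fi
  done
  echo "no overflow"
}

# Example: 12 probes, one per second
#   seq 0 11 | bucket_check 10 10    # prints: overflow at t=11
```

Spreading the same number of requests out a bit (say, one every 10 seconds) never fills the bucket, which is why slow, normal browsing doesn’t get banned.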

Whitelist

The trouble is, some web apps request page elements just in case they exist. That produces lots of 404s and, in turn, bans. In this case, we must whitelist the application with an expression that checks whether it was an icon request, like the one above.

sudo vi /etc/crowdsec/parsers/s02-enrich/some-app-whitelist.yaml
name: crowdsecurity/whitelists 
description: "Whitelist 404s for icon requests" 
whitelist: 
  reason: "icon request" 
  expression:   
    - evt.Parsed.request startsWith '/0/icon/'

6.1.3 - Custom Parser

When checking out the detailed metrics you may find that log entries aren’t being parsed. Maybe the log format has changed or you’re logging additional data the author didn’t anticipate. The best thing is to add your own parser.

Types of Parsers

There are several types of parsers, and they are used in stages. Some are designed to work with the raw log entries, while others take pre-parsed data and add to or enrich it. This way you can do branching, and not every parser needs to know how to read a syslog message.

Their Local Path will tell you what stage they kick in at. Use sudo cscli parsers list to display the details. s00-raw works with the ‘raw’ files while s01 and s02 work further down the pipeline. Currently, you can only create s00 and s01 level parsers.

Integrating with Scenarios

Useful parsers supply data that Scenarios are interested in. You can create a parser that watches the system logs for ‘FOOBAR’ entries, extracts the ‘FOOBAR-LEVEL’, and passes it on. But if nothing is looking for ‘FOOBARs’, nothing will happen.

Let’s say you’ve added the Caddy collection. It’s pulled in a bunch of Scenarios you can view with sudo cscli scenarios list. If you look at one of the associated files, you’ll see a filter section where they look for ‘evt.Meta.http_path’ and ‘evt.Parsed.verb’. They are all different though, so how do you know what data to supply?

Your best bet is to take an existing parser and modify it.

Examples

Note - CrowdSec is pretty awesome, and after talking in their Discord, they’ve already accommodated both of these scenarios within a release cycle or two. So these two examples are solved. I’m sure you’ll find new ones, though ;-)

A Web Example

Let’s say that you’ve installed the Caddy collection, but you’ve noticed basic auth login failures don’t trigger the parser. So let’s add a new file and edit it.

sudo cp /etc/crowdsec/parsers/s01-parse/caddy-logs.yaml /etc/crowdsec/parsers/s01-parse/caddy-logs-custom.yaml

You’ll notice two top-level sections where the parsing happens, nodes and statics, along with some grok pattern matching.

Nodes allow you to try multiple patterns, and if any match, the whole section is considered successful. For example, if the log could have either the standard HTTPDATE or a CUSTOMDATE, as long as it has one it’s good and the matching can move on. Statics just go down the list extracting data. If any fail, the whole event is considered a failure and dropped as unparseable.

All the parsed data gets attached to the event as ‘evt.Parsed.something’, and some of the statics move it into the evt values the Scenarios will be looking for. Caddy logs are JSON formatted, so they are basically already parsed, and this example makes heavy use of the JsonExtract method.

# We added the caddy logs in the acquis.yaml file with the label 'caddy' and so we use that as our filter here
filter: "evt.Parsed.program startsWith 'caddy'"
onsuccess: next_stage
# debug: true
name: caddy-logs-custom
description: "Parse custom caddy logs"
pattern_syntax:
 CUSTOMDATE: '%{DAY:day}, %{MONTHDAY:monthday} %{MONTH:month} %{YEAR:year} %{TIME:time} %{WORD:tz}'
nodes:
  - nodes:
    - grok:
        pattern: '%{NOTSPACE} %{NOTSPACE} %{NOTSPACE} \[%{HTTPDATE:timestamp}\]%{DATA}'
        expression: JsonExtract(evt.Line.Raw, "common_log")
        statics:
          - target: evt.StrTime
            expression: evt.Parsed.timestamp
    - grok:
        pattern: "%{CUSTOMDATE:timestamp}"
        expression: JsonExtract(evt.Line.Raw, "resp_headers.Date[0]")
        statics:
          - target: evt.StrTime
            expression: evt.Parsed.day + " " + evt.Parsed.month + " " + evt.Parsed.monthday + " " + evt.Parsed.time + ".000000" + " " + evt.Parsed.year
    - grok:
        pattern: '%{IPORHOST:remote_addr}:%{NUMBER}'
        expression: JsonExtract(evt.Line.Raw, "request.remote_addr")
    - grok:
        pattern: '%{IPORHOST:remote_ip}'
        expression: JsonExtract(evt.Line.Raw, "request.remote_ip")
    - grok:
        pattern: '\["%{NOTDQUOTE:http_user_agent}\"]'
        expression: JsonExtract(evt.Line.Raw, "request.headers.User-Agent")
statics:
  - meta: log_type
    value: http_access-log
  - meta: service
    value: http
  - meta: source_ip
    expression: evt.Parsed.remote_addr
  - meta: source_ip
    expression: evt.Parsed.remote_ip
  - meta: http_status
    expression: JsonExtract(evt.Line.Raw, "status")
  - meta: http_path
    expression: JsonExtract(evt.Line.Raw, "request.uri")
  - target: evt.Parsed.request #Add for http-logs enricher
    expression: JsonExtract(evt.Line.Raw, "request.uri")
  - parsed: verb
    expression: JsonExtract(evt.Line.Raw, "request.method")
  - meta: http_verb
    expression: JsonExtract(evt.Line.Raw, "request.method")
  - meta: http_user_agent
    expression: evt.Parsed.http_user_agent
  - meta: target_fqdn
    expression: JsonExtract(evt.Line.Raw, "request.host")
  - meta: sub_type
    expression: "JsonExtract(evt.Line.Raw, 'status') == '401' && JsonExtract(evt.Line.Raw, 'request.headers.Authorization[0]') startsWith 'Basic ' ? 'auth_fail' : ''"
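The CUSTOMDATE node in the parser above is an alternative way to fill evt.StrTime, using Caddy’s Date response header instead of common_log. To see what string its statics expression builds, here is a small shell sketch that performs the same rearrangement; `to_strtime` is a hypothetical helper and the header value is only an example.

```shell
# to_strtime "Tue, 28 Jun 2022 13:10:05 GMT"
# Hypothetical helper that mimics the CUSTOMDATE node's statics expression:
#   day + " " + month + " " + monthday + " " + time + ".000000" + " " + year
to_strtime() {
  printf '%s\n' "$1" | {
    # tz is read to consume the last word but, as in the YAML, not used
    read -r day monthday month year hms tz
    day=${day%,}   # CUSTOMDATE expects a comma straight after the day name
    printf '%s %s %s %s.000000 %s\n' "$day" "$month" "$monthday" "$hms" "$year"
  }
}

to_strtime "Tue, 28 Jun 2022 13:10:05 GMT"
# prints: Tue Jun 28 13:10:05.000000 2022
```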

The very last line of the parser file is where a status 401 is checked. It looks for a 401 on a request that carried Basic auth credentials. However, this misses events where someone asks for a protected resource without credentials and the server responds that Basic auth is needed, i.e. when a bot is poking at URLs on your server and ignoring the prompts to log in. To look at the log entries more easily, use this command to follow the log and decode it while you recreate failed attempts.

sudo tail -f /var/log/caddy/access.log | jq

To catch those as well, chain a second ternary (? :) condition onto the expression that checks the response header; the chain acts as an ‘or’.

    expression: "JsonExtract(evt.Line.Raw, 'status') == '401' && JsonExtract(evt.Line.Raw, 'request.headers.Authorization[0]') startsWith 'Basic ' ? 'auth_fail' : (JsonExtract(evt.Line.Raw, 'status') == '401' && JsonExtract(evt.Line.Raw, 'resp_headers.Www-Authenticate[0]') startsWith 'Basic ' ? 'auth_fail' : '')"
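The extended sub_type condition is easier to reason about as a small truth table. This shell sketch applies the same logic to three values you might pull out of a log entry with jq (the status, the request’s Authorization header, and the response’s Www-Authenticate header). `auth_sub_type` is a hypothetical name, not part of CrowdSec, and the header values are examples.

```shell
# auth_sub_type STATUS AUTHORIZATION WWW_AUTHENTICATE
# Hypothetical helper mirroring the extended sub_type expression:
# a 401 counts as an auth failure if the client sent Basic credentials,
# or if the server answered by asking for Basic auth.
auth_sub_type() {
  status=$1 auth=$2 www_auth=$3
  if [ "$status" = "401" ]; then
    case "$auth" in "Basic "*) echo auth_fail; return ;; esac
    case "$www_auth" in "Basic "*) echo auth_fail; return ;; esac
  fi
  echo ""
}

auth_sub_type 401 "Basic Zm9vOmJhcg==" ""        # bad password: auth_fail
auth_sub_type 401 "" 'Basic realm="restricted"'  # no credentials sent: auth_fail
auth_sub_type 200 "Basic Zm9vOmJhcg==" ""        # success: empty line
```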

Syslog Example

Let’s say you’re using Dropbear and failed logins are not being picked up by the SSH parser.

To see what’s going on, use the CrowdSec command line interface, cscli. You can ask it for its metrics to see how many lines it has parsed and whether any of them are suspicious. Since we just restarted, you may not have any syslog lines yet, so let’s add some and check.

ssh <user>@<host>
logger "This is an innocuous message"

cscli metrics
INFO[28-06-2022 02:41:33 PM] Acquisition Metrics:
+------------------------+------------+--------------+----------------+------------------------+
|         SOURCE         | LINES READ | LINES PARSED | LINES UNPARSED | LINES POURED TO BUCKET |
+------------------------+------------+--------------+----------------+------------------------+
| file:/var/log/messages | 1          | -            | 1              | -                      |
+------------------------+------------+--------------+----------------+------------------------+

Notice that the line we just read is unparsed and that’s OK. That just means it wasn’t an entry the parser cared about. Let’s see if it responds to an actual failed login.

dbclient some.remote.host

# Enter some bad passwords and then exit with a Ctrl-C. Remember, localhost attempts are whitelisted so you must be remote.
<user>@some.remote.host's password:
<user>@some.remote.host's password:

cscli metrics
INFO[28-06-2022 02:49:51 PM] Acquisition Metrics:
+------------------------+------------+--------------+----------------+------------------------+
|         SOURCE         | LINES READ | LINES PARSED | LINES UNPARSED | LINES POURED TO BUCKET |
+------------------------+------------+--------------+----------------+------------------------+
| file:/var/log/messages | 7          | -            | 7              | -                      |
+------------------------+------------+--------------+----------------+------------------------+

Well, no luck. We’ll need to adjust the parser.

sudo cp /etc/crowdsec/parsers/s01-parse/sshd-logs.yaml /etc/crowdsec/parsers/s01-parse/sshd-logs-custom.yaml

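Take a look at the log file and copy an example failure line over to https://grokdebugger.com/. Dropbear words its failures in more than one way, such as “Bad PAM password attempt for ...” and “Login attempt for nonexistent user from ...”, so you may need a pattern for each. For a bad PAM password, use a pattern like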

Bad PAM password attempt for '%{DATA:user}' from %{IP:source_ip}:%{INT:port}
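Alongside the grok debugger, you can sanity-check a sample line with a rough regular-expression equivalent in sed. This is an approximation, not grok: it only accepts IPv4 addresses, whereas %{IP} also matches IPv6. The helper `parse_dropbear_fail` and the sample log line are hypothetical.

```shell
# parse_dropbear_fail LINE
# Hypothetical helper: prints "user ip port" for Dropbear
# "Bad PAM password attempt" lines, roughly like the grok pattern
#   Bad PAM password attempt for '%{DATA:user}' from %{IP:source_ip}:%{INT:port}
# Assumes IPv4 only. Prints nothing if the line does not match.
parse_dropbear_fail() {
  printf '%s\n' "$1" | sed -n -E \
    "s/.*Bad PAM password attempt for '([^']*)' from ([0-9]{1,3}(\.[0-9]{1,3}){3}):([0-9]+).*/\1 \2 \4/p"
}

parse_dropbear_fail "Jun 28 14:49:51 host dropbear[1234]: Bad PAM password attempt for 'root' from 192.0.2.10:51234"
# prints: root 192.0.2.10 51234
```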

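Assuming you get the pattern worked out, you can then add a grok node like the one below (for the nonexistent-user message) to the nodes section of the custom parser file you created. Check the rest of the copied file as well: its filter was most likely written for sshd and will need to match Dropbear instead, and its statics expect the client IP in a particular evt.Parsed field, so name your IP capture to match.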

  - grok:
      pattern: "Login attempt for nonexistent user from %{IP:source_ip}:%{INT:port}"
      apply_on: message

6.1.4 - On Alpine

Install

There are some packages available, but (as of 2022) they are a bit behind and don’t include the config and service files. So let’s download the latest binaries from CrowdSec and create our own.

Download the current release

Note: Download the static versions. Alpine uses a different libc (musl) than most other distros.

cd /tmp
wget https://github.com/crowdsecurity/crowdsec/releases/latest/download/crowdsec-release-static.tgz
wget https://github.com/crowdsecurity/cs-firewall-bouncer/releases/latest/download/crowdsec-firewall-bouncer.tgz

tar xzf crowdsec-firewall*
tar xzf crowdsec-release*
rm *.tgz

Install Crowdsec and Register with The Central API

You cannot use the wizard’s normal install, as it expects systemd and doesn’t support OpenRC. Instead, follow the Binary Install steps from CrowdSec’s binary instructions, running the wizard in Docker mode, which skips the systemd setup.

sudo apk add bash newt envsubst
cd /tmp/crowdsec-v*

# Docker mode skips configuring systemd
sudo ./wizard.sh --docker-mode

sudo cscli hub update
sudo cscli machines add -a
sudo cscli capi register

# A collection is just a bunch of parsers and scenarios bundled together for convenience
sudo cscli collections install crowdsecurity/linux 

Install The Firewall Bouncer

We need a netfilter tool, so install nftables. If you already have iptables installed, you can skip this step and set FW_BACKEND to iptables below when generating the API keys.

sudo apk add nftables

Now install the firewall bouncer. CrowdSec doesn’t publish a static build of it yet, but you can get a package from Alpine testing (if you don’t want to compile it yourself).

# Change 'edge' to another version as needed
echo "http://dl-cdn.alpinelinux.org/alpine/edge/testing" | sudo tee -a /etc/apk/repositories
sudo apk update
sudo apk add cs-firewall-bouncer

Now configure the bouncer. Once again we do this manually, because the install script doesn’t support non-systemd Linux distributions. But cribbing from that script, we see we can:

cd /tmp/crowdsec-firewall*

BIN_PATH_INSTALLED="/usr/local/bin/crowdsec-firewall-bouncer"
BIN_PATH="./crowdsec-firewall-bouncer"
sudo install -v -m 755 -D "${BIN_PATH}" "${BIN_PATH_INSTALLED}"

CONFIG_DIR="/etc/crowdsec/bouncers/"
sudo mkdir -p "${CONFIG_DIR}"
sudo install -m 0600 "./config/crowdsec-firewall-bouncer.yaml" "${CONFIG_DIR}crowdsec-firewall-bouncer.yaml"

Generate The API Keys

Note: If you used the APK, just run the SUFFIX and API_KEY lines below to get the key (echo "$API_KEY"), then manually edit the file (vim /etc/crowdsec/bouncers/crowdsec-firewall-bouncer.yaml).

cd /tmp/crowdsec-firewall*
CONFIG_DIR="/etc/crowdsec/bouncers/"

SUFFIX=`tr -dc A-Za-z0-9 </dev/urandom | head -c 8`
API_KEY=`sudo cscli bouncers add cs-firewall-bouncer-${SUFFIX} -o raw`
FW_BACKEND="nftables"
API_KEY=${API_KEY} BACKEND=${FW_BACKEND} envsubst < ./config/crowdsec-firewall-bouncer.yaml | sudo install -m 0600 /dev/stdin "${CONFIG_DIR}crowdsec-firewall-bouncer.yaml"
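envsubst replaces any variable that isn’t set with an empty string, so a typo or a failed cscli call can leave you with a config that has no API key, and a hand-copied template can still contain placeholders. This sketch, using a hypothetical helper `check_bouncer_config`, flags both cases. It assumes the key sits on a line starting with `api_key:`, as in the bouncer’s bundled config.

```shell
# check_bouncer_config < FILE
# Hypothetical helper: reads a bouncer config on stdin, reports problems,
# and exits non-zero if it finds any.
# Assumes the key sits on a line starting with "api_key:".
check_bouncer_config() {
  awk '
    index($0, "${") { print "line " NR ": unsubstituted placeholder"; bad = 1 }
    /^api_key:/ {
      val = $0
      sub(/^api_key:[ \t]*/, "", val)   # strip the key name
      sub(/[ \t]+$/, "", val)           # and any trailing whitespace
      if (val == "") { print "line " NR ": api_key is empty"; bad = 1 }
    }
    END { exit bad }'
}

# Usage (the real file is mode 0600, hence sudo):
#   sudo cat /etc/crowdsec/bouncers/crowdsec-firewall-bouncer.yaml | check_bouncer_config && echo "looks OK"
```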

Create RC Service Files

sudo touch /etc/init.d/crowdsec
sudo chmod +x /etc/init.d/crowdsec
sudo rc-update add crowdsec

sudo vim /etc/init.d/crowdsec
#!/sbin/openrc-run

command=/usr/local/bin/crowdsec
command_background=true

pidfile="/run/${RC_SVCNAME}.pid"

depend() {
   need localmount
   need net
}

Note: If you used the package from Alpine testing above it came with a service file. Just rc-update add cs-firewall-bouncer and skip this next step.

sudo touch /etc/init.d/cs-firewall-bouncer
sudo chmod +x /etc/init.d/cs-firewall-bouncer
sudo rc-update add cs-firewall-bouncer

sudo vim /etc/init.d/cs-firewall-bouncer
#!/sbin/openrc-run

command=/usr/local/bin/crowdsec-firewall-bouncer
command_args="-c /etc/crowdsec/bouncers/crowdsec-firewall-bouncer.yaml"
pidfile="/run/${RC_SVCNAME}.pid"
command_background=true

depend() {
  after firewall
}

Start The Services and Observe The Results

Start up the services and view the logs to check that everything started properly.

sudo rc-service crowdsec start
sudo rc-service cs-firewall-bouncer start

sudo tail /var/log/crowdsec.log
sudo tail /var/log/crowdsec-firewall-bouncer.log

# The firewall bouncer should tell you about how it's inserting decisions it got from the hub

sudo cat /var/log/crowdsec-firewall-bouncer.log

time="28-06-2022 13:10:05" level=info msg="backend type : nftables"
time="28-06-2022 13:10:05" level=info msg="nftables initiated"
time="28-06-2022 13:10:05" level=info msg="Processing new and deleted decisions . . ."
time="28-06-2022 14:35:35" level=info msg="100 decisions added"
time="28-06-2022 14:35:45" level=info msg="1150 decisions added"
...
...

# If you are curious about what it's blocking
sudo nft list table crowdsec
...

7 - Storage

7.1 - Seafile

TODO - Seafile 11 is in beta, and MySQL is required.

Seafile is a cloud storage system, similar to Google Drive. It stands out for being simpler and faster than its peers. It’s also open source.

Preparation

You’ll need a Linux server. We use Debian 12 in this example, and the instructions are based on Seafile’s SQLite instructions, updated for the new OS.

You’ll also need some extra packages to avoid cffi build issues, and a Python virtual environment so apt and pip packages play nicely together.

# The main requirements
sudo apt install -y memcached libmemcached-dev pwgen sqlite3
sudo systemctl enable --now memcached

# Python specific things
sudo apt install -y python3 python3-setuptools python3-pip

sudo apt install python3-wheel python3-django python3-django-captcha python3-future python3-willow python3-pylibmc python3-jinja2 python3-psd-tools python3-pycryptodome python3-cffi

# cffi build requirements
sudo apt install -y build-essential libssl-dev libffi-dev python-dev-is-python3

# Install the service account and create a python virtual environment for them
sudo apt install python3-venv
sudo useradd --home-dir /opt/seafile --system --comment "Seafile Service Account" --create-home seafile
sudo -i -u seafile
python3 -m venv .venv
source .venv/bin/activate

# Install the rest of the packages from pip
pip3 install --timeout=3600 \
  wheel django django-pylibmc django-simple-captcha future \
  Pillow pylibmc captcha jinja2 psd-tools pycryptodome cffi

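If you plan to keep Seafile’s data on NFS (see NFS Mount below), an /etc/fstab entry like this one mounts the share at boot. Adjust the server, export, and mount point to match your setup.

192.168.1.21:/srv/seafile /srv/seafile nfs defaults,noatime,vers=4.1 0 0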

Installation

It comes with two services: Seafile, the file sync server, and Seahub, a web interface and editor.

For a small team, you can install a lightweight instance of Seafile using a single host and sqlite.

Note: There is a Seafile package repository, but it may only provide the client. TODO: test this.

As per the install instructions, this will create several folders in Seafile’s home directory and a symlink to the binaries in a version-specific directory for easy upgrades.

# Continue as the seafile user. The Python venv should still be active; if not, source it as before

# Download and extract the binary
wget -P /tmp https://s3.eu-central-1.amazonaws.com/download.seadrive.org/seafile-server_10.0.1_x86-64.tar.gz
tar -xzf /tmp/seafile-server_10.0.1_x86-64.tar.gz -C /opt/seafile/
rm /tmp/seafile*

# Run the setup script
cd /opt/seafile/sea*
./setup-seafile.sh

# Start seafile and seahub to answer some setup questions
./seafile.sh start
./seahub.sh start

./seahub.sh stop
./seafile.sh stop

Create systemd service files for the two services (footnote 1), working as a sudo-capable user.

sudo tee /etc/systemd/system/seafile.service << EOF
[Unit]
Description=Seafile
After=network.target

[Service]
Type=forking
ExecStart=/opt/seafile/seafile-server-latest/seafile.sh start
ExecStop=/opt/seafile/seafile-server-latest/seafile.sh stop
LimitNOFILE=infinity
User=seafile
Group=seafile

[Install]
WantedBy=multi-user.target
EOF

Note: The ExecStart below is a bit cumbersome, but it saves modifying the vendor’s start script. Only the Seahub service seems to need the virtual env, though you can give both services the same treatment if you wish.

sudo tee /etc/systemd/system/seahub.service << EOF
[Unit]
Description=Seafile hub
After=network.target seafile.service

[Service]
Type=forking
ExecStart=/bin/bash -c 'source /opt/seafile/.venv/bin/activate && /opt/seafile/seafile-server-latest/seahub.sh start'
ExecStop=/bin/bash -c 'source /opt/seafile/.venv/bin/activate && /opt/seafile/seafile-server-latest/seahub.sh stop'
User=seafile
Group=seafile

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable --now seafile.service
sudo systemctl enable --now seahub.service

Seafile and Seahub should have started without errors, though by default you can only access Seahub from localhost.

If you run into problems here, make sure to start Seafile first. Experiment with sourcing the activation file as the seafile user and running the start scripts directly.

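Add log rotation. The heredoc delimiter is quoted ('EOF') so the backticks in the postrotate script are written to the file as-is, instead of being run by your shell when you create it.

sudo tee /etc/logrotate.d/seafile << 'EOF'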
/opt/seafile/logs/seafile.log
/opt/seafile/logs/seahub.log
/opt/seafile/logs/file_updates_sender.log
/opt/seafile/logs/repo_old_file_auto_del_scan.log
/opt/seafile/logs/seahub_email_sender.log
/opt/seafile/logs/work_weixin_notice_sender.log
/opt/seafile/logs/index.log
/opt/seafile/logs/content_scan.log
/opt/seafile/logs/fileserver-access.log
/opt/seafile/logs/fileserver-error.log
/opt/seafile/logs/fileserver.log
{
        daily
        missingok
        rotate 7
        # compress
        # delaycompress
        dateext
        dateformat .%Y-%m-%d
        notifempty
        # create 644 root root
        sharedscripts
        postrotate
                if [ -f /opt/seafile/pids/seaf-server.pid ]; then
                        kill -USR1 `cat /opt/seafile/pids/seaf-server.pid`
                fi

                if [ -f /opt/seafile/pids/fileserver.pid ]; then
                        kill -USR1 `cat /opt/seafile/pids/fileserver.pid`
                fi

                if [ -f /opt/seafile/pids/seahub.pid ]; then
                        kill -HUP `cat /opt/seafile/pids/seahub.pid`
                fi

                find /opt/seafile/logs/ -mtime +7 -name "*.log*" -exec rm -f {} \;
        endscript
}
EOF

Configuration

Seahub (the web UI) by default is bound to localhost only. Change that to all addresses so you can access it from other systems.

sudo sed -i 's/^bind.*/bind = "0.0.0.0:8000"/'  /opt/seafile/conf/gunicorn.conf.py

If you’re not proxying yet, check the Seahub settings. You may need to add the correct internal name and port for initial access. Add the file server root as well, so you don’t have to set it in the GUI later.

sudo vi /opt/seafile/conf/seahub_settings.py

SERVICE_URL = "http://seafile.some.lan:8000/" 
FILE_SERVER_ROOT = "http://seafile.some.lan:8082"

Add a connection to the memcache server

sudo tee -a /opt/seafile/conf/seahub_settings.py << EOF
CACHES = {
    'default': {
        'BACKEND': 'django_pylibmc.memcached.PyLibMCCache',
        'LOCATION': '127.0.0.1:11211',
    },
}
EOF

Then restart Seahub for the change to take effect.

sudo systemctl restart seahub

You should now be able to log in at http://some.server:8000/ with the credentials you created during the command line setup. If the web GUI works but you can’t download files, or the markdown editor doesn’t work as expected, check FILE_SERVER_ROOT and the matching settings in the GUI’s System Admin section.

NFS Mount

Large amounts of data are best handled by a dedicated storage system, usually mounted over the network via NFS or a similar protocol. Seafile’s data should be stored on such a system, but you cannot mount the entire Seafile data folder over the network: it includes SQLite databases, and SQLite recommends against keeping those on a network filesystem (footnote 2). Nor can you mount each subdirectory separately, as they rely upon internal links that must be on the same filesystem.

The solution is to mount a network share in an alternate location and symlink the relevant parts of the Seafile data directory to it.

sudo mount nfs.server:/exports/seafile /mnt/seafile

sudo systemctl stop seahub
sudo systemctl stop seafile

sudo mv /opt/seafile/seafile-data/httptemp \
	/opt/seafile/seafile-data/storage \
	/opt/seafile/seafile-data/tmpfiles \
/mnt/seafile/

sudo ln -s /mnt/seafile/httptemp /opt/seafile/seafile-data/
sudo ln -s /mnt/seafile/storage /opt/seafile/seafile-data/
sudo ln -s /mnt/seafile/tmpfiles /opt/seafile/seafile-data/

sudo chown -R seafile:seafile /mnt/seafile
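Before starting the services again, a quick check that each moved directory is now a symlink into the NFS mount can save confusion later. This sketch uses a hypothetical helper, `check_seafile_links`, that takes the data directory and mount point as arguments. It relies on `readlink`, which Debian provides but which is not strictly POSIX.

```shell
# check_seafile_links DATA_DIR MOUNT_DIR
# Hypothetical helper: confirms httptemp, storage and tmpfiles in DATA_DIR
# are symlinks pointing at existing directories under MOUNT_DIR.
check_seafile_links() {
  data=$1 mnt=$2 rc=0
  for d in httptemp storage tmpfiles; do
    # plain readlink fails if the path is not a symlink
    target=$(readlink "$data/$d") || { echo "$d: not a symlink"; rc=1; continue; }
    case "$target" in
      "$mnt"/*) [ -d "$data/$d" ] || { echo "$d: target missing"; rc=1; } ;;
      *)        echo "$d: points at $target"; rc=1 ;;
    esac
  done
  return $rc
}

# Usage, from a shell that can read /opt/seafile:
#   check_seafile_links /opt/seafile/seafile-data /mnt/seafile && sudo systemctl start seafile seahub
```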

Proxy

TODO: Explain why Caddy, then give the proxy file, then cover HTTP/3: enable UDP 443 and confirm it in the logs with Firefox. No special server config is needed.

https://caddy.community/t/caddy-v2-and-seafile-server-on-a-root-server/9188/2

Note the change needed in the GUI for port 8082 (the file server root).

https://www.seafile.com/en/download/#server


  1. https://manual.seafile.com/deploy/start_seafile_at_system_bootup/ ↩︎

  2. https://www.sqlite.org/faq.html#q5 ↩︎

[client]: https://help.seafile.com/syncing_client/install_linux_client/
[instructions]: https://manual.seafile.com/deploy/using_sqlite/

7.2 - TrueNAS

7.2.1 - Disk Replacement

Locate the failed drive.

zpool status

It will show something like

	NAME                                        STATE     READ WRITE CKSUM
	pool01                                      DEGRADED     0     0     0
	  raidz3-0                                  ONLINE       0     0     0
	    44fca0d1-f343-48e6-9a43-c71463551aa4    ONLINE       0     0     0
	    7ca5e989-51a5-4f1b-a81e-982d9a05ac04    ONLINE       0     0     0
	    8fd249a0-c8c6-47bb-8787-3e246300c62d    ONLINE       0     0     0
	    573c1117-27d4-430c-b57c-858a75b4ca35    ONLINE       0     0     0
	    29b7c608-72ae-4ec2-830b-0e23925ac0b1    ONLINE       0     0     0
	    293acdbe-6be5-4fa7-945a-e9481b09c0fa    ONLINE       0     0     0
	    437bac45-433b-48e3-bc70-ae1c82e8155b    ONLINE       0     0     0
	    a5ca09a7-3f3f-4135-a2d9-71290fd79160    ONLINE       3     2     0
	  raidz3-1                                  DEGRADED     0     0     0
	    spare-0                                 DEGRADED     0     0     0
	      65f61699-e2fc-4a36-86dd-b0fa6a774798  FAULTED     53     0     0  too many errors
	      9d794dfd-2ef6-432d-8252-0c93e79509dc  ONLINE       0     0     0
	    e27f31e8-a1a4-47dc-ac01-4a6c99b6e5d0    ONLINE       0     0     0
	    aff60721-21ae-42bf-b077-1937aeafaab2    ONLINE       0     0     0
	    714da3e5-ca9c-43d0-a0f3-c0fa693a5b02    ONLINE       0     0     0
	    df89869a-4445-47f9-afa9-3b9cce3b1530    ONLINE       0     0     0
	    29748037-bbd5-4f2d-8878-4fa2b81d9ec3    ONLINE       0     0     0
	    1ff396ec-dec7-45dd-9172-de31e5f6fca7    ONLINE       0     0     0
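Copying a 36-character UUID by hand is error-prone. A small filter can pull out the names of members that zpool status reports as FAULTED. `faulted_members` is a hypothetical helper that simply matches the STATE column as it appears above.

```shell
# faulted_members < zpool-status-output
# Hypothetical helper: prints the NAME of every vdev/member whose STATE is FAULTED.
faulted_members() {
  awk '$2 == "FAULTED" { print $1 }'
}

# Usage:
#   zpool status pool01 | faulted_members
```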

Off-line the drive.

zpool offline pool01 65f61699-e2fc-4a36-86dd-b0fa6a774798

Get the serial number

hdparm -I /dev/disk/by-partuuid/65f61699-e2fc-4a36-86dd-b0fa6a774798 | grep Serial

The output will be something like

Serial Number:      ZC1168HE
Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
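Note that grep Serial also matches the Transport line. To feed just the serial into the next step rather than retyping it, you can extract the value itself. `serial_from_hdparm` is a hypothetical helper that assumes the `Serial Number:` line format shown above.

```shell
# serial_from_hdparm < hdparm-output
# Hypothetical helper: prints the value of the first "Serial Number:" line.
serial_from_hdparm() {
  awk '/Serial Number:/ {
         sub(/.*Serial Number:[ \t]*/, "")   # drop the label
         sub(/[ \t]+$/, "")                  # drop trailing whitespace
         print
         exit
       }'
}

# Usage:
#   serial=$(hdparm -I /dev/disk/by-partuuid/65f61699-e2fc-4a36-86dd-b0fa6a774798 | serial_from_hdparm)
#   sas3ircu 0 display | grep -B 10 "$serial"
```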

Identify the bay location

sas3ircu 0 display | grep -B 10 ZC1168HE                                          

The output will look like

  Device is a Hard disk
    Enclosure #                             : 2
    Slot #                                  : 17
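The locate command below wants the enclosure and slot joined as ENCLOSURE:SLOT. This sketch (hypothetical `bay_of`) reads the sas3ircu excerpt on stdin and prints that pair. It assumes, as in the output above, that the device’s `Enclosure #` and `Slot #` lines are the last ones before its serial, so it keeps the last value of each it sees.

```shell
# bay_of < "sas3ircu 0 display | grep -B 10 SERIAL" output
# Hypothetical helper: prints ENCLOSURE:SLOT for the matched device.
bay_of() {
  awk '
    /Enclosure #/ { enc = $NF }    # last field is the enclosure number
    /Slot #/      { slot = $NF }   # last field is the slot number
    END { if (enc != "" && slot != "") print enc ":" slot }'
}

# Usage:
#   sas3ircu 0 display | grep -B 10 ZC1168HE | bay_of    # e.g. 2:17
```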

Turn on the bay indicator

sas3ircu 0 locate 2:17 ON

Physically replace the disk

Check the logs for the new disk’s name

dmesg

The output will show the new device’s name, such as ‘sdal’ in the example below.

  [16325935.447081] sd 0:0:45:0: Power-on or device reset occurred
  [16325935.447962] sd 0:0:45:0: Attached scsi generic sg20 type 0
  [16325935.451271]  end_device-0:0:28: add: handle(0x001c), sas_addr(0x500304801810f321)
  [16325935.454768] sd 0:0:45:0: [sdal] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
  [16325935.477576] sd 0:0:45:0: [sdal] Write Protect is off
  [16325935.479913] sd 0:0:45:0: [sdal] Mode Sense: 9b 00 10 08
  [16325935.482100] sd 0:0:45:0: [sdal] Write cache: enabled, read cache: enabled, supports DPO and FUA
  [16325935.664995] sd 0:0:45:0: [sdal] Attached SCSI disk
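On a busy system, dmesg can be long. This sketch (hypothetical `new_disk_name`) prints the device name from the most recent “Attached SCSI disk” line, assuming the Linux dmesg format shown above. Check it against the size line before using it.

```shell
# new_disk_name < dmesg-output
# Hypothetical helper: prints the sdX name from the last "Attached SCSI disk" line.
new_disk_name() {
  awk '/Attached SCSI disk/ {
         if (match($0, /\[sd[a-z]+\]/)) name = substr($0, RSTART + 1, RLENGTH - 2)
       }
       END { if (name != "") print name }'
}

# Usage:
#   dmesg | new_disk_name    # e.g. sdal
```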

Turn off the slot light

sas3ircu 0 locate 2:17 OFF

Use the GUI to replace the disk. (Use the GUI rather than the command line to ensure it’s set up consistently with the other disks.)

  Storage --> Pool Gear Icon (at right) --> Status

    (The removed disk should be listed by its UUID)

  Disk Menu (three dots) --> Replace --> (disk from dmesg above) --> Force --> Replace Disk

After resilvering has finished, check the spare’s ID at the bottom and then detach it so it goes back to being a spare.

zpool detach pool01 9d794dfd-2ef6-432d-8252-0c93e79509dc

Notes:

Note: The GUI takes several steps to prepare the disk and adds a partition to the pool, not the whole disk. Using the CLI to replace the disk is ‘strongly advised against’. If you must, though, you can recreate that process at the command line, as adapted from https://www.truenas.com/community/resources/creating-a-degraded-pool.100/

gpart and glabel are not present on TrueNAS Scale, so you would have to adapt this to another tool.

gpart create -s gpt /dev/da18
gpart add -i 1 -b 128 -t freebsd-swap -s 2g /dev/da18
gpart add -i 2 -t freebsd-zfs /dev/da18

zpool replace pool01 65f61699-e2fc-4a36-86dd-b0fa6a774798 /dev/da18p2

To turn off all slot lights

for X in {0..23};do sas3ircu 0 locate 2:$X OFF;done
for X in {0..11};do sas3ircu 0 locate 3:$X OFF;done