This is the documentation root. Use the left-hand nav bar to descend taxonomically, or use the search to find what you are after.
Documentation
- 1: Internet
- 1.1: DNS
- 1.1.1: Pi-hole
- 1.1.2: Pi-hole DHCP
- 1.1.3: Unbound
- 1.2: Email
- 1.2.1: Forwarding
- 1.2.2: Remote Hosting
- 1.2.3: Self Hosting
- 1.2.3.1: Postfix
- 1.2.3.2: Relay
- 1.2.3.3: Dovecot
- 1.2.3.4: Security
- 1.2.3.5: Authentication
- 1.2.3.6: Autodiscovery
- 1.3: Web
- 1.3.1: Content Mgmt
- 1.3.1.1: Hugo
- 1.3.1.1.1: Hugo Install
- 1.3.1.1.2: Docsy Install
- 1.3.1.1.3: Docsy Config
- 1.3.1.1.4: Docsy Operate
- 1.3.1.1.5: Docsy Github
- 1.3.2: Content Deployment
- 1.3.2.1: Local Git Deployment
- 1.3.3: Content Delivery
- 1.3.3.1: Cloudflare
- 1.3.4: Servers
- 1.3.4.1: Caddy
- 1.3.4.1.1: Installation
- 1.3.4.1.2: WebDAV
- 1.3.4.1.3: MFA
- 1.3.4.1.4: Wildcard DNS
- 2: Media
- 2.1: Players
- 2.1.1: LibreELEC
- 2.2: Signage
- 2.2.1: Anthias (Screenly)
- 2.2.2: Anthias Deployment
- 2.2.3: API
- 3: Monitoring
- 3.1: Metrics
- 3.1.1: Prometheus
- 3.1.1.1: Installation
- 3.1.1.2: Node Exporter
- 3.1.1.3: SNMP Exporter
- 3.1.1.4: SNMP Generator
- 3.2: Logs
- 3.3: Visualization
- 3.3.1: Grafana
- 4: Network
- 4.1: Routing
- 4.1.1: Linux Router
- 4.1.2: OPNsense
- 4.2: VPN
- 4.2.1: Wireguard
- 4.2.1.1: Central Server
- 4.2.1.2: Remote Access
- 4.2.1.3: Remote Mgmt
- 4.2.1.4: Routing
- 4.2.1.5: LibreELEC
- 4.2.1.6: TrueNAS Scale
- 4.2.1.7: Proxmox
- 5: Operating Systems
- 5.1: NetBoot
- 5.1.1: HTTP Boot
- 5.1.2: PXE Boot
- 5.1.3: menu
- 5.1.4: netboot.xyz
- 5.1.5: windows
- 5.2: Server Core
- 5.3: Virtualization
- 5.3.1: Incus
- 5.4: Zero Touch Install
- 6: Security
- 6.1: CrowdSec
- 6.1.1: Installation
- 6.1.2: Detailed Activity
- 6.1.3: Custom Parser
- 6.1.4: On Alpine
- 7: Storage
- 7.1: Seafile
- 7.2: TrueNAS
- 7.2.1: Disk Replacement
1 - Internet
1.1 - DNS
Web pages today are complex. Your browser will make around 40 DNS queries1 to find the various parts of an average web page, so implementing a local DNS system is key to keeping things fast.
In general, you can implement either a caching or a recursive server, and the choice comes down to speed vs privacy.
Types of DNS Servers
A caching server accepts and caches queries, but doesn’t actually do the lookup itself. It forwards the request on to another DNS server and waits for the answer. If you have a lot of clients configured to use it, chances are someone else has already asked for what you want and it can supply the answer quickly from cache.
A recursive server does more than just cache answers. It knows how to connect to the root of the internet and find out itself. If you need to find some.test.com, it will connect to the .com server, ask where test.com is, then connect to test.com and ask it for some.
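If you want to see this delegation chain yourself, dig can walk it from the root servers for you. The name below is just an example.
# Walks the lookup from the root down, showing each referral
dig +trace www.example.com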
Comparison
Between the two, the caching server will generally be faster. If you connect to a large DNS service they will almost always have things cached. You will also get geographically relevant results as content providers work with DNS providers to direct you to the closest content cache.
With a recursive server, you do the lookup yourself and no single entity is able to monitor your DNS queries. You also aren't dependent on any upstream provider. But you make every lookup 'the long way', and that can take many hundreds of milliseconds in some cases, a large part of a page load time.
Testing
In an ad hoc test on a live network with about 5,000 residential user devices, about half the queries were cached. The other half were sent to either quad 9 or a local resolver. Quad 9 took about half the time that the local resolver did.
Here are the numbers - with Steve Gibson’s DNS benchmarker against pi-hole forwarding to a local resolver vs pi-hole forwarding to quad 9. Cached results excluded.
Forwarder | Min | Avg | Max |Std.Dev|Reliab%|
----------------+-------+-------+-------+-------+-------+
- Uncached Name | 0.015 | 0.045 | 0.214 | 0.046 | 100.0 |
- DotCom Lookup | 0.015 | 0.019 | 0.034 | 0.005 | 100.0 |
---<O-OO---->---+-------+-------+-------+-------+-------+
Resolver | Min | Avg | Max |Std.Dev|Reliab%|
----------------+-------+-------+-------+-------+-------+
- Uncached Name | 0.016 | 0.078 | 0.268 | 0.079 | 100.0 |
- DotCom Lookup | 0.018 | 0.035 | 0.078 | 0.017 | 100.0 |
---<O-OO---->---+-------+-------+-------+-------+-------+
Selection
This test is interesting, but not definitive. While the DNS benchmark shows that the forwarder's uncached average is better, perceived page-load time is different from the sum of DNS query times. A page-metric test would be better, but in general, faster is better.
Use a caching server.
One last point: use your ISP's name server when possible. They will direct you to their local content caching systems for Netflix, Google (YouTube) and Akamai. If you use quad 9 like I did, you may get to a regional content location, but you miss out on things optimized specifically for your geographic location.
They are (probably) capturing all your queries for monetization, and possibly directing you to their own advertising server when you mistype a domain name. So you'll need to decide:
Speed vs privacy.
-
Informal personal checking of random popular sites. ↩︎
1.1.1 - Pi-hole
Pi-hole is a reasonable choice for DNS service, especially if you don't have a separate metrics and reporting system. A single instance will scale to 1,000 active clients with just 1 core and 500 MB of RAM, and does a good job of showing what's going on.
There are some caveats once you pass 1,000 users while logging all queries, but it's a good place to start.
Preparation
Prepare and secure a Debian system
Set a Static Address
sudo vi /etc/network/interfaces
Change
# The primary network interface
allow-hotplug eth0
iface eth0 inet dhcp
to
auto eth0
iface eth0 inet static
address 192.168.0.2/24
gateway 192.168.0.1
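To apply the change, restart networking or simply reboot. (If you're connected over SSH on this interface, expect to reconnect on the new address.)
# Apply the new static address
sudo systemctl restart networking.service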
Secure Access with Nftables
Nftables is the modern replacement for iptables and preferred for setting netfilter rules.
sudo apt install nftables
sudo systemctl enable nftables
sudo vi /etc/nftables.conf
#!/usr/sbin/nft -f
flush ruleset
table inet filter {
chain input {
type filter hook input priority 0;
# accept any localhost traffic
iif lo accept
# accept already allowed and related traffic
ct state established,related accept
# accept DNS and DHCP traffic from internal only
define RFC1918 = { 192.168.0.0/16, 10.0.0.0/8, 172.16.0.0/12 }
udp dport { domain, bootps } ip saddr $RFC1918 ct state new accept
tcp dport { domain, bootps } ip saddr $RFC1918 ct state new accept
# accept web and ssh traffic on the first interface or from an addr range
iifname eth0 tcp dport { ssh, http } ct state new accept
# or
ip saddr 192.168.0.1/24 ct state new accept
# Accept pings
icmp type { echo-request } ct state new accept
# accept neighbor discovery otherwise IPv6 connectivity breaks.
ip6 nexthdr icmpv6 icmpv6 type { nd-neighbor-solicit, nd-router-advert, nd-neighbor-advert } accept
# count and drop anything not matched above
counter drop
}
}
sudo nft -f /etc/nftables.conf
sudo systemctl start nftables.service
Add Unattended Updates
This is an optional but useful service.
sudo apt install unattended-upgrades
sudo sed -i 's/\/\/\(.*origin=Debian.*\)/ \1/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/\/\/\(Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";\)/ \1/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/\/\/\(Unattended-Upgrade::Remove-Unused-Dependencies\) "false";/ \1 "true";/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/\/\/\(Unattended-Upgrade::Automatic-Reboot\) "false";/ \1 "true";/' /etc/apt/apt.conf.d/50unattended-upgrades
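To sanity-check the result, you can ask for a dry run; it should list the Debian origins you just enabled.
# Show what would be upgraded without actually doing it
sudo unattended-upgrade --dry-run --debug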
Installation
Unbound
sudo apt install unbound
Pi-hole
sudo apt install curl
curl -sSL https://install.pi-hole.net | bash
Configuration
Unbound
The pi-hole guide for [unbound](https://docs.pi-hole.net/guides/dns/unbound/) includes a config block to copy and paste as directed. You should also add a config file for dnsmasq while you're at it, to set EDNS packet sizes. (dnsmasq comes as part of pi-hole.)
sudo vi /etc/dnsmasq.d/99-edns.conf
edns-packet-max=1232
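pihole-FTL embeds dnsmasq and reads /etc/dnsmasq.d/ at startup, so restart it to pick up the new file.
sudo systemctl restart pihole-FTL.service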
When you check the status of unbound, you can ignore the warning: subnetcache:... as it's just reminding you that data in the subnet cache (if you were to use it) can't be pre-fetched. There's some conversation1 as to why it's warning us.
The config includes prefetch, but you may also wish to add serve-expired to the same config file from above.
# serve old responses from cache while waiting for the actual resolution to finish.
serve-expired: yes
sudo systemctl restart unbound.service
No additional setup is needed, but see the unbound page for more info.
Pi-hole
Pi-hole can be configured via it’s two main config files, /etc/pihole/setupVars.cong
and pihole-FTL.conf
, but it’s convenient to use the GUI’s left-hand settings menu.
- Settings -> DNS -> Upstream DNS Servers -> Custom 1 (Check and add 127.0.0.1#5335 as shown in the unbound guide linked above)
- Settings -> DNS -> Interface settings -> Permit all origins (needed if you have multiple networks)
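Before pointing clients at it, you can confirm unbound is answering on port 5335 the way the custom upstream expects. The name queried below is just an example.
# Should return an address and status NOERROR
dig pi-hole.net @127.0.0.1 -p 5335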
Very busy pi-hole installations generate lots of data and (seemingly) hang the dashboard. If that happens, limit the amount of data being displayed.
vi /etc/pihole/pihole-FTL.conf
# Don't import the existing DB into the GUI - it will hang the web page for a long time
DBIMPORT=no
# Don't import more than an hour of logs from the logfile
MAXLOGAGE=1
# Truncate data older than this many days to keep the size of the database down
MAXDBDAYS=1
sudo systemctl restart pihole-FTL.service
Operation
Local DNS Entries
You can enter local DNS and CNAME entries via the GUI, (Admin Panel -> Local DNS), but you can also edit the config file for bulk entries.
For A records
vim /etc/pihole/custom.list
10.50.85.2 test.some.lan
10.50.85.3 test2.some.lan
For CNAME records
vim /etc/dnsmasq.d/05-pihole-custom-cname.conf
cname=test3.some.lan,test.some.lan
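Pi-hole picks up GUI edits on its own, but after bulk-editing these files, restart the resolver and spot-check an entry.
sudo pihole restartdns
dig +short test.some.lan @127.0.0.1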
Block Lists
Pi-hole ships with one ad list: StevenBlack. You may need to disable this for Google or Facebook search results to work as expected. The top search results are often ads and don't work as expected when pi-hole is blocking them.
- Admin Panel -> Ad Lists -> Status Column
You might consider adding security-focused lists instead.
Search the web for examples.
Upgrading
Unbound will be upgraded via the Unattended Upgrades service. But pi-hole requires a manual command.
sudo pihole -up
Troubleshooting
DNS Cache Size
The default cache size of 10,000 serves thousands of clients easily. This is because entries expire faster than the cache fills up. But you can check your evictions - cache entries removed to make room before they expire - to see.
Settings -> System -> DNS cache evictions:
You'll notice that insertions keep climbing as things are added to the cache, but the cache number itself represents only those entries that are current. If you do see evictions, edit CACHE_SIZE in /etc/pihole/setupVars.conf.
You can also check this at the command line
dig +short chaos txt evictions.bind @localhost
dig +short chaos txt cachesize.bind @localhost
dig +short chaos txt hits.bind @localhost
dig +short chaos txt misses.bind @localhost
However, we are advised that unused cache is wasted memory that could be used for disk buffers, etc. So don't increase it just in case.
Rate Limiting
The system has a default limit of 1,000 queries in a 60-second window for each client. If your clients are proxied or relayed, you can run into this. This event is displayed in the dashboard2 and also in the logs3.
sudo grep -i Rate-limiting /var/log/pihole/pihole.log
You may find the address 127.0.0.1 being rate limited. This can be due to pi-hole doing a reverse lookup of all client IPs every hour. You can disable this with:
# In the pihole-FTL.conf
REFRESH_HOSTNAMES=NONE
DNS over HTTPS
Firefox, if the user has not yet chosen a setting, will query use-application-dns.net. Pi-hole responds with NXDOMAIN4 as a signal that clients should continue to use Pi-hole for DNS.
Apple devices include a private relay5 that the user may decide to enable if they pay for it. Pi-hole by default blocks queries for mask.icloud.com, and the user will be notified that you are blocking it. To allow it, add the following to /etc/pihole/pihole-FTL.conf:
# Signal that Apple iCloud Private Relay is allowed
BLOCK_ICLOUD_PR=false
sudo systemctl reload pihole-FTL.service
Searching The Query Log Hangs DNS
On a very busy server, clicking show-all in the query log panel will hang the server as pihole-FTL works through its database. There is no solution; just don't do it. The best alternative is to ship logs to Elasticsearch or a similar system.
Ask Yourself
The system continues to use whatever DNS resolver was initially configured. You may want it to use itself, instead.
# revert this if pi-hole itself needs fixing
sudo vi /etc/resolv.conf
nameserver 127.0.0.1
-
https://www.reddit.com/r/pihole/comments/11xb7pt/unbound_warning_subnetcache_prefetch_and/ ↩︎
-
https://pi-hole.net/blog/2021/09/11/pi-hole-ftl-v5-9-web-v5-6-and-core-v5-4-released/#page-content ↩︎
-
https://discourse.pi-hole.net/t/include-log-entry-in-pihole-ftl-log-when-client-hits-rate-limit/46798/12A ↩︎
-
https://www.reddit.com/r/pihole/comments/113qkp5/i_am_seeing_useapplicationdnsnet_being_blocked/ ↩︎
-
https://docs.pi-hole.net/ftldns/configfile/?h=mask#icloud_private_relay ↩︎
1.1.2 - Pi-hole DHCP
Pi-hole serves up DHCP information as well as DNS, and can be configured and enabled in the GUI.
However, the GUI only allows for a single range. On a large network you’ll need multiple ranges. You do this by editing the config files directly.
Interface-Based Ranges
In this setup, you have a separate interface per LAN. Easy to do in a virtual or VLAN environment, but you'll have to define each in the /etc/network/interfaces file.
Let's create a range from 192.168.0.100-200 tied to eth0 and a range of 192.168.1.100-200 tied to eth1. We'll also specify the router and two DNS servers.
vim /etc/dnsmasq.d/05-dhcp.conf
dhcp-range=eth0,192.168.0.100,192.168.0.200,24h
dhcp-option=eth0,option:router,192.168.0.1
dhcp-option=eth0,option:dns-server,192.168.0.2,192.168.0.3
dhcp-range=eth1,192.168.1.100,192.168.1.200,24h
dhcp-option=eth1,option:router,192.168.1.1
dhcp-option=eth1,option:dns-server,192.168.1.2,192.168.1.3
# Shared by both
dhcp-option=option:netmask,255.255.0.0
# Respond immediately without waiting for other servers
dhcp-authoritative
# Don't try and ping the address before assigning it
no-ping
dhcp-lease-max=10000
dhcp-leasefile=/etc/pihole/dhcp.leases
domain=home.lan
These settings can be implicit - i.e. we could have left out ethX in the range, but explicit is often better for clarity.
Note - the DHCP server (dnsmasq) is not enabled by default. You can do that in the GUI under Settings –> DHCP
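If you also want fixed addresses for particular machines, dnsmasq supports host reservations in the same config directory. The MAC, address, and name below are placeholders.
# e.g. /etc/dnsmasq.d/06-dhcp-reservations.conf
dhcp-host=aa:bb:cc:dd:ee:ff,192.168.0.50,printer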
Relay-Based Ranges
In this setup, the router relays DHCP requests to the server. Only one system network interface is required, though you must configure the router(s).
When configured, the relay (router) sets the relay-agent (giaddr) field and sends the request to dnsmasq, which (I think) understands it's a relayed request when it sees that field and looks at its available ranges for a match. It also sets a tag that can be used for assigning different options, such as the gateway, per range.
dhcp-range=tag0,192.168.0.100,192.168.0.250,255.255.255.0,8h
dhcp-range=tag1,192.168.1.100,192.168.1.250,255.255.255.0,8h
dhcp-range=tag2,192.168.2.100,192.168.2.250,255.255.255.0,8h
dhcp-option=tag0,3,192.168.0.1
dhcp-option=tag1,3,192.168.1.1
dhcp-option=tag2,3,192.168.2.1
Sources
https://discourse.pi-hole.net/t/more-than-one-conditional-forwarding-entry-in-the-gui/11359
Troubleshooting
It’s possible that the DHCP part of dnsmasq doesn’t scale to many thousands of leases1
1.1.3 - Unbound
1.2 - Email
Email is a commodity service, but critical for many things - so you can get it anywhere, but you better not mess it up.
Your options, in increasing order of complexity, are:
Forwarding
Email sent to you@your.org is simply forwarded to someplace like gmail. It's free and easy, and you don't need any infrastructure. Most registrars like GoDaddy, NameCheap, CloudFlare, etc, will handle it.
You can even reply from you@your.org by integrating with SendGrid or a similar provider.
Remote-Hosting
If you want more, Google and Microsoft have full productivity suites. Just edit your DNS records, import your users, and pay them $5 a head per month. You still have to ‘do email’ but it’s a little less work than if you ran the whole stack. In most cases, companies that specialize in email do it better than you can.
Self-Hosting
If you are considering local email, let me paraphrase Kenji López-Alt. The first step is, don't. The big guys can do it cheaper and better. But if it's a matter of philosophy or control, or you just don't have the funding, press on.
A Note About Cost
Most of the cost is user support. Hosting means someone else gets to purchase and patch a server farm, but you still have to talk to users. My (anecdotal) observation is that fully hosting saves 10% in overall costs and smooths out expenses. The more users you have, the more that 10% starts to matter.
1.2.1 - Forwarding
This is the best solution for a small number of users. You configure it at your registrar and rely on google (or someone similar) to do all the work for free.
If you want your out-bound emails to come from your domain name (and you do), add an out-bound relay. This is also free for minimal use.
Registrar Configuration
This is different per registrar, but normally involves creating an address and its destination.
Cloudflare
- Login (this assumes you use Cloudflare as your registrar) and select the domain in question.
- Select Email, then Email Routing.
- Under Routes, select Create address.
Once validated, email will begin arriving at the destination.
Configure Relaying
The registrar is only forwarding email, not sending it. To get your sent mail to come from your domain, you must integrate with a mail service such as SendGrid.
SendGrid
- Create a free account and login
- Authenticate your domain name (via DNS)
- Create an API key (Settings -> API Keys -> Restricted Access, Defaults)
Gmail
- Settings -> Accounts -> Send Mail as
- Add your domain email
- Configure the SMTP server with:
- SMTP server: “smtp.sendgrid.net”
- username: “apikey”
- password: (the key you created above)
After validating the code Gmail sends you, there will be a drop down in the From field of new emails.
1.2.2 - Remote Hosting
This is more in the software-as-a-service category. You get an admin dashboard and are responsible for managing users and mail flow. The hosting service provider will help you with basic things, but you're doing most of the work yourself.
Having managed 100K+ user mail systems and migrated from on-prem sendmail to Exchange and then O365 and Google, I can confidently say the infrastructure and even the platform amount to less than 10% of the cost of providing the service.
The main advantage to hosting is that you're not managing the platform, installing patches and replacing hardware. The main disadvantage is that you have little control, and sometimes things are broken and you can't do anything about it.
Medium sized organizations benefit most from hosting. You probably need a productivity suite anyways, and email is usually wrapped up in that. It saves you from having to specialize someone in email and the infrastructure associated with it.
But if controlling access to your data is paramount, then be aware that you have lost that and treat email as a public conversation.
1.2.3 - Self Hosting
When you self-host, you develop expertise in email itself, arguably a commodity service where such expertise has small return. But, you have full control and your data is your own.
The generally accepted best practice is to install Postfix and Dovecot. This is the simplest path and what I cover here. But there are some pretty decent all-in-one packages such as Mailu, Modoboa, etc. These usually wrap Postfix and Dovecot to spare you the details and improve your quality of life, at the cost of not really knowing how they work.
You’ll also need to configure a relay. Many ISPs block basic mail protocol and many recipient servers are rightly suspicious of random emails from unknown IPs in cable modem land.
1.2.3.1 - Postfix
This is the first step - a server that handles and stores email. You’ll be able to check messages locally at the console. (Remote client access such as with Thunderbird comes later.)
Preparation
You need:
- Linux Server
- Firewall Port-Forward
- Public DNS
Server
We use Debian Bookworm (12) in this example, but any derivative will be similar. At large scale you'd set up virtual users, but we'll stick with the default setup and use your system account. Budget about 10 MB per 100 emails stored.
Port Forwarding
Mail protocol uses port 25. Simply forward that to your internal mail server and you’re done.
DNS
You need a normal 'A' record for your server and a special 'MX' record for your domain root. That way, mail sent to you@your.org will get routed to the server.
Name | Type | Value |
---|---|---|
the-server | A | 20.236.44.162 |
@ | MX | the-server |
Mail servers see you@your.org and look for records of type 'MX' for 'your.org'. Seeing that 'the-server' is listed, they look up its 'A' record and connect. A message to you@the-server.your.org is handled the same way, though when there is no 'MX' record it just delivers it to the 'A' record for 'the-server.your.org'. If you have both, the 'MX' takes precedence.
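Once the records are published, a quick dig confirms the lookup chain described above.
dig +short MX your.org
dig +short A the-server.your.org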
Installation
Some configuration is done at install time by the package so you must make sure your hostname is correct. We use the hostname ‘mail’ in this example.
# Correct internal hostnames as needed. 'mail' and 'mail.home.lan' are good suggestions.
cat /etc/hostname /etc/hosts
# Set the external host name and run the package installer. If postfix is already installed, apt remove it first
EXTERNAL="mail.your.org"
sudo debconf-set-selections <<< "postfix postfix/mailname string $EXTERNAL"
sudo debconf-set-selections <<< "postfix postfix/main_mailer_type string 'Internet Site'"
sudo apt install --assume-yes postfix
# Add the main domain to the destinations as well
DOMAIN="your.org"
sudo sed -i "s/^mydestination = \(.*\)/mydestination = $DOMAIN, \1/" /etc/postfix/main.cf
sudo systemctl reload postfix.service
Test with telnet - use your unix system ID for the rcpt address below.
telnet localhost 25
ehlo localhost
mail from: <santa@northpole.org>
rcpt to: <you@your.org>
data
Subject: Wish List
Red Ryder BB Gun
.
quit
Assuming that 'you' matches your shell account, Postfix will have accepted the message and used its Local Delivery Agent to store it in the local message store. That's in /var/mail.
cat /var/mail/YOU
Configuration
Encryption
Postfix will use the untrusted "snakeoil" cert that comes with Debian to opportunistically encrypt communication between it and other mail servers. Surprisingly, most other servers will accept this cert (or fall back to non-encrypted), so let's proceed for now. We'll generate a trusted one later.
Spam Protection
The default config is secured so that it won't relay messages, but it will accept messages from Santa, and is subject to backscatter and a few other things. Let's tighten it up.
sudo tee -a /etc/postfix/main.cf << EOF
# Tighten up formatting
smtpd_helo_required = yes
disable_vrfy_command = yes
strict_rfc821_envelopes = yes
# Error codes instead of bounces
invalid_hostname_reject_code = 554
multi_recipient_bounce_reject_code = 554
non_fqdn_reject_code = 554
relay_domains_reject_code = 554
unknown_address_reject_code = 554
unknown_client_reject_code = 554
unknown_hostname_reject_code = 554
unknown_local_recipient_reject_code = 554
unknown_relay_recipient_reject_code = 554
unknown_virtual_alias_reject_code = 554
unknown_virtual_mailbox_reject_code = 554
unverified_recipient_reject_code = 554
unverified_sender_reject_code = 554
EOF
sudo systemctl reload postfix.service
Postfix has some recommendations as well.
sudo tee -a /etc/postfix/main.cf << EOF
# PostFix Suggestions
smtpd_helo_restrictions = reject_unknown_helo_hostname
smtpd_sender_restrictions = reject_unknown_sender_domain
smtpd_recipient_restrictions =
permit_mynetworks,
permit_sasl_authenticated,
reject_unauth_destination,
reject_rbl_client zen.spamhaus.org,
reject_rhsbl_reverse_client dbl.spamhaus.org,
reject_rhsbl_helo dbl.spamhaus.org,
reject_rhsbl_sender dbl.spamhaus.org
smtpd_relay_restrictions =
permit_mynetworks,
permit_sasl_authenticated,
reject_unauth_destination
smtpd_data_restrictions = reject_unauth_pipelining
EOF
sudo systemctl reload postfix.service
If you test a message from Santa now, Postfix will do some checks and realize it’s bogus.
550 5.7.27 santa@northpole.org: Sender address rejected: Domain northpole.org does not accept mail (nullMX)
Header Cleanup
Postfix will attach a Received: header to outgoing emails that has details of your internal network and mail client. That’s information you don’t need to broadcast. You can remove that with a “cleanup” step as the message is sent.
# Insert a header check after the 'cleanup' line in the smtp section of the master file and create a header_checks file
sudo sed -i '/^cleanup.*/a\\t-o header_checks=regexp:/etc/postfix/header_checks' /etc/postfix/master.cf
echo "/^Received:/ IGNORE" | sudo tee -a /etc/postfix/header_checks
Note - there is some debate on whether this triggers a higher spam score. You may want to replace the header instead.
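If you'd rather replace the header than remove it, header_checks also supports a REPLACE action. The replacement text below is only illustrative.
# In /etc/postfix/header_checks - substitute a generic header instead of deleting it
/^Received:/ REPLACE Received: from localhost (localhost [127.0.0.1])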
Testing
Incoming
You can now receive mail to you@your.org and you@mail.your.org. Try this to make sure you're getting messages. Feel free to install mutt if you'd like a better client at the console.
Outgoing
You usually can’t send mail and there are several reasons why.
Many ISPs block outgoing port 25 to keep a lid on spam bots. This prevents you from sending any messages. You can test that by trying to connect to gmail on port 25 from your server.
nc -zv gmail-smtp-in.l.google.com 25
Also, many mail servers will reverse-lookup your IP to see who it belongs to. That request will go to your ISP (who owns the IPs) and show their DNS name instead of yours. You’re often blocked at this step, though some providers will work with you if you contact them.
Even if you’re not blocked and your ISP has given you a static IP with a matching reverse-lookup, you will suffer from a lower reputation score as you’re not a well-known email provider. This can cause your sent messages to be delayed while being considered for spam.
To solve these issues, relay your email through an email provider. This will improve your reputation score (used to judge spam), ease the additional security layers such as SPF, DKIM, and DMARC, and is usually free at small volume.
Postfix even calls this using a ‘Smarthost’
Next Steps
Now that you can get email, let’s make it so you can also send it.
- Set up a Relay
Troubleshooting
When adding Postfix’s anti-spam suggestions, we left off the smtpd_client_restrictions and smtpd_end_of_data_restrictions as they created problems during testing.
You may get a warning from Postfix that one of the settings you’ve added is overriding one of the earlier settings. Simply delete the first instance. These are usually default settings that we’re overriding.
Use ‘@’ to view the logs from all the related services.
sudo journalctl -u postfix@-.service
If you change your server's DNS entry, make sure to update mydestination in your /etc/postfix/main.cf and sudo systemctl reload postfix@-.service.
Misc
Mail Addresses
Postfix only accepts messages for users in the “local recipient table” which is built from the unix password file and the aliases file1. You can add aliases for other addresses that will deliver to your shell account, but only shell users can receive mail right now. See virtual mailboxes to add users without shell accounts.
In the alias file, you’ll see “Postmaster” (and possibly others) are aliased to root. Add root as an alias to you at the bottom so that mail gets to your mailbox.
echo "root: $USER" | sudo tee -a /etc/aliases
sudo newaliases
1.2.3.2 - Relay
A relay is simply another mail server that you give your outgoing mail to, rather than try to deliver it yourself.
There are many companies that specialize in this. Sign up for a free account and they give you the block of text to add to your postfix config. Some popular ones are:
- SendGrid
- MailGun
- Sendinblue
They allow anywhere between 50 and 300 messages a day for free.
SendGrid
Relay Setup
SendGrid's free plan gives you 50 emails a day. Create an account, verify your email address (you@your.org), and follow the instructions. Make sure to sudo apt install libsasl2-modules
https://docs.sendgrid.com/for-developers/sending-email/postfix
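For reference, the linked instructions boil down to something like the following additions to /etc/postfix/main.cf. Treat this as a sketch and follow SendGrid's current doc for the authoritative version; YOUR_API_KEY stands in for the restricted key you created above.
# Relay outgoing mail through SendGrid, authenticating with the API key
relayhost = [smtp.sendgrid.net]:587
smtp_tls_security_level = encrypt
smtp_sasl_auth_enable = yes
smtp_sasl_security_options = noanonymous
smtp_sasl_password_maps = static:apikey:YOUR_API_KEY
header_size_limit = 4096000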
Restart Postfix and use mutt to send an email. It works! The only thing you'll notice is that your message has an "On Behalf Of" notice letting you know it came from SendGrid. Follow the section below to change that.
Domain Integration
To integrate your domain fully, add DNS records for SendGrid using these instructions.
https://docs.sendgrid.com/ui/account-and-settings/how-to-set-up-domain-authentication
This will require you to login and go to:
- Settings -> Sender Authentication -> Domain Authentication
Stick with the defaults that include automatic security and SendGrid will give you three CNAME records. Add those to your DNS and your email will check out.
Technical Notes
DNS
If you’re familiar with email domain-based security, you’ll see that two of the records SendGrid gives you are links to DKIM keys so SendGrid can sign emails as you. The other record (emXXXX) is the host sendgrid will use to send email. The SPF record for that host will include a SendGrid SPF record that includes multiple pools of IPs so that SPF checks will pass. They use CNAMEs on your side so they can rotate keys and pool addresses without changing DNS entries.
If none of this makes sense to you, then that’s really the point. You don’t have to know any of it - they take care of it for you.
Next Steps
Your server can now send email too. All shell users on your server rejoice!
To actually use your mail server, you’ll want to add some remote client access.
- Set up Dovecot
1.2.3.3 - Dovecot
Dovecot is an IMAP (Internet Message Access Protocol) server that allows remote clients to access their mail. There are other protocols and servers, but Dovecot has about 75% of the market and is a good choice.
Installation
sudo apt install dovecot-imapd
sudo apt install dovecot-submissiond
Configuration
Storage
Both Postfix and Dovecot use mbox storage format by default. This is one big file with all your mail in it and doesn’t scale well. Switch to the newer maildir format where your messages are stored as individual files.
# Change where Postfix delivers mail.
sudo postconf -e "home_mailbox = Maildir/"
sudo systemctl reload postfix.service
# Change where Dovecot looks for mail.
sudo sed -i 's/^mail_location.*/mail_location = maildir:~\/Maildir/' /etc/dovecot/conf.d/10-mail.conf
sudo systemctl reload dovecot.service
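A quick way to confirm the switch is to send yourself a message and look for it as a file rather than in the old mbox.
echo "Subject: maildir test" | sendmail $USER
ls ~/Maildir/new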
Encryption
Dovecot comes with its own default cert. This isn't trusted, but Thunderbird will prompt you and you can choose to accept it. This will be fine for now. We'll generate a valid cert later.
Credentials
Dovecot checks passwords against the local unix system by default and no changes are needed.
Submissions
One potential surprise is that IMAP is only for viewing existing mail. To send mail, you use the SMTP protocol and relay messages to your mail server. But we have relaying turned off, as we don’t want just anyone relaying messages.
The solution is to enable authentication and by convention this is done by a separate port process, called the Submission Server.
We've installed Dovecot's submission server as it's newer and easier to set up. Postfix even suggests considering it, rather than theirs. The only configuration needed is to set localhost as the relay.
# Set the relay as localhost where postfix runs
sudo sed -i 's/#submission_relay_host =/submission_relay_host = localhost/' /etc/dovecot/conf.d/20-submission.conf
sudo systemctl reload dovecot.service
Port Forwarding
Forward ports 143 and 587 to your mail server and test that you can connect from both inside and outside your LAN.
nc -zv mail.your.org 143
nc -zv mail.your.org 587
If it's working from outside your network, but not inside, you may need to enable reflection, aka hairpin NAT. This will be different per firewall vendor, but in OPNsense it's:
Firewall -> Settings -> Advanced
# Enable these settings
Reflection for port forwards
Reflection for 1:1
Automatic outbound NAT for Reflection
Clients
Thunderbird and others will successfully discover the correct ports and services when you provide your email address of you@your.org.
Notes
TLS
Dovecot defaults to port 587 for the submission service, which is an older standard for explicit TLS. It's now recommended by RFC to use implicit TLS on port 465, and you can add a new "submissions" service for that while leaving the default in place. Clients will pick their favorite. Thunderbird defaults to 465 when both are available.
Note: leaving the default submission port commented out just means it will use the default port. Comment out the whole block to disable it.
vi /etc/dovecot/conf.d/10-master.conf
# Change the default of
service submission-login {
inet_listener submission {
#port = 587
}
}
to
service submission-login {
inet_listener submission {
#port = 587
}
inet_listener submissions {
port = 465
ssl = yes
}
}
# And reload
sudo systemctl reload dovecot.service
Make sure to port forward 465 at the firewall as well
Next Steps
Now that you’ve got the basics working, let’s secure things a little more
- Set up security
Sources
1.2.3.4 - Security
Certificates
We should use valid certificates. The best way to do that is with the certbot utility.
Certbot
Certbot automates the process of getting and renewing certs, and only requires a brief connection to port 80 as proof it’s you. There’s also a DNS based approach, but we use the port method for simplicity. It only runs once every 60 days so there is little risk of exploit.
Forward Port 80
You probably already have a web server and can’t just change where port 80 goes. To integrate certbot, add a name-based virtual host proxy to that web server.
# Here is a caddy example. Add this block to your Caddyfile
http://mail.your.org {
reverse_proxy * mail.internal.lan
}
# You can also use a well-known URL if you're already using that vhost
http://mail.your.org {
handle /.well-known/acme-challenge/ {
reverse_proxy mail.internal.lan
}
}
Install Certbot
Once the port forwarding is in place, you can install certbot and request a certificate. Note the --deploy-hook argument. This reloads services after a cert is obtained or renewed. Otherwise, they'll keep using an expired one.
DOMAIN=your.org
sudo apt install certbot
sudo certbot certonly --standalone --domains mail.$DOMAIN --non-interactive --agree-tos -m postmaster@$DOMAIN --deploy-hook "service postfix reload; service dovecot reload"
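Certbot installs a timer to handle renewals. A dry run exercises the whole path, including the port 80 forwarding, without touching your real cert.
sudo certbot renew --dry-run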
Postfix
Tell Postfix about the cert by using the postconf utility. This will warn you about any potential configuration errors.
sudo postconf -e "smtpd_tls_cert_file = /etc/letsencrypt/live/mail.$DOMAIN/fullchain.pem"
sudo postconf -e "smtpd_tls_key_file = /etc/letsencrypt/live/mail.$DOMAIN/privkey.pem"
sudo postfix reload
Dovecot
Change the Dovecot to use the cert as well.
sudo sed -i "s/^ssl_cert = .*/ssl_cert = <\/etc\/letsencrypt\/live\/mail.$DOMAIN\/fullchain.pem/" /etc/dovecot/conf.d/10-ssl.conf
sudo sed -i "s/^ssl_key = .*/ssl_key = <\/etc\/letsencrypt\/live\/mail.$DOMAIN\/privkey.pem/" /etc/dovecot/conf.d/10-ssl.conf
sudo dovecot reload
Verifying
You can view the certificates with the commands:
openssl s_client -connect mail.$DOMAIN:143 -starttls imap -servername mail.$DOMAIN
openssl s_client -starttls smtp -showcerts -connect mail.$DOMAIN:587 -servername mail.$DOMAIN
Privacy and Anti-Spam
You can take advantage of Cloudflare (or other) services to accept and inspect your email before forwarding it on to you. As far as the Internet is concerned, Cloudflare is your email server. The rest is private.
Take a look at the Forwarding section, and simply forward your mail to your own server instead of Google’s. That will even allow you to remove your mail server from DNS and drop connections other than CloudFlare if desired.
Intrusion Prevention
In my testing it takes less than an hour before someone discovers and attempts to break into your mail server. You may wish to GeoIP block or otherwise limit connections. You can also use crowdsec.
Crowdsec
Crowdsec is an open-source IPS that monitors your log files and blocks suspicious behavior.
Install as per their instructions.
curl -s https://packagecloud.io/install/repositories/crowdsec/crowdsec/script.deb.sh | sudo bash
sudo apt install -y crowdsec
sudo apt install crowdsec-firewall-bouncer-nftables
sudo cscli collections install crowdsecurity/postfix
Postfix
Most services now log to the system journal rather than a file. You can view them with the journalctl
command
# What is the exact service unit name?
sudo systemctl status | grep postfix
# Anything having to do with that service unit
sudo journalctl --unit postfix@-.service
# Zooming into just the identifiers smtp and smtpd
sudo journalctl --unit postfix@-.service -t postfix/smtp -t postfix/smtpd
Crowdsec accesses the system journal by adding a block to its log acquisition directives.
sudo tee -a /etc/crowdsec/acquis.yaml << EOF
source: journalctl
journalctl_filter:
- "[email protected]"
labels:
type: syslog
---
EOF
sudo systemctl reload crowdsec
Dovecot
Install the dovecot collection as well.
sudo cscli collections install crowdsecurity/dovecot
sudo tee -a /etc/crowdsec/acquis.yaml << EOF
source: journalctl
journalctl_filter:
- "_SYSTEMD_UNIT=dovecot.service"
labels:
type: syslog
---
EOF
sudo systemctl reload crowdsec
Is it working? You won’t see anything at first unless you’re actively under attack. But after 24 hours you may see some examples of attempts to relay spam.
allen@mail:~$ sudo cscli alerts list
╭────┬────────────────────┬────────────────────────────┬─────────┬──────────────────────────────────────────────┬───────────┬─────────────────────────────────────────╮
│ ID │ value │ reason │ country │ as │ decisions │ created_at │
├────┼────────────────────┼────────────────────────────┼─────────┼──────────────────────────────────────────────┼───────────┼─────────────────────────────────────────┤
│ 60 │ Ip:187.188.233.58 │ crowdsecurity/postfix-spam │ MX │ 17072 TOTAL PLAY TELECOMUNICACIONES SA DE CV │ ban:1 │ 2023-05-24 06:33:10.568681233 +0000 UTC │
│ 54 │ Ip:177.229.147.166 │ crowdsecurity/postfix-spam │ MX │ 13999 Mega Cable, S.A. de C.V. │ ban:1 │ 2023-05-23 20:17:49.912754687 +0000 UTC │
│ 53 │ Ip:177.229.154.70 │ crowdsecurity/postfix-spam │ MX │ 13999 Mega Cable, S.A. de C.V. │ ban:1 │ 2023-05-23 20:15:27.964240044 +0000 UTC │
│ 42 │ Ip:43.156.25.237 │ crowdsecurity/postfix-spam │ SG │ 132203 Tencent Building, Kejizhongyi Avenue │ ban:1 │ 2023-05-23 01:15:43.87577867 +0000 UTC │
│ 12 │ Ip:167.248.133.186 │ crowdsecurity/postfix-spam │ US │ 398722 CENSYS-ARIN-03 │ ban:1 │ 2023-05-20 16:03:15.418409847 +0000 UTC │
╰────┴────────────────────┴────────────────────────────┴─────────┴──────────────────────────────────────────────┴───────────┴─────────────────────────────────────────╯
If you’d like to get into the details, take a look at the Crowdsec page .
Next Steps
Now that your server is secure, let’s take a look at how email is authenticated and how to ensure yours is.
1.2.3.5 - Authentication
Email authentication prevents forgery. People can still send unsolicited email, but they can’t fake who it’s from. If you set up a Relay for Postfix, the relayer is doing it for you. But otherwise, proceed onward to prevent your outgoing mail being flagged as spam.
You need three things
- SPF: Server IP addresses - which specific servers have authorization to send email.
- DKIM: Server Secrets - email is signed so you know it’s authentic and unchanged.
- DMARC: Verifies the address in the From: aligns with the domain sending the email, and what to do if not.
SPF
SPF, or Sender Policy Framework, is the oldest component. It’s a DNS TXT record that lists the servers authorized to send email for a domain.
A receiving server looks at a message's return path (aka RFC5321.MailFrom header) to see what domain the email purports to be from. It then looks up that domain's SPF record and if the server that sent the email isn't included, the email is considered forged.
Note - this doesn't check the From: header the user sees. Messages can appear (to the user) to be from anywhere. So it's mostly a low-level check to prevent spambots.
The DNS record for your Postfix server should look like:
Type: "TXT"
NAME: "@"
Value: "v=spf1 a:mail.your.org -all"
The value above shows the list of authorized servers (a:) contains mail.your.org. Mail from all other servers is considered forged (-all).
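You can confirm the record is visible to the world with dig.
dig +short TXT your.org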
To have your Postfix server check SPF for incoming messages add the SPF policy agent.
sudo apt install postfix-policyd-spf-python
sudo tee -a /etc/postfix/master.cf << EOF
policyd-spf unix - n n - 0 spawn
user=policyd-spf argv=/usr/bin/policyd-spf
EOF
sudo tee -a /etc/postfix/main.cf << EOF
policyd-spf_time_limit = 3600
smtpd_recipient_restrictions =
permit_mynetworks,
permit_sasl_authenticated,
reject_unauth_destination,
check_policy_service unix:private/policyd-spf
EOF
sudo systemctl restart postfix
DKIM
DKIM, or DomainKeys Identified Mail, signs the emails as they are sent ensuring that the email body and From: header (the one you see in your client) hasn’t been changed in transit and is vouched for by the signer.
Receiving servers see the DKIM header that includes who signed it, then use DNS to check it. Unsigned mail simply isn’t checked. (There is no could-but-didn’t in the standard).
Note - There is no connection between the domain that signs the message and what the user sees in the From: header. Messages can have a valid DKIM signature and still appear to be from anywhere. DKIM is mostly to prevent man-in-the-middle attacks from altering the message.
For Postfix, this requires installation of OpenDKIM and some configuration as detailed here. Make sure to sign with the domain root.
https://tecadmin.net/setup-dkim-with-postfix-on-ubuntu-debian/
Once you’ve done that, create the following DNS entry.
Type: "TXT"
NAME: "default._domainkey"
Value: "v=DKIM1; h=sha256; k=rsa; p=MIIBIjANBgkq..."
DMARC
Having a DMARC record is the final piece that instructs servers to check the From: header the user sees against the domain return path from the SPF and DKIM checks, and what to do on a fail.
This means mail "From: you@gmail.com" sent through mail.your.org mail servers will be flagged as spam.
The DNS record should look like:
Type: "TXT"
NAME: "_dmarc"
Value: "v=DMARC1; p=reject; adkim=s; aspf=r;"
- p=reject: Reject messages that fail
- adkim=s: Use strict DKIM alignment
- aspf=r: Use relaxed SPF alignment
Reject (p=reject) indicates that email servers should “reject” emails that fail DKIM or SPF tests, and skip quarantine.
Strict DKIM alignment (=s) means that the SPF Return-Path domain or the DKIM signing domain must be an exact match with the domain in the From: address. A DKIM signature from your.org would exactly match you@your.org.
Relaxed SPF alignment (=r) means subdomains of the From: address are acceptable. I.e. the server mail.your.org from the SPF test aligns with an email from: you@your.org.
You can also choose quarantine mode (p=quarantine) or report-only mode (p=none) where the email will be accepted and handled as such by the receiving server, and a report sent to you like below.
v=DMARC1; p=none; rua=mailto:you@your.org
DMARC is an 'or' test. In the first example, if either the SPF or DKIM domains pass, then DMARC passes. You can choose to test one, both, or none at all (meaning nothing can pass DMARC), as in the second DMARC example.
To implement DMARC checking in Postfix, you can install OpenDMARC and configure a mail filter as described below.
https://www.linuxbabe.com/mail-server/opendmarc-postfix-ubuntu
Next Steps
Now that you are handling email securely and authentically, let's help ease client connections.
1.2.3.6 - Autodiscovery
In most cases you don't need this. Thunderbird, for example, will use a shotgun approach and may find your server using 'common' server names based on your email address.
But there is an RFC and other clients may need help.
DNS SRV
This takes advantage of the RFC with an entry for IMAP and SMTP Submission
Type | Name | Service | Protocol | TTL | Priority | Weight | Port | Target |
---|---|---|---|---|---|---|---|---|
SRV | @ | _imap | TCP | auto | 10 | 5 | 143 | mail.your.org |
SRV | @ | _submission | TCP | auto | 10 | 5 | 465 | mail.your.org |
Web Autoconfig
- Create a DNS entry for autoconfig.your.org
- Create a vhost and web root for that with the file
mail/config-v1.1.xml
- Add the contents below to that file
<?xml version="1.0"?>
<clientConfig version="1.1">
<emailProvider id="your.org">
<domain>your.org</domain>
<displayName>Example Mail</displayName>
<displayShortName>Example</displayShortName>
<incomingServer type="imap">
<hostname>mail.your.org</hostname>
<port>143</port>
<socketType>STARTTLS</socketType>
<username>%EMAILLOCALPART%</username>
<authentication>password-cleartext</authentication>
</incomingServer>
<outgoingServer type="smtp">
<hostname>mail.your.org</hostname>
<port>587</port>
<socketType>STARTTLS</socketType>
<username>%EMAILLOCALPART%</username>
<authentication>password-cleartext</authentication>
<addThisServer>true</addThisServer>
</outgoingServer>
</emailProvider>
<clientConfigUpdate url="https://www.your.org/config/mozilla.xml" />
</clientConfig>
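Once the vhost is up, fetch the file the way a client would to confirm it's being served.
curl -s http://autoconfig.your.org/mail/config-v1.1.xml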
Note
It’s traditional to match server names to protocols and we would have used “imap.your.org” and “smtp.your.org”. But using ‘mail’ is popular now and it simplifies setup at several levels.
Thunderbird will try to guess at your server names, attempting to connect to smtp.your.org for example. But many Postfix configurations have spam prevention that interfere.
Sources
https://cweiske.de/tagebuch/claws-mail-autoconfig.htm
https://www.hardill.me.uk/wordpress/2021/01/24/email-autoconfiguration/
1.3 - Web
1.3.1 - Content Mgmt
There are many ways to manage and produce web content. Traditionally, you’d use a large application with roles and permissions.
A more modern approach is to use a distributed version control system, like git, and a site generator.
Static Site Generators are gaining popularity as they produce static HTML with javascript and CSS that can be deployed to any Content Delivery Network without need for server-side processing.
Astro is great, as is Hugo, with the latter being around longer and having more resources.
1.3.1.1 - Hugo
Hugo is a Static Site Generator (SSG) that turns Markdown files into static web pages that can be deployed anywhere.
Like WordPress, you apply a 'theme' to style your content. But rather than use a web interface to create content, you directly edit the content in markdown files. This lends itself well to managing the content as code and appeals to those who prefer editing text.
However, unlike other SSGs, you don’t have to be a front-end developer to get great results and you can jump in with a minimal investment of time.
1.3.1.1.1 - Hugo Install
Requirements
I use Debian in this example, but any apt-based distro will be similar.
Preparation
Enable and pin the Debian Backports and Testing repos so you can get recent versions of Hugo and needed tools.
Installation
Hugo requires git and go.
# Assuming you have enable backports as per above
sudo apt install -t bullseye-backports git
sudo apt install -t bullseye-backports golang-go
For a recent version of Hugo you’ll need to go to the testing repo. The extended version is recommended by Hugo and it’s chosen by default.
# This pulls in a number of other required packages, so take a close look at the messages for any conflicts. It's normally fine, though.
sudo apt install -t testing hugo
In some cases, you can just install from the Debian package with a lot less effort. Take a look at the latest release and copy the URL into a wget.
https://github.com/gohugoio/hugo/releases/latest
wget https://github.com/gohugoio/hugo/releases/download/v0.124.1/hugo_extended_0.124.1_linux-amd64.deb
Configuration
A quick test right from the quickstart page to make sure everything works
hugo new site quickstart
cd quickstart
git init
git submodule add https://github.com/theNewDynamic/gohugo-theme-ananke themes/ananke
echo "theme = 'ananke'" >> config.toml
hugo server
Open up a browser to http://localhost:1313/ and you'll see the default ananke-themed site.
Next Steps
The ananke theme you just deployed is nice, but a much better theme is Docsy. Go give that a try.
Links
1.3.1.1.2 - Docsy Install
Docsy is a good-looking Hugo theme that provides a landing page, blog, and documentation sub-site using bootstrap CSS.
The documentation site in particular lets you turn a directory of text files into a documentation tree with relative ease. It even has a collapsible left nav bar. That is harder to find than you'd think.
Preparation
Docsy requires Hugo. Install that if you haven't already. It also needs a few other things: postcss, postcss-cli, and autoprefixer from the Node.JS ecosystem. These should be installed in the project directory as version requirements change per theme.
mkdir some.site.org
cd some.site.org
sudo apt install -t testing nodejs npm
npm install -D autoprefixer
npm install -D postcss
npm install -D postcss-cli
Installation
Deploy Docsy as a Hugo module and pull in the example site so we have a skeleton to work with. We’re using git, but we’ll keep it local for now.
git clone https://github.com/google/docsy-example.git .
hugo server
Browse to http://localhost:1313 and you should see the demo “Goldydocs” site.
Now you can proceed to configure Docsy!
Updating
The Docsy theme gets regular updates. To incorporate those you only have to run this command. Do this now, actually, to get any theme updates the example site hasn't incorporated yet.
cd /path/to/my-existing-site
hugo mod get -u github.com/google/docsy
Troubleshooting
hugo
Error: Error building site: POSTCSS: failed to transform “scss/main.css” (text/css)>: Error: Loading PostCSS Plugin failed: Cannot find module ‘autoprefixer’
And then when you try to install the missing module
The following packages have unmet dependencies: nodejs : Conflicts: npm npm : Depends: node-cacache but it is not going to be installed
You may already have Node.JS installed. Skip trying to install it from the OS's repo and see if npm works. Then proceed with the postcss install and such.
1.3.1.1.3 - Docsy Config
Let's change the basics of the site in the config.toml file. I put some quick sed commands here, but you can edit by hand as well. Of note is the Github integration. We prepopulate it here for future use, as it allows quick edits in your browser down the road.
SITE=some.site.org
GITHUBID=someUserID
sed -i "s/Goldydocs/$SITE/" config.toml
sed -i "s/The Docsy Authors/$SITE/" config.toml
sed -i "s/example.com/$SITE/" config.toml
sed -i "s/example.org/$SITE/" config.toml
sed -i "s/google\/docsy-example/$GITHUBID\/$SITE/" config.toml
sed -i "s/USERNAME\/REPOSITORY/$GITHUBID\/$SITE/" config.toml
sed -i "s/https:\/\/policies.google.com//" config.toml
sed -i "s/https:\/\/github.com\/google\/docsy/https:\/\/github.com\/$GITHUBID/" config.toml
sed -i "s/github_branch/#github_branch/" config.toml
If you don’t plan to translate your site into different languages, you can dispense with some of the extra languages as well.
# Delete the 20 or so lines starting at "[languages]" and stopping at the "[markup]" section,
# including the english section.
vi config.toml
# Delete the folders from 'content/' as well, leaving 'en'
rm -rf content/fa content/no
You should also set a default meta description, or the engine will put in the bootstrap default and google will summarize all your pages with that.
vi config.toml
[params]
copyright = "some.site.org"
privacy_policy = "/privacy"
description = "My personal website to document what I know and how I did it"
Keep an eye on the site in your browser as you make changes. When you're ready to start with the main part of adding content, take a look at the next section.
Notes
You can’t dispense with the en folder yet, as it breaks some github linking functionality you may want to take advantage of later
1.3.1.1.4 - Docsy Operate
This is a quick excerpt from the Docsy Content and Customization docs. Definitely spend time with those after reading the overview here.
Directory Layout
Content is, appropriately enough, in the content directory, and its subdirectories line up with the top-level navigation bar of the web site. About, Documentation, etc correspond to content/about, content/docs and so on.
The directories and files you create will be the URL that you get, with one important exception: filenames are converted to a 'slug', mimicking how index files work. For example, if you create the file docs/tech/mastadon.md the URL will be /docs/tech/mastadon/. This is for SEO (Search Engine Optimization).
The other thing you'll see are _index.html files. In the example above, the URL /docs/tech/ has no content, as it's a folder. But you can add an _index.md or .html to give it some. Avoid creating index.md or tech.md (a file that matches the name of a subdirectory). Either of those will block Hugo from generating content for any subdirectories.
The Landing Page and Top Nav Pages
The landing page itself is the content/_index.html file and the background is featured-background.jpg. The other top-nav pages are in the content folders with _index files. You may notice the special header variable "menu: main: weight:" and that is what flags that specific page as worthy of being in the top menu. Removing that, or adding that (and a linkTitle), will change the top nav.
The Documentation Page and Left Nav Bar
One of the most important features of the Docsy template is the well-designed documentation section that features a Section menu, or left nav bar. This menu is built automatically from the files you put in the docs folder, as long as you give them a title. (See Front Matter, below.) They are ordered by date, but you can add a weight to change that.
It doesn't collapse by default, and if you have a lot of files, you'll want to enable that.
# Search and set in your config.toml
sidebar_menu_compact = true
Front Matter
The example files have a section at the top like this. It’s not strictly required, but you must have at least the title or they won’t show up in the left nav tree.
---
title: "Examples"
---
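To override the date-based ordering mentioned above, add a weight; lower weights sort higher in the left nav.
---
title: "Examples"
weight: 10
---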
Page Content and Short Codes
In addition to normal markdown or html, you'll see frequent use of 'shortcodes' that do things that normal markdown can't. These are built into Hugo and can be added by themes, and look like this:
{{% blocks/lead color="dark" %}}
Some Important Text
{{% /blocks/lead %}}
Diagrams
Docsy supports a few tools for creating illustrations from code, such as KaTeX, Mermaid, Diagrams.net, PlantUML, and MarkMap. Simply use a codeblock.
```mermaid
graph LR
one --> two
```
Generate the Website
Once you’re satisfied with what you’ve got, tell hugo to generate the static files and it will populate the folder we configured earlier
hugo
Publish the Web Site
Everything you need is in the public folder and all you need do is copy it to a web server. You can even use git, which I advise since we're already using it to pull in and update the module.
Bonus Points
If you have a large directory structure full of markdown files already, you can kick-start the process of adding frontmatter like this:
find . -type f -name "*.md" | \
while read X
do
TITLE=$(basename "${X%.*}")
# Leave the \n sequences for sed to expand into newlines
FRONTMATTER="---\ntitle: ${TITLE}\n---\n"
sed -i "1s/^/$FRONTMATTER/" "$X"
done
1.3.1.1.5 - Docsy Github
You may have noticed the links on the right like “Edit this page” that takes one to Github. Let’s set those up.
On Github
Go to github and create a new repository. Use the name of your site for the repo name, such as "some.site.org". If you want to use something else, you can edit your config.toml file to adjust.
Locally
You may have noticed that Github suggested some next steps with a remote add using the name "origin". Docsy is already using that, however, from when you cloned it. So we'll have to pick a new name.
cd /path/to/my-existing-site
git remote add github https://github.com/yourID/some.site.org
Let's change our default branch to "main" to match Github's defaults.
git branch -m main
Now we can add, commit and push it up to Github
git add --all
git commit -m "first commit of new site"
git push -u github main
You’ll notice something interesting when you go back to look at GitHub: all the contributors on the right. That’s because you’re dealing with a clone of Docsy, and you can still pull in updates and changes from the original project.
In hindsight, it may have been simpler to clone it via GitHub in the first place.
1.3.2 - Content Deployment
Automating deployment as part of a general continuous integration strategy is best practice these days. Web content should be treated similarly, i.e. version-controlled and deployed with git.
1.3.2.1 - Local Git Deployment
Overview
Let’s create a two-tiered system that goes from dev to prod using a post-commit trigger
graph LR Development --git / rsync---> Production
The Development system is your workstation. git commit will trigger a build and rsync.
The Production system is a web server. Any web server will do as long as you have SSH access and can update a web-root folder.
I use Hugo in this example, but any system that has an output (or build) folder works similarly.
Configuration
The first thing we need to know is where we are going, so let’s prepare production first.
Production System
This server probably uses folders like /var/www/XXXXX
for its web root. Use that or create a new folder and make yourself the owner.
sudo mkdir /var/www/some.site.org
sudo chown -R $USER /var/www/some.site.org
echo "Hello" > /var/www/some.site.org/index.html
Edit your web server’s config to make sure you can view that web page. Also check that rsync
is available from the command line.
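A quick check for rsync, assuming a Debian-family server:
which rsync || sudo apt install rsync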
Development System
Hugo builds static html in a public
directory. To generate the HTML, simply type hugo
cd /path/to/my-existing-site
hugo
ls public
We don’t actually want this folder in git and most themes (if you’re using Hugo) already exclude it. Look for a .gitignore
file and create or add to it if needed.
# Notice /public is at the top of the git ignore file
cat .gitignore
/public
package-lock.json
.hugo_build.lock
...
Assuming you have some content, let’s add and commit it.
git add --all
git commit -m "Initial Commit"
Note: All of these git commands work because pulling in a theme initialized the directory. If you’re doing something else you’ll need to git init
.
The last step is to create a hook that will build and deploy after a commit.
cd /path/to/my-existing-site
touch .git/hooks/post-commit
chmod +x .git/hooks/post-commit
vi .git/hooks/post-commit
#!/bin/sh
hugo --cleanDestinationDir
rsync --recursive --delete public/ [email protected]:/var/www/some.site.org
This script ensures that the remote directory matches your local directory. When you’re ready to update the remote site:
git add --all
git commit --allow-empty -m "trigger update"
If you mess up the production files, you can just call the hook manually.
cd /path/to/my-existing-site
./.git/hooks/post-commit
Troubleshooting
bash: line 1: rsync: command not found
Double check that the remote host has rsync.
1.3.3 - Content Delivery
1.3.3.1 - Cloudflare
- Cloudflare acts as a reverse proxy to hide your server’s IP address
- Takes over your DNS and directs requests to the closest site
- Injects JavaScript analytics
- If the browser’s “do not track” is on, JS isn’t injected.
- Can use a tunnel and remove encryption overhead
1.3.4 - Servers
1.3.4.1 - Caddy
Caddy is a web server that runs SSL by default by automatically grabbing a cert from Let’s Encrypt. It comes as a stand-alone binary, written in Go, and makes a decent reverse proxy.
1.3.4.1.1 - Installation
Installation
Caddy recommends “using our official package for your distro” and for debian flavors they include the basic instructions you’d expect.
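At the time of writing, those Debian-flavor steps look roughly like the following; check Caddy’s install page for the current key and repository URLs, as they may change.
sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https curl
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update
sudo apt install caddy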
Configuration
The easiest way to configure Caddy is by editing the Caddyfile
sudo vi /etc/caddy/Caddyfile
sudo systemctl reload caddy.service
Sites
You define websites with a block that includes a root
and the file_server
directive. Once you reload, and assuming you already have the DNS in place, Caddy will reach out to Let’s Encrypt, acquire a certificate, and automatically forward from port 80 to 443
site.your.org {
root * /var/www/site.your.org
file_server
}
Authentication
You can add basicauth to a site by creating a hash and adding a directive to the site.
caddy hash-password
site.your.org {
root * /var/www/site.your.org
file_server
basicauth {
allen SomeBigLongStringFromTheCaddyHashPasswordCommand
}
}
Reverse Proxy
Caddy also makes a decent reverse proxy.
site.your.org {
reverse_proxy * http://some.server.lan:8080
}
You can also take advantage of path-based reverse proxy. Note the rewrite to accommodate a potentially missing trailing slash.
site.your.org {
rewrite /audiobooks /audiobooks/
handle_path /audiobooks/* {
uri strip_prefix /audiobooks/
reverse_proxy * http://some.server.lan:8080
}
}
Include Blocks
You can define common elements at the top and include them on multiple sites. This helps when you have many sites.
(logging) {
log {
output file /var/log/caddy/access.log
}
}
site.your.org {
import logging
reverse_proxy * http://some.server.lan:8080
}
Modules
Caddy is a single binary so when adding a new module (aka feature) you are essentially downloading a new version that has them compiled in. You can find the list of packages at their download page.
Do this at the command line with caddy itself.
sudo caddy add-package github.com/mholt/caddy-webdav
systemctl restart caddy
Security
Drop Unknown Domains
Caddy will accept connections to port 80, announce that it’s a Caddy web server and redirect you to https before realizing it doesn’t have a site or cert for you. Configure this directive at the bottom so it drops immediately.
http:// {
abort
}
Crowdsec
Caddy runs as its own user and is fairly memory-safe. But installing Crowdsec helps identify some types of intrusion attempts.
[TODO]
Troubleshooting
You can test your config file and look at the logs like so
caddy validate --config /etc/caddy/Caddyfile
journalctl --no-pager -u caddy
1.3.4.1.2 - WebDAV
Caddy can also serve WebDAV requests with the appropriate module. This is important because for many clients, such as Kodi, WebDAV is significantly faster.
sudo caddy add-package github.com/mholt/caddy-webdav
sudo systemctl restart caddy
{ # Custom modules require order of precedence be defined
order webdav last
}
site.your.org {
root * /var/www/site.your.org
webdav *
}
You can combine WebDAV and Directory Listing - highly recommended - so you can browse the directory contents with a normal web browser as well. Since WebDAV doesn’t use the GET method, you can use the @get
filter to route those to the file_server module so it can serve up indexes via the browse
argument.
site.your.org {
@get method GET
root * /var/www/site.your.org
webdav *
file_server @get browse
}
Sources
https://github.com/mholt/caddy-webdav
https://marko.euptera.com/posts/caddy-webdav.html
1.3.4.1.3 - MFA
The package caddy-security offers a suite of auth functions. Among these is MFA and a portal for end-user management of tokens.
Installation
# Install a version of caddy with the security module
sudo caddy add-package github.com/greenpau/caddy-security
sudo systemctl restart caddy
Configuration
/var/lib/caddy/.local/caddy/users.json
caddy hash-password
Troubleshooting
journalctl --no-pager -u caddy
1.3.4.1.4 - Wildcard DNS
Caddy has an individual cert for every virtual host you create. This is fine, but Let’s Encrypt publishes these as part of certificate transparency and the bad guys are watching. If you create a new site in caddy, you’ll see bots probing for weaknesses within 30 min - without you even having published the URL. There’s no security in anonymity, but the need-to-know principle suggests we shouldn’t be informing the whole world about sites of limited scope.
One solution is a wildcard cert. It’s published as just ‘*.some.org’ so there’s no information disclosed. Caddy supports this, but it requires a little extra work.
Installation
In this example we have already installed caddy and use cloudflare as a hosted DNS provider. Check https://github.com/caddy-dns to see if your DNS provider is available.
# Divert the default binary from the repo
sudo dpkg-divert --divert /usr/bin/caddy.default --rename /usr/bin/caddy
sudo cp /usr/bin/caddy.default /usr/bin/caddy.custom
sudo update-alternatives --install /usr/bin/caddy caddy /usr/bin/caddy.default 10
sudo update-alternatives --install /usr/bin/caddy caddy /usr/bin/caddy.custom 50
# Add the package and restart.
sudo caddy add-package github.com/caddy-dns/cloudflare
sudo systemctl restart caddy.service
From here on out, apt update
will not upgrade caddy. You must enter caddy upgrade
at the command line. The devs don’t think this should be an issue.
DNS Provider Configuration
For Cloudflare, a decent example is below. Just use the ‘Getting the Cloudflare API Token’ part
https://roelofjanelsinga.com/articles/using-caddy-ssl-with-cloudflare/
Caddy Configuration
Use the acme_dns
global option and then create a single site (used to determine the cert) and match the actual vhosts with subsites.
{
acme_dns cloudflare alotcharactersandnumbershere
}
*.some.org, some.org {
@site1 host site1.some.org
handle @site1 {
reverse_proxy * http://localhost:3200
}
@site2 host site2.some.org
handle @site2 {
root * /srv/www/site2
}
}
2 - Media
2.1 - Players
2.1.1 - LibreELEC
One of the best systems for handling media is LibreELEC. It’s both a Kodi box and a server appliance that’s resistant to abuse. With the right hardware (like a ROCKPro64 or Waveshare) it also makes an excellent portable server for traveling.
Deployment
Download an image from https://libreelec.tv/downloads and flash as directed. Enable SSH during the initial setup.
Storage
RAID is a useful feature but only BTRFS works directly. This is fine, but with a little extra work you can add MergerFS, a popular option for combining disks.
BTRFS
Create the RAID set on another PC. If your disks are of different sizes you can use the ‘single’ profile, but leave the metadata mirrored.
sudo mkfs.btrfs -f -L pool -d single -m raid1 /dev/sda /dev/sdb /dev/etc...
After attaching to LibreELEC, the array will be automatically mounted at /media/pool
based on label pool
you specified above.
MergerFS
This is a good option if you just want to combine disks and, unlike most other RAID technologies, if you lose a disk the rest will keep going. Many people combine this with SnapRAID for off-line parity.
But it’s a bit more work.
Cooling
You may want to manage the fan. The RockPro64 has a PWM fan header and LibreELEC loads the pwm_fan module.
Kodi Manual Start
The kodi process can use a significant amount of CPU even at rest. If you’re using this primarily as a file server you can disable kodi from starting automatically.
cp /usr/lib/systemd/system/kodi.service /storage/.config/system.d/kodi-alt.service
systemctl mask kodi
To start kodi, you can enter systemctl start kodi-alt
Remotes
Plug in a cheap Fm4 style remote and it ‘just works’ with kodi. But if you want to customize some remote buttons, say to start kodi manually, you still can.
Enable SMB
To share your media, simply copy the sample file, remove all the preconfigured shares (unless you want them), and add one for your storage pool. Then just enable Samba and reboot (so the file is picked up)
cp /storage/.config/samba.conf.sample /storage/.config/samba.conf
vi /storage/.config/samba.conf
[media]
path = /storage/pool
available = yes
browseable = yes
public = yes
writeable = yes
Config --> LibreELEC --> Services --> Enable Samba
Enable HotSpot
Config --> LibreELEC --> Network --> Wireless Networks
Enable Active and Wireless Access Point and it just works!
Enable Docker
This is a good way to handle things like Jellyfin or Plex if you must. In the GUI, go to add-ons, search for the items below and install.
- docker
- LinuxServer.io
- Docker Image Updater
Then you must make sure docker starts after the storage is up, or the containers will see an empty folder instead of a mounted one.
vi /storage/.config/system.d/service.system.docker.service
[Unit]
...
...
After=network.target storage-pool.mount
If that fails, you can also tell docker to wait a bit
ExecStartPre=/usr/bin/sleep 120
Remote Management
You may be called upon to look at something remotely. Sadly, there’s no remote access to the GUI but you can use things like autossh
to create a persistent remote tunnel, or wireguard
to create a VPN connection. Wireguard is usually better.
2.1.1.1 - Add-ons
You can also use this platform as a server. This seems counter-intuitive at first; to use a media player OS as a server. But in practice it is rock-solid. I have a mixed fleet of 10 or so devices and LibreELEC has better uptime stats than TrueNAS.
The device playing content on your TV is also the media server for the rest of the house. I wouldn’t advertise this as an enterprise solution, but I can’t dispute the results.
Installation
Normal Add-ons
Common tools like rsync, as well as server software like Jellyfin, are available. You can browse as described below, or use the search tool if you’re looking for something specific.
- Select the gear icon and choose Add-ons
- Choose LibreELEC Add-ons
- Drill down to browse software.
Docker
If you’re on ARM or want more frequent updates, you may want to add Docker and the LinuxServer.io repository.
- Select the gear icon and choose Add-ons
- Search add-ons for “Docker” and install
- Search add-ons for “LinuxServer.io” and install
- Select “Install from repository” and choose “LinuxServer.io’s Docker Add-ons”.
Drill down and add Jellyfin, for example.
2.1.1.2 - AutoSSH
The easiest way to manage remote clients is to let them come to you, so we’ll set up and monitor a remote tunnel. To accomplish this, we’ll set up a server, create client keys, test a reverse tunnel, and set up autossh.
The Server
This is simply a server somewhere that everyone can reach via SSH. Create a normal user account with a password and home directory, such as with adduser remote
. We will be connecting from our clients for initial setup with this.
The Client
Use SSH to connect to the LibreELEC client, generate a ssh key pair and copy it to the remote server
ssh [email protected]
ssh-keygen -f ~/.ssh/id_rsa -q -P ""
# ssh-copy-id isn't available so you must use the rather harder command below
cat ~/.ssh/id_rsa.pub | ssh -t [email protected] "cat - >> ~/.ssh/authorized_keys"
ssh [email protected]
If all went well you can back out and then test logging in with no password. Make sure to do this and accept the host key, so that later automated connections aren’t blocked by the prompt.
The Reverse Tunnel
SSH normally connects your terminal to a remote server. Think of this as an encrypted tunnel where your keystrokes are sent to the server and its responses are sent back to you. You can send more than your keystrokes, however. You can take any port on your system and send it as well. In our case, we’ll take port 22 (where ssh just happens to be listening) and send it to the rendezvous server on port 2222. SSH will continue to accept local connections while also taking connections from the remote port we are tunneling in.
# On the client, issue this command to connect the (-R)remote port 2222 to localhost:22, i.e. the ssh server on the client
ssh -N -R 2222:localhost:22 -o ServerAliveInterval=240 -o ServerAliveCountMax=2 [email protected]
# Leave that running while you log in to the rendezvous server and test if you can now ssh to the client by connecting to the forwarded port.
ssh [email protected]
ssh root@localhost -p 2222
# Now exit both and set up Autossh below
Autossh
Autossh is a daemon that monitors ssh sessions to make sure they’re up and operational, restarting them as needed, and this is exactly what we need to make sure the ssh session from the client stays up. To run this as a service, a systemd service file is needed. For LibreELEC, these are in /storage/.config.
vi /storage/.config/system.d/autossh.service
[Unit]
Description=autossh
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
User=root
EnvironmentFile=/storage/.config/autossh
ExecStart=/storage/.kodi/addons/virtual.system-tools/bin/autossh $SSH_OPTIONS
Restart=always
RestartSec=60
[Install]
WantedBy=multi-user.target
vi /storage/.config/autossh
AUTOSSH_POLL=60
AUTOSSH_FIRST_POLL=30
AUTOSSH_GATETIME=0
AUTOSSH_PORT=22034
SSH_OPTIONS="-N -R 2222:localhost:22 [email protected] -i /storage/.ssh/id_rsa"
systemctl enable autossh.service
systemctl start autossh.service
systemctl status autossh.service
At this point, the client has an SSH connection to your server on port 22, has opened port 2222 on that server, and has forwarded it back to its own ssh server. You can now connect by:
ssh [email protected]
ssh root@localhost -p 2222
If not, check the logs for errors and try again.
journalctl -b 0 --no-pager | less
Remote Control
Now that you have the client connected, you can use your Rendezvous Server as a Jump Host to access things on the remote client such as it’s web interface and even the console via VNC. Your connection will generally take the form of:
ssh -L localPort:localhost:libreelecPort -J user@rendezvousServer root@localhost -p forwardedPort
The actual command is hard to read, as you are going through the rendezvous server twice and connecting to localhost on the destination.
ssh -L 8080:localhost:32400 -J [email protected] root@localhost -p 2222
2.1.1.3 - Building
This works best in an Ubuntu container.
LibreELEC Notes
Installed but no SATA HDD. Found this:
RPi4 has zero support for PCIe devices so why is it “embarrasing” for LE to omit support for PCIe SATA things in our RPi4 image?
Feel free to send a pull-request to GitHub enabling the kernel config that’s needed.
https://forum.libreelec.tv/thread/27849-sata-controller-error/
Went through their resources: the beginners guide to git (https://wiki.libreelec.tv/development/git-tutorial#forking-and-cloning), build basics (https://wiki.libreelec.tv/development/build-basics), and the specific build commands (https://wiki.libreelec.tv/development/build-commands/build-commands-le-12.0.x)
and then failed because jammy wasn’t compatible enough.
Created a jammy container and restarted
https://ubuntu.com/server/docs/lxc-containers
sudo lxc-create --template download --name u1 ubuntu jammy amd64
sudo lxc-start --name u1 --daemon
sudo lxc-attach u1
Used some of the notes from
https://www.artembutusov.com/libreelec-raid-support/
Did as fork, clone and a
git fetch --all
but couldn’t get all the downloads as the alsa.org site was down
On a side note, these are needed in the config.txt so that USB works
otg_mode=1
dtoverlay=dwc2,dr_mode=host
I tried a menuconfig and selected ..sata? and got
CONFIG_ATA=m
CONFIG_ATA_VERBOSE_ERROR=y
CONFIG_ATA_FORCE=y
CONFIG_ATA_SFF=y
CONFIG_ATA_BMDMA=y
Better compare the .config file again
Edited and committed a config.txt but it didn’t show up in the image. Possibly the wrong file, or there’s another way to realize that change.
Enabled the SPI interface
https://raspberrypi.stackexchange.com/questions/48228/how-to-enable-spi-on-raspberry-pi-3 https://wiki.libreelec.tv/configuration/config_txt
sudo apt install lxc
# This didn't work for some reason
sudo lxc-create --template download --name u1 --dist ubuntu --release jammy --arch amd64
sudo lxc-create --template download --name u1
sudo lxc-start --name u1 --daemon
sudo lxc-attach u1
# Now inside, build
apt update
apt upgrade
apt-get install gcc make git wget
apt-get install bc patchutils bzip2 gawk gperf zip unzip lzop g++ default-jre u-boot-tools texinfo xfonts-utils xsltproc libncurses5-dev xz-utils
# login and fork so you can clone more easily. Some problem with the creds
cd
git clone https://github.com/agattis/LibreELEC.tv
cd LibreELEC.tv/
git fetch --all
git tag
git remote add upstream https://github.com/LibreELEC/LibreELEC.tv.git
git fetch --all
git checkout libreelec-12.0
git checkout -b CM4-AHCI-Add
PROJECT=RPi ARCH=aarch64 DEVICE=RPi4 tools/download-tool
ls
cat /etc/passwd
pwd
ls /home/
ls /home/ubuntu/
ls
cd ..
mv LibreELEC.tv/ /home/ubuntu/
cd /home/ubuntu/
ls -lah
chown -R ubuntu:ubuntu LibreELEC.tv/
ls -lah
cd LibreELEC.tv/
ls
ls -lah
cd
sudo -i -u ubuntu
ip a
cat /etc/resolv.conf
ip route
sudo -i -u ubuntu
apt install tmux
sudo -i -u ubuntu tmux a
# And back home you can write
ls -lah ls/u1/rootfs/home/ubuntu/LibreELEC.tv/target/
2.1.1.4 - Fancontrol
Save the script below to /storage/bin/fancontrol, make it executable, and create a service unit for it.
vi /storage/.config/system.d/fancontrol.service
systemctl enable fancontrol
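The unit file itself isn’t shown above; a minimal sketch, assuming the script below is saved as /storage/bin/fancontrol, might look like this:
[Unit]
Description=PWM fan control
After=multi-user.target
[Service]
ExecStart=/bin/sh /storage/bin/fancontrol
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
The fan-control script itself follows.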
#!/bin/sh
# Summary
#
# Adjust fan speed by percentage when CPU/GPU is between user set
# Min and Max temperatures.
#
# Notes
#
# Temp can be gleaned from the sysfs thermal_zone files and are in
# units millidegrees meaning a reading of 30000 is equal to 30.000 C
#
# Fan speed is read and controlled by the pwm_fan module and can be
# read and set from a sysfs file as well. The value can be set from 0 (off)
# to 255 (max). It defaults to 255 at start
## Set Points
# CPU Temp set points
MIN_TEMP=40 # Min desired CPU temp
MAX_TEMP=60 # Max desired CPU temp
# Fan Speeds set points
FAN_OFF=0 # Fan is off
FAN_MIN=38 # Some fans need a minimum of 15% to start from a dead stop.
FAN_MAX=255 # Max cycle for fan
# Frequency
CYCLE_FREQ=6 # How often should we check, in seconds
SHORT_CYCLE_PERCENT=20 # If we are shutting on or off more than this percent of the
# time, just run at min rather than shutting off
## Sensor and Control files
# CPU and GPU sysfs locations
CPU=/sys/class/thermal/thermal_zone0/temp
GPU=/sys/class/thermal/thermal_zone1/temp
# Fan Control files
FAN2=/sys/devices/platform/pwm-fan/hwmon/hwmon2/pwm1
FAN3=/sys/devices/platform/pwm-fan/hwmon/hwmon3/pwm1
## Logic
# The fan control file isn't available until the module loads and
# is unpredictable in path. Wait until it comes up
FAN=""
while [[ -z $FAN ]];do
[[ -f $FAN2 ]] && FAN=$FAN2
[[ -f $FAN3 ]] && FAN=$FAN3
[[ -z $FAN ]] && sleep 1
done
# The sensors are in millidegrees so adjust the user
# set points to the same units
MIN_TEMP=$(( $MIN_TEMP * 1000 ))
MAX_TEMP=$(( $MAX_TEMP * 1000 ))
# Short cycle detection requires us to track the number
# of on-off flips to cycles
CYCLES=0
FLIPS=0
while true; do
# Set TEMP to the highest GPU/CPU Temp
TEMP=""
read TEMP_CPU < $CPU
read TEMP_GPU < $GPU
[[ $TEMP_CPU -gt $TEMP_GPU ]] && TEMP=$TEMP_CPU || TEMP=$TEMP_GPU
# How many degrees above or below our min threshold are we?
DEGREES=$(( $TEMP-$MIN_TEMP ))
# What percent of the range between min and max is that?
RANGE=$(( $MAX_TEMP-$MIN_TEMP ))
PERCENT=$(( (100*$DEGREES/$RANGE) ))
# What number between 0 and 255 is that percent?
FAN_SPEED=$(( (255*$PERCENT)/100 ))
# Override the calculated speed for some special cases
if [[ $FAN_SPEED -le $FAN_OFF ]]; then # Set anything 0 or less to 0
FAN_SPEED=$FAN_OFF
elif [[ $FAN_SPEED -lt $FAN_MIN ]]; then # Set anything below the min to min
FAN_SPEED=$FAN_MIN
elif [[ $FAN_SPEED -ge $FAN_MAX ]]; then # Set anything above the max to max
FAN_SPEED=$FAN_MAX
fi
# Did we just flip on or off?
read -r OLD_FAN_SPEED < $FAN
if ( ( [[ $OLD_FAN_SPEED -eq 0 ]] && [[ $FAN_SPEED -ne 0 ]] ) || \
( [[ $OLD_FAN_SPEED -ne 0 ]] && [[ $FAN_SPEED -eq 0 ]] ) ); then
FLIPS=$((FLIPS+1))
fi
# Every 10 cycles, check to see if we are short-cycling
CYCLES=$((CYCLES+1))
if [[ $CYCLES -ge 10 ]] && [[ ! $SHORT_CYCLING ]]; then
FLIP_PERCENT=$(( 100*$FLIPS/$CYCLES ))
if [[ $FLIP_PERCENT -gt $SHORT_CYCLE_PERCENT ]]; then
SHORT_CYCLING=1
echo "Short-cycling detected. Fan will run at min speed rather than shutting off."
else
CYCLES=0;FLIPS=0
fi
fi
# If we are short-cycling and would turn the fan off, just set to min
if [[ $SHORT_CYCLING ]] && [[ $FAN_SPEED -le $FAN_MIN ]]; then
FAN_SPEED=$FAN_MIN
fi
# Every so often, exit short cycle mode to see if conditions have changed
if [[ $SHORT_CYCLING ]] && [[ $CYCLES -gt 10000 ]]; then # Roughly half a day
echo "Exiting short-cycling"
SHORT_CYCLING=""
fi
# Write that to the fan speed control file
echo $FAN_SPEED > $FAN
# Log the stats everyone once in a while
# if [[ $LOG_CYCLES ]] && [[ $LOG_CYCLES -ge 10 ]]; then
# echo "Temp was $TEMP fan set to $FAN_SPEED"
# LOG_CYCLES=""
# else
# LOG_CYCLES=$(($LOG_CYCLES+1))
# fi
sleep $CYCLE_FREQ
done
# Also look at drive temps. The sysfs filesystem isn't useful for
# all drives on RockPro64 so use smartctl instead
#ls -1 /dev/sd? | xargs -n1 smartctl -A | egrep ^194 | awk '{print $4}'
2.1.1.5 - MergerFS
This is a good option if you just want to combine disks and, unlike most other RAID technologies, if you lose a disk the rest will keep going. Many people combine this with SnapRAID for off-line parity.
Prepare and Exempt Disks
Prepare and exempt the file systems from auto-mounting so you can supply your own mount options and make sure they are up before you start MergerFS.
Make sure to wipe the disks before using as wipefs and fdisk are not available in LibreELEC.
# Assuming the disks are wiped, format and label each disk the same
mkfs.ext4 /dev/sda
e2label /dev/sda pool-member
# Copy the udev rule for editing
cp /usr/lib/udev/rules.d/95-udevil-mount.rules /storage/.config/udev.rules.d
vi /storage/.config/udev.rules.d/95-udevil-mount.rules
Edit this section by adding the pool-member label from above
# check for special partitions we dont want mount
IMPORT{builtin}="blkid"
ENV{ID_FS_LABEL}=="EFI|BOOT|Recovery|RECOVERY|SETTINGS|boot|root0|share0|pool-member", GOTO="exit"
Test this by rebooting and making sure the drives are not mounted.
Add Systemd Mount Units
Each filesystem requires a mount unit like below. Create one for each drive named disk1, disk2, etc. Note: The name of the file is important; to mount /storage/disk1
the name of the file must be storage-disk1.mount
vi /storage/.config/system.d/storage-disk1.mount
[Unit]
Description=Mount sda
Requires=dev-sda.device
After=dev-sda.device
[Mount]
What=/dev/sda
Where=/storage/disk1
Type=ext4
Options=rw,noatime,nofail
[Install]
WantedBy=multi-user.target
systemctl enable --now storage-disk1.mount
Download and Test MergerFS
MergerFS isn’t available as an add-on, but you can get it directly from the developer. LibreELEC (or CoreELEC) on ARM has a 32-bit user space, so you’ll need the armhf version.
wget https://github.com/trapexit/mergerfs/releases/latest/download/mergerfs-static-linux_armhf.tar.gz
tar --extract --file=./mergerfs-static-linux_armhf.tar.gz --strip-components=3 usr/local/bin/mergerfs
mkdir bin
mv mergerfs bin/
Mount the drives and run a test like below. Notice the escaped *
. That’s needed at the command line to prevent shell globbing.
mkdir /storage/pool
/storage/bin/mergerfs /storage/disk\* /storage/pool/
Create the MergerFS Service
vi /storage/.config/system.d/mergerfs.service
[Unit]
Description = MergerFS Service
After=storage-disk1.mount storage-disk2.mount storage-disk3.mount storage-disk4.mount
Requires=storage-disk1.mount storage-disk2.mount storage-disk3.mount storage-disk4.mount
[Service]
Type=forking
ExecStart=/storage/bin/mergerfs -o category.create=mfs,noatime /storage/disk* /storage/pool/
ExecStop=umount /storage/pool
[Install]
WantedBy=default.target
systemctl enable --now mergerfs.service
Your content should now be available in /storage/pool
after boot.
2.1.1.6 - Remotes
Most remotes just work. Newer ones emulate a keyboard and send well-known multimedia keys like ‘play’ and ‘volume up’. If you want to change what a button does, you can tell Kodi what to do pretty easily. In addition, LibreELEC also supports older remotes using eventlircd
and popular ones are already configured. You can add unusual ones as well as get normal remotes to perform arbitrary actions when kodi isn’t running (like telling the computer to start kodi or shutdown cleanly).
Modern Remotes
If you plug in a remote receiver and the kernel makes reference to a keyboard you have a modern remote and Kodi will talk to it directly.
dmesg
input: BESCO KSL81P304 Keyboard as ...
hid-generic 0003:2571:4101.0001: input,hidraw0: USB HID v1.11 Keyboard ...
If you want to change a button action, put kodi into log mode, tail the logfile, and press the button in question to see what event is detected.
# Turn on debug
kodi-send -a toggledebug
# Tail the logfile
tail -f /storage/.kodi/temp/kodi.log
debug <general>: Keyboard: scancode: 0xac, sym: 0xac, unicode: 0x00, modifier: 0x0
debug <general>: HandleKey: browser_home (0xf0b6) pressed, window 10000, action is ActivateWindow(Home)
In this example, we pressed the ‘home’ button on the remote. That was detected as a keyboard press of the browser_home
key. This is just one of many defined keys like ‘email’ and ‘calculator’ that can be present on a keyboard. Kodi has a default action for that, and you can see what it is in the system keymap
# View the system keyboard map to see what's happening by default
cat /usr/share/kodi/system/keymaps/keyboard.xml
To change what happens, create a user keymap. Any entries in it will override the default.
# Create a user keymap that takes you to 'Videos' instead of 'Home'
vi /storage/.kodi/userdata/keymaps/keyboard.xml
<keymap>
<global>
<keyboard>
<browser_home>ActivateWindow(Videos)</browser_home>
</keyboard>
</global>
</keymap>
kodi-send -a reloadkeymaps
Legacy Remotes
How They Work
Some receivers don’t send well-known keys. For these, there’s eventlircd
. LibreELEC has a list of popular remotes that fall into this category and will dynamically use it as needed. For instance, pair an Amazon Fire TV remote and udev
will fire, match a rule in /usr/lib/udev/rules.d/98-eventlircd.rules
, and launch eventlircd
with the buttons mapped in /etc/eventlircd.d/aftvsremote.evmap
.
These will interface with Kodi using its “LIRC” (Linux Infrared Remote Control) interface. And just like with keyboards, there’s a set of well-known remote keys Kodi will accept. Some remotes don’t know about these so eventlircd
does some pre-translation before relaying to Kodi. If you look in the aftvsremote.evmap
file for example, you’ll see that KEY_HOMEPAGE = KEY_HOME
.
To find out if your remote falls into this category, enable logging, tail the log, and if your remote has been picked up for handling by eventlircd
you’ll see some entries like this.
debug <general>: LIRC: - NEW 66 0 KEY_HOME devinput (KEY_HOME)
debug <general>: HandleKey: percent (0x25) pressed, window 10000, action is PreviousMenu
In the first line, Kodi notes that its LIRC interface received a KEY_HOME button press. (Eventlircd
actually translated it, but that happened before Kodi saw anything.) In the second line, Kodi says it received the key ‘percent’ and performed the action ‘PreviousMenu’. The part where Kodi says ‘percent (0x25)’ was pressed seems resistant to documentation, but the action of PreviousMenu is the end result. The main question is why?
Turns out that Kodi has a pre-mapping file for events relayed to it from LIRC systems. There’s a mapping for ‘KEY_HOME’ that Kodi translates to the well-known key ‘start’. Then Kodi checks the normal keymap file and ‘start’ translates to the Kodi action ‘PreviousMenu’.
Take a look at the system LIRC mapping file to see for yourself.
# The Lircmap file has the Kodi well-known button (start) surrounding the original remote command (KEY_HOME)
grep KEY_HOME /usr/share/kodi/system/Lircmap.xml
<start>KEY_HOME</start>
Then take a look at the normal mapping file to see how start gets handled
# The keymap file has the well-known Kodi button surrounding the Kodi action,
grep start /usr/share/kodi/system/keymaps/remote.xml
<start>PreviousMenu</start>
You’ll actually see quite a few things are mapped to ‘start’ as it does different things depending on what part of Kodi you are accessing at the time.
Changing Button Mappings
You have a few options and they are listed here in increasing complexity. Specifically, you can
- Edit the keymap
- Edit the Lircmap and keymap
- Edit the eventlircd evmap
Edit the Keymap
To change what the KEY_HOME button does you can create a user keymap like before and override it. It just needs a change from keyboard to remote, since the event arrives through the LIRC interface. In this example we’ve set it to actually take you home via the Kodi function ActivateWindow(Home).
vi /storage/.kodi/userdata/keymaps/remote.xml
<keymap>
<global>
<remote>
<start>ActivateWindow(Home)</start>
</remote>
</global>
</keymap>
Edit the Lircmap and Keymap
This can occasionally cause problems though - such as when you have another button that already gets translated to start and you want it to keep working the same. In this case, you make an edit at the Lircmap level to translate KEY_HOME to some other button first, then map that button to the action you want. (You can’t put the Kodi function above in the Lircmap file so you have to do a double hop.)
First, let’s determine what the device name should be with the irw
command.
irw
# Hit a button and the device name will be at the end
66 0 KEY_HOME devinput
Now let’s pick a key. My remote doesn’t have a ‘red’ key, so let’s hijack that one. Note the device name devinput
from the above.
vi /storage/.kodi/userdata/Lircmap.xml
<lircmap>
<remote device="devinput">
<red>KEY_HOME</red>
</remote>
</lircmap>
Then map the key and restart kodi (the keymap reload command doesn’t handle Lircmap changes)
vi /storage/.kodi/userdata/keymaps/remote.xml
<keymap>
<global>
<remote>
<red>ActivateWindow(Home)</red>
</remote>
</global>
</keymap>
systemctl restart kodi
Edit the Eventlircd Evmap
You can also change what eventlircd
does. If LibreELEC weren’t a read-only filesystem you’d have done this first. But you can do it with a bit more work than the above if you prefer.
# Copy the evmap files
cp -r /etc/eventlircd.d /storage/.config/
# Override where the daemon looks for its configs
systemctl edit --full eventlircd
# change the ExecStart line to refer to the new location - add vvv to the end for more log info
ExecStart=/usr/sbin/eventlircd -f --evmap=/storage/.config/eventlircd.d --socket=/run/lirc/lircd -vvv
# Restart, replug the device and grep the logs to see what evmap is in use
systemctl restart eventlircd
journalctl | grep evmap
# Edit that map to change how home is mapped (yours may not use the default map)
vi /storage/.config/eventlircd.d/default.evmap
KEY_HOMEPAGE = KEY_HOME
Dealing With Unknown Buttons
Sometimes, you’ll have a button that does nothing at all.
debug <general>: LIRC: - NEW ac 0 KEY_HOMEPAGE devinput (KEY_HOMEPAGE)
debug <general>: HandleKey: 0 (0x0, obc255) pressed, window 10016, action is
In this example Kodi received the KEY_HOMEPAGE button, consulted its Lircmap.xml
and didn’t find anything. This is because eventlircd
didn’t recognize the remote and translate it to KEY_HOME like before. That’s OK, we can just add a user LIRC mapping. If you look through the system file you’ll see things like ‘KEY_HOME’ are mapped to the ‘start’ button. So let’s do the same.
vi /storage/.kodi/userdata/Lircmap.xml
<lircmap>
<remote device="devinput">
<start>KEY_HOMEPAGE</start>
</remote>
</lircmap>
systemctl restart kodi
Check the log and you’ll see that you now get
debug <general>: LIRC: - NEW ac 0 KEY_HOMEPAGE devinput (KEY_HOMEPAGE)
debug <general>: HandleKey: 251 (0xfb, obc4) pressed, window 10025, action is ActivateWindow(Home)
Remotes Outside Kodi
You may want a remote to work outside of kodi too - say because you want to start kodi with a remote button. If you have a modern remote that eventlircd
didn’t capture, you must first add your remote to the list of udev rules.
Capture The Remote
First you must identify the remote with lsusb
. It’s probably the only non-hub device listed.
lsusb
...
...
Bus 006 Device 002: ID 2571:4101 BESCO KSL81P304
^ ^
Vendor ID -------------/ \--------- Model ID
...
Then, copy the udev
rule file and add a custom rule for your remote.
cp /usr/lib/udev/rules.d/98-eventlircd.rules /storage/.config/udev.rules.d/
vi /storage/.config/udev.rules.d/98-eventlircd.rules
...
...
...
ENV{ID_USB_INTERFACES}=="", IMPORT{builtin}="usb_id"
# Add the rule under the above line so the USB IDs are available.
# change the numbers to match the ID from lsusb
ENV{ID_VENDOR_ID}=="2571", ENV{ID_MODEL_ID}=="4101", \
ENV{eventlircd_enable}="true", \
ENV{eventlircd_evmap}="default.evmap"
...
Now, reboot, turn on logging and see what the buttons show up as. You can also install the system tools
add-on in kodi, and at the command line, stop kodi
and the eventlircd
service, then run evtest
and press some buttons. You should see something like
Testing ... (interrupt to exit)
Event: time 1710468265.112925, type 4 (EV_MSC), code 4 (MSC_SCAN), value c0223
Event: time 1710468265.112925, type 1 (EV_KEY), code 172 (KEY_HOMEPAGE), value 1
Event: time 1710468265.112925, -------------- SYN_REPORT ------------
Event: time 1710468265.200987, type 4 (EV_MSC), code 4 (MSC_SCAN), value c0223
Event: time 1710468265.200987, type 1 (EV_KEY), code 172 (KEY_HOMEPAGE), value 0
Event: time 1710468265.200987, -------------- SYN_REPORT ------------
Configure and Enable irexec
Now that you have seen the event, you must have the irexec
process watching for it to take action. Luckily, LibreELEC already includes it.
vi /storage/.config/system.d/irexec.service
[Unit]
Description=IR Remote irexec config
After=eventlircd.service
Wants=eventlircd.service
[Service]
ExecStart=/usr/bin/irexec --daemon /storage/.lircrc
Type=forking
[Install]
WantedBy=multi-user.target
We’ll create the config file next. The config is the command or script to run. systemctl start kodi
in our case.
vi /storage/.lircrc
begin
prog = irexec
button = KEY_HOMEPAGE
config = systemctl start kodi
end
Let’s enable and start it up
systemctl enable --now irexec
Go ahead and stop kodi, then press the KEY_HOMEPAGE button on your remote. Try config entries like echo start kodi > /storage/test-results
if you have issues and wonder if it’s running.
Notes
You may notice that eventlircd is always running, even if it has no remotes. That’s because a unit file is in /usr/lib/systemd/system/multi-user.target.wants/. I’m not sure why this is the case when there is no remote in play.
2.2 - Signage
2.2.1 - Anthias (Screenly)
Overview
Anthias (AKA Screenly) is a simple, open-source digital signage system that runs well on a raspberry pi. When plugged into a monitor, it displays images, video or web sites in slideshow fashion. It’s managed directly through a web interface on the device, and there are fleet and support options.
Preparation
Use the Raspberry Pi Imager to create a 64 bit Raspberry Pi OS Lite image. Select the gear icon at the bottom right to enable SSH, create a user, configure networking, and set the locale. Use SSH to continue configuration.
setterm --cursor on
sudo raspi-config nonint do_change_locale en_US.UTF-8
sudo raspi-config nonint do_configure_keyboard us
sudo raspi-config nonint do_wifi_country US
sudo timedatectl set-timezone America/New_York
sudo raspi-config nonint do_hostname SOMENAME
sudo apt update;sudo apt upgrade -y
sudo reboot
Enable automatic updates and enable reboots
sudo apt -y install unattended-upgrades
# Remove the leading slashes from some of the updates and set to true
sudo sed -i 's/^\/\/\(.*origin=Debian.*\)/ \1/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/^\/\/\(Unattended-Upgrade::Remove-Unused-Kernel-Packages \).*/ \1"true";/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/^\/\/\(Unattended-Upgrade::Remove-New-Unused-Dependencies \).*/ \1"true";/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/^\/\/\(Unattended-Upgrade::Remove-Unused-Dependencies \).*/ \1"true";/' /etc/apt/apt.conf.d/50unattended-upgrades
sudo sed -i 's/^\/\/\(Unattended-Upgrade::Automatic-Reboot \).*/ \1"true";/' /etc/apt/apt.conf.d/50unattended-upgrades
Installation
bash <(curl -sL https://www.screenly.io/install-ose.sh)
Operation
Adding Content
Navigate to the Web UI at the IP address of the device. You may wish to enter the settings and add authentication and change the device name.
You may add common graphic types, mp4 video, web pages, and YouTube links. It will let you know if it fails to download a YouTube video. Some heavy web pages fail to render correctly, but most do.
Images must be sized for the screen. In most cases this is 1080. Larger images are scaled down, but smaller images are not scaled up. For example, PowerPoint is often used to create slides, but it exports at 720; on a 1080 screen that creates black borders. You can change the resolution on the Pi with raspi-config
or add a registry key to Windows to change PowerPoint’s output size.
Windows Registry Editor Version 5.00
[HKEY_CURRENT_USER\Software\Microsoft\Office\16.0\PowerPoint\Options]
"ExportBitmapResolution"=dword:00000096
Schedule the Screen
You may want to turn off the display during non-operation hours. The vcgencmd
command can turn off video output and some displays will choose to enter power-savings mode. Some displays misbehave or ignore the command, so testing is warranted.
sudo tee /etc/cron.d/screenpower << EOF
# m h dom mon dow user command
# Turn monitor on
30 7 * * 1-5 root /usr/bin/vcgencmd display_power 1
# Turn monitor off
30 19 * * 1-5 root /usr/bin/vcgencmd display_power 0
# Weekly Reboot just in case
0 7 * * 1 root /sbin/shutdown -r +10 "Monday reboot in 10 minutes"
EOF
Troubleshooting
YouTube Fail
You may find you must download the video manually and then upload to Anthias. Use the utility yt-dlp to list and then download the mp4 version of a video
yt-dlp --list-formats https://www.youtube.com/watch?v=YE7VzlLtp-4
yt-dlp --format 22 https://www.youtube.com/watch?v=YE7VzlLtp-4
WiFi Disconnect
WiFi can go up and down, and some variants of the OS do not automatically reconnect. You may want to add the following script to stay connected.
sudo touch /usr/local/bin/checkwifi
sudo chmod +x /usr/local/bin/checkwifi
sudo vim.tiny /usr/local/bin/checkwifi
#!/bin/bash
# Exit if WiFi isn't configured
grep -q ssid /etc/wpa_supplicant/wpa_supplicant.conf || exit
# In the case of multiple gateways (when connected to wired and wireless)
# the `grep -m 1` will exit on the first match, presumably the lowest metric
GATEWAY=$(ip route list | grep -m 1 default | awk '{print $3}')
ping -c4 $GATEWAY > /dev/null
if [ $? != 0 ]
then
logger checkwifi fail `date`
service wpa_supplicant restart
service dhcpcd restart
else
logger checkwifi success `date`
fi
sudo tee /etc/cron.d/checkwifi << EOF
# Check WiFi connection
*/5 * * * * root /usr/local/bin/checkwifi >> /dev/null 2>&1
EOF
Hidden WiFi
If you didn’t set up WiFi during imaging, you can use raspi-config
after boot, but you must add a line if it’s a hidden network, and reboot.
sudo sed -i '/psk/a\ scan_ssid=1' /etc/wpa_supplicant/wpa_supplicant.conf
Wrong IP on Splash Screen
This seems to be captured during installation and then resides statically in this file. Adjust as needed.
# You can turn off the splash screen in the GUI or in the .conf
sed -i 's/show_splash =.*/show_splash = off/' /home/pi/.screenly/screenly.conf
# Or you can correct it in the docker file
vi ./screenly/docker-compose.yml
White Screen or Hung
Anthias works best when the graphics are the correct size. It will attempt to display images that are too large, but this flashes a white screen and eventually hangs the box (at least in the current version). Not all users get the hang of sizing things correctly, so if you have issues, try this script.
#!/bin/bash
# If this device isn't running signage, exit
[ -d /home/pi/screenly_assets ] || { echo "No screenly image asset directory, exiting"; exit 1; }
# Check that mediainfo and imagemagick convert are available
command -v mediainfo || { echo "mediainfo command not available, exiting"; exit 1; }
command -v convert || { echo "imagemagick convert not available, exiting"; exit 1; }
cd /home/pi/screenly_assets
for FILE in *.png *.jpg *.jpeg *.gif
do
# if the file doesn't exist, skip this iteration
[ -f $FILE ] || continue
# Use mediainfo to get the dimensions as it's much faster than imagemagick
read -r NAME WIDTH HEIGHT <<<$(echo -n "$FILE ";mediainfo --Inform="Image;%Width% %Height%" $FILE)
# if it's too big, use imagemagick's convert. (the mogrify command doesn't resize reliably)
if [ "$WIDTH" -gt "1920" ] || [ "$HEIGHT" -gt "1080" ]
then
echo $FILE $WIDTH x $HEIGHT
convert $FILE -resize 1920x1080 -gravity center $FILE
fi
done
No Video After Power Outage
If the display is off when you boot the pi, it may decide there is no monitor. When someone does turn on the display, there is no output. Enable hdmi_force_hotplug in /boot/config.txt to avoid this problem, and specify the group and mode (1080 at 30Hz in this example).
sed -i 's/.*hdmi_force_hotplug.*/hdmi_force_hotplug=1/' /boot/config.txt
sed -i 's/.*hdmi_group=.*/hdmi_group=2/' /boot/config.txt
sed -i 's/.*hdmi_mode=.*/hdmi_mode=81/' /boot/config.txt
2.2.2 - Anthias Deployment
If you do regular deployments you can create an image. A reasonable approach is to:
- Shrink the last partition
- Zero fill the remaining free space
- Find the end of the last partition
- DD that to a file
- Use raspi-config to resize after deploying
Or you can use PiShrink to script all that.
Installation
wget https://raw.githubusercontent.com/Drewsif/PiShrink/master/pishrink.sh
chmod +x pishrink.sh
sudo mv pishrink.sh /usr/local/bin
Operation
# Capture and shrink the image
sudo dd if=/dev/mmcblk0 of=anthias-raw.img bs=1M
sudo pishrink.sh anthias-raw.img anthias.img
# Copy to a new card
sudo dd if=anthias.img of=/dev/mmcblk0 bs=1M
If you need to modify the image after creating it you can mount it via loop-back.
sudo losetup --find --partscan anthias.img
sudo mount /dev/loop0p2 /mnt/
# After you've made changes
sudo umount /mnt
sudo losetup --detach-all
Manual Steps
If you have access to a graphical desktop environment, use GParted. It will resize the filesystem and partitions for you quite easily.
# Mount the image via loopback and open it with GParted
sudo losetup --find --partscan anthias-raw.img
# Grab the right side of the last partition with your mouse and
# drag it as far to the left as you can, apply and exit
sudo gparted /dev/loop0
Now you need to find the last sector and truncate the file after that location. Since the truncate
utility operates on bytes, you convert sectors to bytes with multiplication.
# Find the End of the last partition. In the below example, it's Sector *9812664*
$ sudo fdisk -lu /dev/loop0
Units: sectors of 1 * 512 = 512 bytes
Device Boot Start End Sectors Size Id Type
/dev/loop0p1 8192 532479 524288 256M c W95 FAT32 (LBA)
/dev/loop0p2 532480 9812664 9280185 4.4G 83 Linux
sudo losetup --detach-all
sudo truncate --size=$[(9812664+1)*512] anthias-raw.img
Very Manual Steps
If you don’t have a GUI, you can do it with a combination of commands.
# Mount the image via loopback
sudo losetup --find --partscan anthias-raw.img
# Check and resize the file system
sudo e2fsck -f /dev/loop0p2
sudo resize2fs -M /dev/loop0p2
... The filesystem on /dev/loop0p2 is now 1149741 (4k) blocks long
# Now you can find the end of the resized filesystem by:
# Finding the number of sectors.
# Bytes = Num of blocks * block size
# Number of sectors = Bytes / sector size
echo $[(1149741*4096)/512]
# Finding the start sector (532480 in the example below)
sudo fdisk -lu /dev/loop0
Device Boot Start End Sectors Size Id Type
/dev/loop0p1 8192 532479 524288 256M c W95 FAT32 (LBA)
/dev/loop0p2 532480 31116287 30583808 14.6G 83 Linux
# Adding the number of sectors to the start sector. Add 1 because you want to end AFTER the end sector
echo $[532480 + 9197928 + 1]
# And resize the part to that end sector (ignore the warnings)
sudo parted /dev/loop0 resizepart 2 9730409s
Great! Now you can follow the remainder of the GParted steps to find the new last sector and truncate the file.
Extra Credit
It’s handy to compress the image. xz
is pretty good for this
xz anthias-raw.img
xzcat anthias-raw.img.xz | sudo dd of=/dev/mmcblk0
In these procedures, we make a copy of the SD card before we do anything. Another strategy is to resize the SD card directly, and then use dd
and read in X number of sectors rather than read it all in and then truncate it. A bit faster, if a bit less recoverable in the event of a mistake.
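As a sketch of that approach, using the example end sector of 9812664 from earlier: read from sector 0 through the end of the last partition and stop.
sudo dd if=/dev/mmcblk0 of=anthias-raw.img bs=512 count=$[9812664+1] status=progress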
2.2.3 - API
The API docs on the web refer to Screenly. Anthias uses an older API. However, you can access the API docs for the version you’re working with at
http://sign.your.domain/api/docs/
You’ll have to correct the swagger form with correct URL, but after that you can see what you’re working with.
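As an example of poking at it from the command line (the version in the path is illustrative; check the docs page above for what your install actually exposes):
curl http://sign.your.domain/api/v1.2/assets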
3 - Monitoring
Time series vs event data.
3.1 - Metrics
3.1.1 - Prometheus
Overview
Prometheus is a time series database, meaning it’s optimized to store and work with data organized in time order. It includes, in its single binary:
- Database engine
- Collector
- Simple web-based user interface
This allows you to collect and manage data with fewer tools and less complexity than other solutions.
Data Collection
End-points normally expose metrics to Prometheus by making a web page available that it can poll. This is done by including an instrumentation library (provided by Prometheus) or simply adding a listener on a high-numbered port that spits out some text when asked.
For systems that don’t support Prometheus natively, there are a few add-on services to translate. These are called ’exporters’ and translate things such as SNMP into a web format Prometheus can ingest.
Alerting
You can also alert on the data collected. This is through the Alert Manager, a second package that works closely with Prometheus.
Visualization
You still need a dashboard tool like Grafana to handle visualizations, but you can get started quite quickly with just Prometheus.
3.1.1.1 - Installation
Install from the Debian Testing repo, as stable can be up to a year behind.
# Testing
echo 'deb http://deb.debian.org/debian testing main' | sudo tee -a /etc/apt/sources.list.d/testing.list
# Pin testing down to a low level so the rest of your packages don't get upgraded
sudo tee -a /etc/apt/preferences.d/not-testing << EOF
Package: *
Pin: release a=testing
Pin-Priority: 50
EOF
# Living Dangerously with test
sudo apt update
sudo apt install -t testing prometheus
Configuration
Use this for your starting config.
cat /etc/prometheus/prometheus.yml
global:
scrape_interval: 15s
scrape_configs:
- job_name: prometheus
static_configs:
- targets: ["localhost:9090"]
This says every 15 seconds, run down the job list. And there is one job - to check out the system at ’localhost:9090’ which happens to be itself.
For every target listed, the scraper makes a web request for /metrics and stores the results. It ingests all the data presented and stores it for 15 days. You can choose to ignore certain elements or retain them differently by adding config, but by default it takes everything given.
You can see this yourself by just asking like Prometheus would. Hit it up directly in your browser. For example, Prometheus is making metrics available at /metrics
http://some.server:9090/metrics
Operation
User Interface
You can access the Web UI at http://some.server:9090
At the top, select Graph (you should be there already) and in the Console tab click the dropdown labeled “insert metric at cursor”. There you will see all the data being exposed. This is mostly about the GO language it’s written in, and not super interesting. A simple Graph tab is available as well.
Data Composition
Data can be simple, like:
go_gc_duration_seconds_sum 3
Or it can be dimensional which is accomplished by adding labels. In the example below, the value of go_gc_duration_seconds has 5 labeled sub-sets.
go_gc_duration_seconds{quantile="0"} 4.5697e-05
go_gc_duration_seconds{quantile="0.25"} 7.814e-05
go_gc_duration_seconds{quantile="0.5"} 0.000103396
go_gc_duration_seconds{quantile="0.75"} 0.000143687
go_gc_duration_seconds{quantile="1"} 0.001030941
In this example, the value of net_conntrack_dialer_conn_failed_total has several.
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="refused"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="unknown"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="refused"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="default",reason="unknown"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="refused"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="snmp",reason="unknown"} 0
How is this useful? It allows you to do aggregations - such as looking at all the net_conntrack failures, and also looking at just the failures that were refused. All with the same data.
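For example, in the query box you could aggregate across every label or filter to a single one (standard PromQL, using the metric from the sample above):
# Total failures across every dialer and reason
sum(net_conntrack_dialer_conn_failed_total)
# Only the failures that were refused
net_conntrack_dialer_conn_failed_total{reason="refused"}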
Removing Data
You may have a target you want to remove. Such as a typo hostname that is now causing a large red bar on a dashboard. You can remove that mistake by enabling the admin API and issuing a delete
sudo sed -i 's/^ARGS.*/ARGS="--web.enable-admin-api"/' /etc/default/prometheus
sudo systemctl reload prometheus
curl -s -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={instance="badhost.some.org:9100"}'
The default retention is 15 days. You may want less than that and you can configure --storage.tsdb.retention.time=1d
similar to above. ALL data has the same retention, however. If you want historical data you must have a separate instance or use VictoriaMetrics.
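Following the same pattern as the admin API example above, the retention flag can be set in the defaults file. Note this replaces the whole ARGS line, so include any other flags you still want.
sudo sed -i 's/^ARGS.*/ARGS="--storage.tsdb.retention.time=1d"/' /etc/default/prometheus
sudo systemctl restart prometheus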
Next Steps
Let’s get something interesting to see by adding some OS metrics
Troubleshooting
If you can’t start the prometheus server, it may be an issue with the init file. Test and Prod repos use different defaults. Add some values explicitly to get new versions running
sudo vi /etc/default/prometheus
ARGS="--config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus/metrics2/"
3.1.1.2 - Node Exporter
This is a service you install on your end-points that make CPU/Memory/Etc. metrics available to Prometheus.
Installation
On each device you want to monitor, install the node exporter with this command.
sudo apt install prometheus-node-exporter
Do a quick test to make sure it’s responding to scrapes.
curl localhost:9100/metrics
Configuration
Back on your Prometheus server, add these new nodes as a job in the prometheus.yml
file. Feel free to drop the initial job where Prometheus was scraping itself.
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'servers'
static_configs:
- targets:
- some.server:9100
- some.other.server:9100
- and.so.on:9100
sudo systemctl reload prometheus.service
Operation
You can check the status of your new targets at:
http://some.server:9090/classic/targets
A lot of data is collected by default. On some low power systems you may want less. For just the basics, customize the config to disable the defaults and only enable specific collectors.
In the case below we reduce collection to just CPU, Memory, and Hardware metrics. When scraping a Pi 3B, this reduces the Scrape Duration from 500 to 50ms.
sudo sed -i 's/^ARGS.*/ARGS="--collector.disable-defaults --collector.hwmon --collector.cpu --collector.meminfo"/' /etc/default/prometheus-node-exporter
sudo systemctl restart prometheus-node-exporter
The available collectors are listed on the page:
3.1.1.3 - SNMP Exporter
SNMP is one of the most prevalent (and clunky) protocols still widely used on network-attached devices. But it’s a good general-purpose way to get data from lots of different makes of products in a similar way.
But Prometheus doesn’t understand SNMP. The solution is a translation service that acts as a middle-man and ’exports’ data from those devices in a way Prometheus can use.
Installation
Assuming you’ve already installed Prometheus, install some SNMP tools and the exporter. If you have an error installing the mibs-downloader, check troubleshooting at the bottom.
sudo apt install snmp snmp-mibs-downloader
sudo apt install -t testing prometheus-snmp-exporter
Change the SNMP tools config file to allow use of installed MIBs.
sudo sed -i 's/^mibs/# &/' /etc/snmp/snmp.conf
Preparation
We need a target, so assuming you have a switch somewhere and can enable SNMP on it, let’s query the switch for its name, AKA sysName. Here we’re using version “2c” of the protocol with the read-only password “public”. Pretty standard.
snmpwalk -v 2c -c public some.switch.address sysName
SNMPv2-MIB::sysName.0 = STRING: Some-Switch
Note: If you get back an error or just the ‘iso’ prefixed value, double check your MIBs are installed.
Configuration
To add this switch to the Prometheus scraper, add a new job to the prometheus.yaml
file. This job will include the targets as normal, but also the path (since it’s different than default) and an optional parameter called module that specific to the SNMP exporter. It also does something confusing - a relabel_config
This is because Prometheus isn’t actually talking to the switch; it’s talking to the local SNMP exporter service. So we list the targets normally, and then at the bottom say ‘oh, by the way, do a switcheroo’. This allows Prometheus to display all the data normally with no one the wiser.
...
...
scrape_configs:
  - job_name: 'snmp'
    static_configs:
      - targets:
        - some.switch.address
    metrics_path: /snmp
    params:
      module: [if_mib]
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9116 # The SNMP exporter's real hostname:port.
Operation
No configuration on the exporter side is needed. Reload the config and check the target list. Then examine data in the graph section. Add additional targets as needed and the exporter will pull in the data.
http://some.server:9090/classic/targets
These metrics are considered well known, so they appear in the database under names like sysUpTime and upsBasicBatteryStatus rather than being prefixed with snmp_ as you might expect.
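If a target isn’t showing data, you can also query the exporter directly and see exactly what it returns for that device; the module and address below match the example job above:
curl "http://localhost:9116/snmp?module=if_mib&target=some.switch.address"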
Next Steps
If you have something non-standard, or you simply don’t want that huge amount of data in your system, look at the link below to customize the SNMP collection with the Generator.
SNMP Exporter Generator Customization
Troubleshooting
The snmp-mibs-downloader is just a handy way to download a bunch of default MIBs so when you use the tools, all the cryptic numbers, like “1.3.6.1.2.1.17.4.3.1” are translated into pleasant names.
If you can’t find the mibs-downloader, it’s probably because it’s in the non-free repo and that’s not enabled by default. Change your apt sources file like so:
sudo vi /etc/apt/sources.list
deb http://deb.debian.org/debian/ bullseye main contrib non-free
deb-src http://deb.debian.org/debian/ bullseye main contrib non-free
deb http://security.debian.org/debian-security bullseye-security main contrib non-free
deb-src http://security.debian.org/debian-security bullseye-security main contrib non-free
deb http://deb.debian.org/debian/ bullseye-updates main contrib non-free
deb-src http://deb.debian.org/debian/ bullseye-updates main contrib non-free
It may be that you only need to change one line.
3.1.1.4 - SNMP Generator
Installation
There is no need to install the Generator as it comes with the SNMP exporter. But if you have a device that supplies its own MIB (and many do), you should add that to the default location.
# Mibs are often named SOMETHING-MIB.txt
sudo cp -n *MIB.txt /usr/share/snmp/mibs/
Preparation
You must identify the values you want to capture. Using snmpwalk is a good way to see what’s available, but it helps to have a little context.
The data is arranged like a folder structure that you drill down through. The folder names are all numeric, with ‘.’ instead of slashes. So if you wanted to get a device’s sysName you’d click down through 1.3.6.1.2.1.1.5 and look in the file 0.
When you use snmpwalk, it starts wherever you tell it and then starts drilling down, printing out everything it finds.
How do you know that’s where sysName is at? A bunch of folks got together (the ISO folks) and decided everything in advance. Then they made some handy files (MIBs) and passed them out so you didn’t have to remember all the numbers.
They allow vendors to create their own sections as well, for things that might not fit anywhere else.
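With the snmp tools and MIBs installed as in the exporter page, snmptranslate shows this name-to-number mapping; a quick sanity check looks like this:
# Translate a name to its numeric OID
snmptranslate -On SNMPv2-MIB::sysName.0
.1.3.6.1.2.1.1.5.0
# And back again
snmptranslate .1.3.6.1.2.1.1.5.0
SNMPv2-MIB::sysName.0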
A good place to start is looking at what the vendor made available. You see this by walking their section and including their MIB so you get descriptive names - only the ISO System MIB is included by default.
# The SysobjectID identifies the vendor section
# Note use of the MIB name without the .txt
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c public some.address SysobjectID
SNMPv2-MIB::sysObjectID.0 = OID: SOMEVENDOR-MIB::somevendoramerica
# Then walk the vendor section using the name from above
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c public some.address somevendoramerica
SOMEVENDOR-MIB::model.0 = STRING: SOME-MODEL
SOMEVENDOR-MIB::power.0 = INTEGER: 0
...
...
# Also check out the general System section
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c public some.address system
# You can also walk the whole ISO tree. In some cases,
# there are thousands of entries and it's indecipherable
$ snmpwalk -m +SOMEVENDOR-MIB -v 2c -c public some.address iso
This can be a lot of information and you’ll need to do some homework to see what data you want to collect.
Configuration
The exporter’s default configuration file is snmp.yml and contains about 57,000 lines of config. It’s designed to pull data from whatever you point it at. Basically, it doesn’t know what device it’s talking to, so it tries to cover all the bases.
This isn’t a file you should edit by hand. Instead, you create instructions for the generator and it looks through the MIBs and creates the file for you. Here’s an example for a Samlex inverter.
vim ~/generator.yml
modules:
  samlex:
    walk:
      - sysLocation
      - inverterMode
      - power
      - vin
      - tempDD
      - tempDA
prometheus-snmp-generator generate
sudo cp /etc/prometheus/snmp.yml /etc/prometheus/snmp.yml.orig
sudo cp ~/snmp.yml /etc/prometheus
sudo systemctl reload prometheus-snmp-exporter.service
Configuration in Prometheus remains the same - but since we picked a new module name we need to adjust that.
...
...
    params:
      module: [samlex]
...
...
sudo systemctl reload prometheus.service
Adding Data Prefixes
By default, the names are all over the place. The SNMP Exporter devs leave it this way because there are a lot of pre-built dashboards on downstream systems that expect the existing names.
If you are building your own downstream systems, you can prefix the names (as is best practice) with a post-generation step. This example causes them all to be prefixed with samlex_.
prometheus-snmp-generator generate
sed -i 's/name: /name: samlex_/' snmp.yml
Combining MIBs
You can combine multiple systems in the generator file to create one snmp.yml file, and refer to them by the module name in the Prometheus file.
modules:
  samlex:
    walk:
      - sysLocation
      - inverterMode
      - power
      - vin
      - tempDD
      - tempDA
  ubiquiti:
    walk:
      - something
      - somethingElse
Operation
As before, you can get a preview directly from the exporter (using a link like below). This data should show up in the Web UI too.
http://some.server:9116/snmp?module=samlex&target=some.device
Sources
https://github.com/prometheus/snmp_exporter/tree/main/generator
3.2 - Logs
3.3 - Visualization
3.3.1 - Grafana
4 - Network
4.1 - Routing
4.1.1 - Linux Router
Creating a Linux router is fairly simple. Some distros like Alpine Linux are well suited for it but any will do. I used Debian in this example.
Install the base OS without a desktop system. Assuming you have two network interfaces, pick one to be the LAN interface (traditionally the first one, eth0 or such) and set the address statically.
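For example, on Debian with the stock ifupdown networking, a static LAN address might look like this (the interface name and address here are assumptions for this example):
sudo vi /etc/network/interfaces
auto eth0
iface eth0 inet static
    address 192.168.0.1
    netmask 255.255.255.0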
Basic Routing
To route, all you really need do is enable forwarding.
# as root
# enable
sysctl -w net.ipv4.ip_forward=1
# and persist
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
Private Range
If one side is a private network, such as the 192.168.0.0/16 range, you probably need to masquerade. This assumes you already have nftables installed with its default rules in /etc/nftables.conf.
# As root
# Add the firewall rules to masquerade
nft flush ruleset
nft add table nat
nft add chain nat postrouting { type nat hook postrouting priority 100\; }
nft add rule nat postrouting masquerade
# Persist the rules and enable the firewall
nft list ruleset >> /etc/nftables.conf
systemctl enable --now nftables.service
DNS and DHCP
If you want to provide network services such as DHCP and DNS, you can add dnsmasq
sudo apt install dnsmasq
Assuming the LAN interface is named eth0 and set to 192.168.0.1.
vi /etc/dnsmasq.d/netboot.conf
interface=eth0
dhcp-range=192.168.0.100,192.168.0.200,12h
dhcp-option=option:router,192.168.0.1
dhcp-authoritative
systemctl restart dnsmasq.service
Firewall
You may want to add some firewall rules too.
# allow SSH from the lan interface
sudo nft add rule inet filter input iifname eth0 tcp dport ssh accept
# allow DNS and DHCP from the lan interface
sudo nft add rule inet filter input iifname eth0 tcp dport domain accept
sudo nft add rule inet filter input iifname eth0 udp dport {domain, bootps} accept
# Change the default input policy to drop
sudo nft add chain inet filter input {type filter hook input priority 0\; policy drop\;}
You can fine-tune these a bit more with the nft example.
4.1.2 - OPNsense
10G Speeds
When you set up an OPNsense system with supported 10G cards, say the Intel X540-AT2, you can move 6 to 8 Gb a second. This is better than in the past, but not line speed.
# iperf between two systems routed through a dual-port NIC on OPNsense
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-10.0040 sec 8.04 GBytes 6.90 Gbits/sec
This is because the packet filter is getting involved. If you disable that you’ll get closer to line speeds
Firewall –> Settings –> Advanced –> Disable Firewall
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-10.0067 sec 11.0 GBytes 9.40 Gbits/sec
4.2 - VPN
4.2.1 - Wireguard
Wireguard is a new, light-weight VPN that is both faster and simpler than its predecessors. With a small code-base and modern cryptography, it’s the future of VPNs.
Concepts
Wireguard is a layer 3 VPN and as such, only works with IPv4/6. It doesn’t provide DHCP, bridging, or other low-level features.
Participants authenticate using public-key cryptography, use UDP as a transport and do not respond to unauthenticated connection attempts.
Every participant is considered a peer. Each defines their own IP address, routing rules, and decides from whom they will accept traffic. Every peer must exchange public keys with every other peer. There is no central authority.
Traffic is sent directly between configured peers but can also be relayed through central nodes if so configured by routing rules on the participants.
Scenarios
The way you deploy depends on what you’re doing, but in general you’ll either connect directly point-to-point or create a central server for remote access or management.
Central Server and Remote Access
This is the classic setup where remote systems connect to the network through one central point. Configure a wireguard server as that central point and then your clients (remote peers) to connect.
Central Server and Remote Management
Another common use is to have a fleet of devices ‘phone-home’ so you can reach them easily.
Point to Point
You can also have peers talk directly to each other. This is often used with routers to connect networks across the internet.
4.2.1.1 - Central Server
A central server gives remote devices a reachable target, allowing them to traverse firewalls and NAT and connect. Let’s create a server and generate and add your first remote peer.
Preparation
You’ll need:
- Public Domain Name or Static IP
- Linux Server
- Ability to port-forward UDP 51820
A dynamic domain name will work and it’s reasonably priced (usually free). You just need something for the peers to connect to, though a static IP is best. You can possibly break connectivity if your IP changes while your peers are connected or have the old IP cached.
We use Debian in this example and derivatives should be similar. UDP 51820 is the standard port but you can choose another if desired.
You must also choose a VPN network that doesn’t overlap with your existing networks. We use 192.168.100.0/24 in this example.
Installation
sudo apt install wireguard-tools
Configuration
All the server needs is a single config file and it will look something like this:
[Interface]
Address = 192.168.100.1/24
ListenPort = 51820
PrivateKey = sGp9lWqfBx+uOZO8V5NPUlHQ4pwbvebg8xnfOgR00Gw=
We picked .1 as our server address (pretty standard), created a private key with the wg tool, and put that in the file /etc/wireguard/wg0.conf. Here are the commands to do that.
# As root
cd /etc/wireguard/
umask 077
wg genkey > server_privatekey
wg pubkey < server_privatekey > server_publickey
read PRIV < server_privatekey
cat << EOF > wg0.conf
[Interface]
Address = 192.168.100.1/24
ListenPort = 51820
PrivateKey = $PRIV
EOF
Operation
The VPN operates by creating a network interface and loading a kernel module. You use the Linux ip command to add a network interface of type wireguard (which automatically loads the kernel module), or use the wg-quick command to do it for you. Name the interface wg0 and it will pull in the config wg0.conf.
Test the Interface
wg-quick up wg0
ping 192.168.100.1
wg-quick down wg0
Enable The Service
For normal use, employ systemctl to create a service using the installed service file.
systemctl enable --now wg-quick@wg0
Administration
The most common procedure is adding new clients. Each must have a unique key and IP, as the keys are hashed and used as part of the internal routing.
Create a Client
Let’s create a client config file by generating a key and assigning an IP. Generating the client’s key on the server isn’t the most secure, but it is pragmatic.
wg genkey > client_privatekey # Generates and saves the client private key
wg pubkey < client_privatekey # Displays the client's public key
Add the client’s public key and IP to your server’s wg0.conf and reload. For the IP, it’s fine to just increment. Note the /32, meaning we will only accept that IP from this peer.
[Interface]
Address = 192.168.100.1/24
ListenPort = 51820
PrivateKey = XXXXXX
## Some Client ##
[Peer]
PublicKey = XXXXXX
AllowedIPs = 192.168.100.2/32
wg-quick down wg0 && wg-quick up wg0
Send The Client Config
A client config file should look similar to this. The [Interface] is about the client and the [Peer] is about the server.
[Interface]
PrivateKey = THE-CLIENT-PRIVATE-KEY
Address = 192.168.100.2/32
[Peer]
PublicKey = YOUR-SERVERS-PUBLIC-KEY
AllowedIPs = 192.168.100.0/24
Endpoint = your.server.org:51820
Put in the keys and domain name, zip it up and send it on to your client as securely as possible. One neat trick is to display a QR code right in the shell. Devices that have a camera can import from that.
qrencode -t ANSIUTF8 < client-wg0.conf
Test The Client
You should be able to ping the server from the client. If not, take a look at the troubleshooting steps.
Next Steps
We haven’t enabled forwarding yet or set up firewall rules as those depend on what role your central peer will play. Proceed on to Remote Access or Remote Management as desired.
Troubleshooting
When something is wrong, you don’t get an error message, you just get nothing. You bring up the client interface but you can’t ping the server 192.168.100.1. But you can turn on log messages on the server with this command.
echo module wireguard +p > /sys/kernel/debug/dynamic_debug/control
dmesg
# When done, send a '-p' to turn logging back off
echo module wireguard -p > /sys/kernel/debug/dynamic_debug/control
Key Errors
wg0: Invalid handshake initiation from 205.133.134.15:18595
In this case, you should check your keys and possibly take the server interface down and up.
Typos
ifconfig: ioctl 0x8913 failed: No such device
Check your conf is named /etc/wireguard/wg0.conf and look for any typos.
Firewall Issues
If you see no wireguard error messages, you should suspect your firewall. Since it’s UDP you can’t test the port directly, but you can use netcat.
nc -ulp 51820 # On the server
nc -u some.server 51820 # On the client. Type and see if it shows up on the server
4.2.1.2 - Remote Access
This is the classic setup where remote peers initiate a connection to the central peer through the internet. That central system forwards their traffic onward to the corporate network.
Traffic Handling
The main choice is between routing and masquerading.
Routing
If you route, the client’s VPN IP address is what other devices see. This is generally preferred as it allows you to log who was doing what at the individual servers. But you must update your network equipment to treat the central server as a router.
Masquerading
Masquerading causes the server to translate all the traffic. This makes everything look like it’s coming from the server. It’s less secure, but less complicated and much quicker to implement.
For this example, we will masquerade traffic from the server.
Central Server Config
Enable Masquerade
Use sysctl to enable forwarding on the server and nft to add masquerade.
# as root
sysctl -w net.ipv4.ip_forward=1
nft flush ruleset
nft add table nat
nft add chain nat postrouting { type nat hook postrouting priority 100\; }
nft add rule nat postrouting masquerade
Persist Changes
It’s best if we add our new rules onto the defaults and enable the nftables service.
# as root
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
nft list ruleset >> /etc/nftables.conf
systemctl enable --now nftables.service
Client Config
Your remote peer - the one you created when setting up the server - needs its AllowedIPs adjusted so it knows to send more traffic through the tunnel.
Full Tunnel
This sends all traffic from the client over the VPN.
AllowedIPs = 0.0.0.0/0
Split Tunnel
The most common config is to send specific networks through the tunnel. This keeps netflix and such off the VPN
AllowedIPs = 192.168.100.0/24, 192.168.XXX.XXX, 192.168.XXX.YYY
DNS
In some cases, you’ll need the client to use your internal DNS server to resolve private domain names. Make sure this server is in the AllowedIPs above.
[Interface]
PrivateKey = ....
Address = ...
DNS = 192.168.100.1
Access Control
Limit Peer Access
By default, everything is open and all the peers can talk to each other and the internet at large - even NetFlix! (they can edit their side of the connection at will). So let’s add some rules to the default filter table.
This example prevents peers from talking to each other but lets them ping the central server and reach the corporate network.
# Load the base config in case you haven't already. This includes the filter table
sudo nft -f /etc/nftables.conf
# Reject any traffic being sent outside the 192.168.100.0/24
sudo nft add rule inet filter forward iifname "wg0" ip daddr != 192.168.100.0/24 reject with icmp type admin-prohibited
# Reject any traffic between peers
sudo nft add rule inet filter forward iifname "wg0" oifname "wg0" reject with icmp type admin-prohibited
Grant Admin Access
You may want to add an exception for one of the addresses so that an administrator can interact with the remote peers. Order matters, so add it before the other rules above.
sudo nft -f /etc/nftables.conf
# Allow an special 'admin' peer full access and others to reply
sudo nft add rule inet filter forward iifname "wg0" ip saddr 192.168.100.2 accept
sudo nft add rule inet filter forward ct state {established, related} accept
# As above
...
...
Save Changes
Since this change is a little more complex, we’ll replace the existing config file and add notes.
sudo vi /etc/nftables.conf
#!/usr/sbin/nft -f
flush ruleset
table inet filter {
  chain input {
    type filter hook input priority 0;
  }
  chain forward {
    type filter hook forward priority 0;
    # Accept admin traffic and responses
    iifname "wg0" ip saddr 192.168.100.2 accept
    iifname "wg0" ct state {established, related} accept
    # Reject other traffic between peers
    iifname "wg0" oifname "wg0" reject with icmp type admin-prohibited
    # Reject traffic outside the desired network
    iifname "wg0" ip daddr != 192.168.100.0/24 reject with icmp type admin-prohibited
  }
  chain output {
    type filter hook output priority 0;
  }
}
table ip nat {
  chain postrouting {
    type nat hook postrouting priority srcnat;
    masquerade
  }
}
Note: The syntax of the file is slightly different than the commands. You can use nft list ruleset to see how nft config and commands translate into running rules. For example, the policy accept is being appended. You may want to experiment with explicitly adding policy drop.
The forwarding chain is where routing type rules go (the input chain is traffic sent to the host itself). Prerouting might work as well, though it’s less common and not present by default.
Notes
The default nftable config file in Debian is:
#!/usr/sbin/nft -f
flush ruleset
table inet filter {
  chain input {
    type filter hook input priority filter;
  }
  chain forward {
    type filter hook forward priority filter;
  }
  chain output {
    type filter hook output priority filter;
  }
}
If you have old iptables rules you want to translate to nft, you can install iptables and add them (they get translated on the fly into nft) and use nft list ruleset to see how they turn out.
4.2.1.3 - Remote Mgmt
In this scenario peers initiate connections to the central server, making their way through NAT and firewalls, but you don’t want to forward their traffic.
Central Server Config
No forwarding or masquerade is desired, so there is no additional configuration to the central server.
Client Config
The remote peer - the one you created when setting up the server - is already set up, with one exception: a keep-alive.
When the remote peer establishes its connection to the central server, intervening firewalls allow you to talk back, as they assume it’s in response. However, the firewall will eventually ‘close’ this window unless the client continues sending traffic occasionally to ‘keep alive’ the connection.
# Add this to the bottom of your client's conf file
PersistentKeepalive = 20
Firewall Rules
You should apply some controls to your clients to prevent them from talking to each other (and possibly the server). You also need a rule for the admin station. You can do this by adding rules to the forward chain.
# Allow an 'admin' peer at .2 full access to others and accept their replies
sudo nft add rule inet filter forward iifname "wg0" ip saddr 192.168.100.2 accept
sudo nft add rule inet filter forward ct state {established, related} accept
# Reject any other traffic between peers
sudo nft add rule inet filter forward iifname "wg0" oifname "wg0" reject with icmp type admin-prohibited
You can persist this change by editing your /etc/nftables.conf file to look like this.
sudo vi /etc/nftables.conf
#!/usr/sbin/nft -f
flush ruleset
table inet filter {
  chain input {
    type filter hook input priority 0;
  }
  chain forward {
    type filter hook forward priority 0;
    # Accept admin traffic
    iifname "wg0" ip saddr 192.168.100.2 accept
    iifname "wg0" ct state {established, related} accept
    # Reject other traffic between peers
    iifname "wg0" oifname "wg0" reject with icmp type admin-prohibited
  }
  chain output {
    type filter hook output priority 0;
  }
}
table ip nat {
  chain postrouting {
    type nat hook postrouting priority srcnat; policy accept;
    masquerade
  }
}
4.2.1.4 - Routing
Rather than masquerade, your wireguard server can forward traffic with the VPN addresses intact. You must handle that on your network in one of the following ways.
Symmetric Routing
Classically, you’d treat the wireguard server like any other router. You’d create a management interface and/or a routing interface and advertise routes appropriately.
On a small network, you would simply overlay an additional IP range on top of the existing one by adding a second IP address on your router, and put your wireguard server on that network. Your local servers will see the VPN-addressed clients and send traffic to the router, which will pass it to the wireguard server.
Asymmetric Routing
In a small network you might have the central peer on the same network as the other servers. In this case, it will be acting like a router and forwarding traffic, but the other servers won’t know about it and so will send replies back to their default gateway.
To remedy this, add a static route at the gateway for the VPN range that sends traffic back to the central peer. Asymmetry is generally frowned upon, but it gets the job done with one less hop.
Host Static Routing
You can also configure the servers in question with a static route for VPN traffic so they know to send it directly back to the Wireguard server. This is fastest but you have to visit every host. Though you can use DHCP to distribute this route in some cases.
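The route itself is simple in either case; a sketch assuming the wireguard server sits at 192.168.0.5 on the LAN and the VPN uses the 192.168.100.0/24 range from earlier (both addresses are placeholders):
# On the gateway (asymmetric) or on each host (host static routing)
sudo ip route add 192.168.100.0/24 via 192.168.0.5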
4.2.1.5 - LibreELEC
LibreELEC and CoreELEC are Linux-based open source software appliances for running the Kodi media player. These can be used as kiosk displays and you can remotely manage them with wireguard.
Create a Wireguard Service
These systems have wireguard support, but use connman, which lacks split-tunnel ability1. This forces all traffic through the VPN and so is unsuitable for remote management. To enable split-tunnel, create a wireguard service instead.
Create a service unit file
vi /storage/.config/system.d/wg0.service
[Unit]
Description=start wireguard interface
# The network-online service isn't guaranteed to work on *ELEC
#Requires=network-online.service
After=time-sync.target
Before=kodi.service
[Service]
Type=oneshot
RemainAfterExit=true
StandardOutput=journal
# Need to check DNS is responding before we proceed
ExecStartPre=/bin/bash -c 'until nslookup google.com; do sleep 1; done'
ExecStart=ip link add dev wg0 type wireguard
ExecStart=ip address add dev wg0 10.1.1.3/24
ExecStart=wg setconf wg0 /storage/.config/wireguard/wg0.conf
ExecStart=ip link set up dev wg0
# On the newest version, a manual route addition is needed too
ExecStart=ip route add 10.2.2.0/24 dev wg0 scope link src 10.1.1.3
# Deleting the device seems to remove the address and routes
ExecStop=ip link del dev wg0
[Install]
WantedBy=multi-user.target
Create a Wireguard Config File
Note: This isn’t exactly the same file wg-quick uses, just close enough to confuse.
vi /storage/.config/wireguard/wg0.conf
[Interface]
PrivateKey = XXXXXXXXXXXXXXX
[Peer]
PublicKey = XXXXXXXXXXXXXXX
AllowedIPs = 10.1.1.0/24
Endpoint = endpoint.hostname:31194
PersistentKeepalive = 25
Enable and Test
systemctl enable --now wg0.service
ping 10.1.1.1
Create a Cron Check
When using a DNS name for the endpoint you may become disconnected. To catch this, use a cron job
# Use the internal wireguard IP address of the peer you are connecting to. .1 in this case
crontab -e
*/5 * * * * ping -c1 -W5 10.1.1.1 || ( systemctl stop wg0; sleep 5; systemctl start wg0 )
4.2.1.6 - TrueNAS Scale
You can directly bring up a Wireguard interface in TrueNAS Scale, and use that to remotely manage it.
Wireguard isn’t exposed in the GUI, so use the command line to create a config file and enable the service. To make it persistent between upgrades, add a cronjob to restore the config.
Configuration
Add a basic peer as when setting up a Central Server and save the file on the client as /etc/wireguard/wg1.conf. It’s rumored that wg0 is reserved for the TrueNAS cloud service. Once the config is in place, use the wg-quick up wg1 command to test, and enable as below.
nano /etc/wireguard/wg1.conf
systemctl enable --now wg-quick@wg1
If you use a domain name in this conf for the other side, this service will fail at boot because DNS isn’t up and it’s not easy to get it to wait. So add a pre-start to the service file to specifically test name resolution.
vi /lib/systemd/system/wg-quick@.service
[Service]
...
...
ExecStartPre=/bin/bash -c 'until host google.com; do sleep 1; done'
Note: Don’t include a DNS server in your wireguard settings or everything on the NAS will attempt to use your remote DNS and fail if the link goes down.
Accessing Apps
When deploying an app, check the “Host Network” or “Configure Host Network” box in the app’s config and you should be able to access it via the VPN address - on Cobia (23.10) at least. If that fails, you can add a command like this as a post-start in the wireguard config file.
iptables -t nat -A PREROUTING --dst 192.168.100.2 -p tcp --dport 20910 -j DNAT --to-destination ACTUAL.LAN.IP:20910
Detecting IP Changes
The other side of your connection may have a dynamic address and wireguard won’t know about it. A simple solution is a cron job that pings the other side periodically and, if it fails, restarts the interface. This will look up the domain name again and hopefully find the new address.
touch /etc/cron.hourly/wg_test
chmod +x /etc/cron.hourly/wg_test
vi /etc/cron.hourly/wg_test
#!/bin/sh
ping -c1 -W5 192.168.100.1 || ( wg-quick down wg1 ; wg-quick up wg1 )
Troubleshooting
Cronjob Fails
cronjob kills interface when it can’t ping
or
/usr/local/bin/wg-quick: line 32: resolvconf: command not found
Calling wg-quick via cron causes a resolvconf issue, even though it works at the command line. One solution is to remove any DNS config from your wg conf file so it doesn’t try to register the remote DNS server.
Nov 08 08:23:59 truenas wg-quick[2668]: Name or service not known: `some.server.org:port'
Nov 08 08:23:59 truenas wg-quick[2668]: Configuration parsing error
Nov 08 08:23:59 truenas systemd[1]: Failed to start WireGuard via wg-quick(8) for wg1.
The DNS service isn’t available (yet), despite Requires=network-online.target nss-lookup.target already being in the service unit file. One way to solve this is a pre-exec in the Service section of the unit file1. This is hacky, but none of the normal directives work.
The cron job above will bring the service up eventually, but it’s nice to have it at boot.
Upgrade Kills Connection
An upgrade comes with a new OS image and that replaces anything you’ve added, such as wireguard config and cronjobs. The only way to persist your Wireguard connection is to put a script on the pool and add a cronjob via the official interface2.
Add this script, changing the location for your pool. It’s set to run every 5 minutes, as you probably don’t want to wait very long after an upgrade to see if it’s working. You can also use this to detect IP changes in place of the cron.hourly job above.
# Create the location and prepare the files
mkdir /mnt/pool02/bin/
cp /etc/wireguard/wg1.conf /mnt/pool02/bin/
touch /mnt/pool02/bin/wg_test
chmod +x /mnt/pool02/bin/wg_test
# Edit the script
vi /mnt/pool02/bin/wg_test
#!/bin/sh
ping -c1 -W5 192.168.100.1 || ( cp /mnt/pool02/bin/wg1.conf /etc/wireguard/ ; wg-quick down wg1 ; wg-quick up wg1 )
# Invoke the TrueNAS CLI and add the job
cli
task cron_job create command="/mnt/pool02/bin/wg_test" enabled=true description="test" user=root schedule={"minute": "*/5", "hour": "*", "dom": "*", "month": "*", "dow": "*"}
Notes
https://www.truenas.com/docs/core/coretutorials/network/wireguard/
https://www.truenas.com/community/threads/no-internet-connection-with-wireguard-on-truenas-scale-21-06-beta-1.94843/#post-693601
4.2.1.7 - Proxmox
Proxmox is frequently used in smaller environments for its ability to mix Linux Containers and Virtual Machines at very low cost. LXC - Linux Containers - are especially valuable as they give the benefits of virtualization with minimal overhead.
Using wireguard in a container simply requires adding the host’s kernel module interface.
Edit the container’s config
On the pve host, for lxc id 101:
echo "lxc.mount.entry = /dev/net/tun /dev/net/tun none bind create=file" >> /etc/pve/lxc/101.conf
Older Proxmox
In the past you had to install the module, or use the DKMS method. That’s no longer needed as the Wireguard kernel module is now available on Proxmox with the standard install. You don’t even need to install the wireguard tools. But if you run into trouble you can go through these steps.
apt install wireguard
modprobe wireguard
# The module will load dynamically when a container starts, but you can also manually load it
echo "wireguard" >> /etc/modules-load.d/modules.conf
5 - Operating Systems
5.1 - NetBoot
Most computers come with ‘firmware’. This is a built-in mini OS, embedded in the chips, that’s just smart enough to start things up and hand-off to something more capable.
That more-capable thing is usually an Operating System on a disk, but it can also be something over the network. This lets you:
- Run an OS installer, such as when you don’t have one installed yet.
- Run the whole OS remotely without having a local disk at all.
PXE
The original way was Intel’s PXE (Preboot eXecution Environment) Option ROM on their network cards. The IBM PC firmware (BIOS) would turn over execution to it and PXE would use basic network drivers to get on the network.
HTTP Boot
Modern machines have newer firmware (UEFI) that includes logic on how to use HTTP/S without the need for add-ons. This simplifies things and also mitigates potential man-in-the-middle attacks. Both methods are still generally called PXE booting, though.
Building a NetBoot Environment
Start by setting up a HTTP Boot system, then add PXE Booting and netboot.xyz to it. This gets you an installation system. Then proceed to diskless stations.
5.1.1 - HTTP Boot
We’ll set up a PXE Proxy server that runs DHCP and HTTP. This server can be used alongside your existing DHCP/DNS servers. We use Debian in this example but anything that runs dnsmasq should work.
Installation
sudo apt install dnsmasq lighttpd
Configuration
Server
Static IPs are best practice, though we’ll use a hostname in this config, so the main thing is that the server name netboot resolves correctly.
HTTP
Lighttpd serves up from /var/www/html, so just drop an ISO there. For example, take a look at the current Debian ISO (the numbering changes) at https://www.debian.org/CD/netinst and copy the link in like so:
sudo wget https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-12.6.0-amd64-netinst.iso -O /var/www/html/debian.iso
DHCP
When configured in proxy DHCP mode: “…dnsmasq simply provides the information given in --pxe-prompt and --pxe-service to allow netbooting”. So only certain settings are available. This is a bit vague, but testing reveals that you must set the boot file name with the dhcp-boot directive, rather than setting it with the more general DHCP option ID 67, for example.
sudo vi /etc/dnsmasq.d/netboot.conf
# Disable DNS
port=0
# Set for DHCP PXE Proxy mode
dhcp-range=192.168.0.0,proxy
# Respond to clients that use 'HTTPClient' to identify themselves.
dhcp-pxe-vendor=HTTPClient
# Set the boot file name to the web server URL
dhcp-boot="http://netboot/debian.iso"
# PXE-service isn't actually used, but dnsmasq seems to need at least one entry to send the boot file name when in proxy mode.
pxe-service=x86-64_EFI,"Network Boot"
Client
Simply booting the client and selecting UEFI HTTP should be enough. The debian boot loader is signed and works with secure boot.
In addition to ISOs, you can also specify .efi binaries like grubx64.efi. Some distributions support this, though Debian itself may have issues.
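As a sketch, assuming you’ve dropped a distribution’s grubx64.efi into the web root, the dhcp-boot directive would simply point at it instead:
dhcp-boot="http://netboot/grubx64.efi"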
Next Steps
You may want to support older clients by adding PXE Boot support.
Troubleshooting
dnsmasq
A good way to see what’s going on is to enable dnsmasq logging.
# Add these to the dnsmasq config file
log-queries
log-dhcp
# Restart and follow to see what's happening
sudo systemctl restart dnsmasq.service
sudo journalctl -u dnsmasq -f
If you’ve enabled logging in dnsmasq and it’s not seeing any requests, you may need to look at your networking. Some virtual environments suppress DHCP broadcasts when they are managing the IP range.
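A quick way to confirm whether DHCP broadcasts are reaching the server at all is to watch the DHCP ports directly; the interface name here is an assumption:
sudo tcpdump -ni eth0 port 67 or port 68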
lighttpd
You can also see what’s being requested from the web server if you enable access logs.
cd /etc/lighttpd/conf-enabled
sudo ln -s ../conf-available/10-accesslog.conf
sudo systemctl restart lighttpd.service
sudo cat /var/log/lighttpd/access.log
5.1.2 - PXE Boot
Many older systems can’t HTTP Boot, so let’s add PXE support with some dnsmasq options.
Installation
Dnsmasq
Install as in the httpboot page.
The Debian Installer
Older clients don’t handle ISOs well, so grab and extract the Debian netboot files.
sudo wget http://ftp.debian.org/debian/dists/bookworm/main/installer-amd64/current/images/netboot/netboot.tar.gz -O - | sudo tar -xzvf - -C /var/www/html
Grub is famous for ignoring proxy DHCP settings, so let’s start off the boot with something else: iPXE. It can do a lot, but isn’t signed, so you must disable secure boot on your clients.
sudo wget https://boot.ipxe.org/ipxe.efi -P /var/www/html
Configuration
iPXE
Debian is ready to go, but you’ll want to create an auto-execute file for iPXE so you don’t have to type in the commands manually.
sudo vi /var/www/html/autoexec.ipxe
#!ipxe
set base http://netboot/debian-installer/amd64
dhcp
kernel ${base}/linux
initrd ${base}/initrd.gz
boot
Dnsmasq
HTTP and PXE clients need different information to boot. We handle this by adding a filename to the PXE service option. This will override the dhcp-boot directive for PXE clients.
sudo vi /etc/dnsmasq.d/netboot.conf
# Disable DNS
port=0
# Use in DHCP PXE Proxy mode
dhcp-range=192.168.0.0,proxy
# Respond to both PXE and HTTP clients
dhcp-pxe-vendor=PXEClient,HTTPClient
# Send the BOOTP information for the clients using HTTP
dhcp-boot="http://netboot/debian.iso"
# Specify a boot menu option for PXE clients. If there is only one, it's booted immediately.
pxe-service=x86-64_EFI,"iPXE (UEFI)", "ipxe.efi"
# We also need to enable TFTP for the PXE clients
enable-tftp
tftp-root=/var/www/html
Client
Both types of client should now work. The debian installer will pull the rest of what it needs from the web.
Next Steps
You can create a boot menu by adding multiple pxe-service entries in dnsmasq, or by customizing the iPXE autoexec.ipxe file. Take a look at that in the menu page.
Troubleshooting
Text Flashes by, disappears, and client reboots
This is most often a symptom of secure boot still being enabled.
Legacy Clients
These configs are aimed at UEFI clients. If you have old BIOS clients, you can try the pxe-service tag for those.
pxe-service=x86-64_EFI,"iPXE (UEFI)", "ipxe.efi"
pxe-service=x86PC,"iPXE (BIOS)", "ipxe.kpxe"
This may not work, and there are a few client flavors, so enable the dnsmasq logs to see how they identify themselves. You can also try booting pxelinux as in the Debian docs.
DHCP Options
Dnsmasq also has a whole tag system that you can set and use similar to this:
dhcp-match=set:PXE-BOOT,option:client-arch,7
dhcp-option=tag:PXE-BOOT,option:bootfile-name,"netboot.xyz.efi"
However, dnsmasq in proxy mode limits what you can send to the clients, so we’ve avoided DHCP options and focused on PXE service directives.
Debian Error
*ERROR* CPU pipe B FIFO underrun
You probably need to use the non-free firmware
No Boot option
Try entering the computer’s BIOS setup and adding a UEFI boot option for the OS you just installed. You may need to browse for the file \EFI\debian\grubx64.efi
Sources
https://documentation.suse.com/sles/15-SP2/html/SLES-all/cha-deployment-prep-uefi-httpboot.html
https://github.com/ipxe/ipxe/discussions/569
https://linuxhint.com/pxe_boot_ubuntu_server/#8
It’s possible to use secure boot if you’re willing to implement a chain of trust. Here’s an example used by FOG to boot devices.
https://forums.fogproject.org/topic/13832/secureboot-issues/3
5.1.3 - menu
It would be useful to have some choices when you netboot. You can use the pxe-service option built into dnsmasq, but a more flexible option is the menu system provided by the iPXE project.
Installation
Set up a http/pxe net-boot server if you haven’t already.
Configuration
dnsmasq
Configure dnsmasq to serve up the ipxe.efi binary for both types of clients.
# Disable DNS
port=0
# Use in DHCP PXE Proxy mode
dhcp-range=192.168.0.0,proxy
# Tell dnsmasq to provide proxy PXE service to both PXE and HTTP clients
dhcp-pxe-vendor=PXEClient,HTTPClient
# Send the BOOTP information for the clients using HTTP
dhcp-boot="http://netboot/ipxe.efi"
# Specify a boot menu option for PXE clients. If there is only one, it's booted immediately.
pxe-service=x86-64_EFI,"iPXE (UEFI)", "ipxe.efi"
# We also need to enable TFTP for the PXE clients
enable-tftp
tftp-root=/var/www/html
Custom Menu
Change the autoexec.ipxe to display a menu.
sudo vi /var/www/html/autoexec.ipxe
#!ipxe
echo ${cls}
:MAIN
menu Local Netboot Menu
item --gap Local Network Installation
item WINDOWS ${space} Windows 11 LTSC Installation
item DEBIAN ${space} Debian Installation
choose selection && goto ${selection} || goto ERROR
:WINDOWS
echo Some windows things here
sleep 3
goto MAIN
:DEBIAN
dhcp
imgfree
set base http://netboot/debian-installer/amd64
kernel ${base}/linux
initrd ${base}/initrd.gz
boot || goto ERROR
:ERROR
echo There was a problem with the selection. Exiting...
sleep 3
exit
Operation
You’ll doubtless find additional options to add. You may want to add the netboot.xyz project to your local menu too.
5.1.4 - netboot.xyz
You can add netboot.xyz to your iPXE menu to run the Live CDs, OS installers and utilities they provide. This can save a lot of time and their list is always improving.
Installation
You’re going to connect to the web for this, so there’s nothing to install. You can download their efi bootloader manually if you’d like to keep things HTTPS, but they update it regularly so you may fall behind.
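If you do want a local copy, it would typically be dropped into the web root like the other boot files; the URL below is the project’s published download location at the time of writing:
sudo wget https://boot.netboot.xyz/ipxe/netboot.xyz.efi -P /var/www/html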
Configuration
Autoexec.ipxe
Add a menu item to your autoexec.ipxe. When you select it, iPXE will chainload (in their parlance) the netboot.xyz bootloader.
#!ipxe
echo ${cls}
:MAIN
menu Local Netboot Menu
item --gap Local Network Installation
item WINDOWS ${space} Windows 11 LTSC Installation
item DEBIAN ${space} Debian Installation
item --gap Connect to Internet Sources
item NETBOOT ${space} Netboot.xyz
choose selection && goto ${selection} || goto ERROR
:WINDOWS
echo Some windows things here
sleep 3
goto MAIN
:DEBIAN
dhcp
imgfree
set base http://netboot/debian-installer/amd64
kernel ${base}/linux
initrd ${base}/initrd.gz
boot || goto ERROR
:NETBOOT
dhcp
chain --autofree http://boot.netboot.xyz || goto ERROR
:ERROR
echo There was a problem with the selection. Exiting...
sleep 3
exit
Local-vars
Netboot.xyz detects that it’s working with a Proxy PXE server and behaves a little differently. For example, you can’t insert your own local menu.ipxe. One helpful addition is a local settings file to speed up boot.
sudo vi /var/www/html/local-vars.ipxe
#!ipxe
set use_proxydhcp_settings true
Operation
You can choose the new menu item and load netboot.xyz. It will take you out to the web for more selections. Not everything will load on every client, of course. But it gives you a lot of options.
Next Steps
We glossed over how to install Windows. That’s a useful item.
Troubleshooting
Wrong TFTP Server
tftp://192.168.0.1/local-vars.ipxe....Connection timed out
Local vars file not found... attempting TFTP boot...
DHCP proxy detected, press p to boot from 192.168.0.2...
If your boot client is attempting to connect to the main DHCP server, that server is probably sending the value next server: 192.168.0.1 in its packets. This isn’t a DHCP option per se, but it affects netboot. Dnsmasq does this though Kea doesn’t.
sudo journalctl -u dnsmasq -f
...
...
next server: 192.168.0.1
...
...
The boot still works, it’s just annoying. You can usually ignore the message and don’t have to hit ‘p’.
Exec Format Error
Could not boot: Exec format error (https://ipxe.org/2e008081)
You may see this flash by. Check your menus and local variables file to make sure you’ve included the #!ipxe shebang.
No Internet
You can also host your own local instance.
5.1.5 - windows
To install Windows, have iPXE load wimboot, then WinPE. From there you can connect to a Samba share and start the Windows installer. Just like back in the good-ole administrative installation point days.
Getting a copy of WinPE the official way is a bit of a hurdle, but definitely less work than setting up a full Windows imaging solution.
Installation
Samba and Wimboot
On the netboot server, install wimboot and Samba.
sudo wget https://github.com/ipxe/wimboot/releases/latest/download/wimboot -P /var/www/html
sudo apt install samba
Window ADK
On a Windows workstation, download the ADK and PE Add-on and install as per Microsoft’s ADK Install Doc.
Configuration
Samba
Prepare the netboot server to receive the Windows files.
sudo vi /etc/samba/smb.conf
[global]
map to guest = bad user
log file = /var/log/samba/%m.log
[install]
path = /var/www/html
browseable = yes
read only = no
guest ok = yes
guest only = yes
sudo mkdir /var/www/html/winpe
sudo mkdir /var/www/html/win11
sudo chmod o+w /var/www/html/win*
sudo systemctl restart smbd.service
Window ADK
On the Windows workstation, start the deployment environment as an admin and create the working files as below. More info is in Microsoft’s Create Working Files document.
- Start -> All Apps -> Windows Kits -> Deployment and Imaging Tools Environment (Right Click, More, Run As Admin)
copype amd64 c:\winpe\amd64
Add the required additions for Windows 11 with the commands below. These are the optional components WinPE-WMI and WinPE-SecureStartup and more info is in Microsoft’s Customization Section.
mkdir c:\winpe\offline
dism /mount-Image /Imagefile:c:\winpe\amd64\media\sources\boot.wim /index:1 /mountdir:c:\winpe\offline
dism /image:c:\winpe\offline /add-package /packagepath:"..\Windows Preinstallation Environment\amd64\WinPE_OCs\WinPE-WMI.cab" /packagepath:"..\Windows Preinstallation Environment\amd64\WinPE_OCs\WinPE-SecureStartup.cab"
dism /unmount-image /mountdir:c:\winpe\offline /commit
Make the ISO in case you want to HTTP Boot from it later and keep the shell open for later.
MakeWinPEMedia /ISO C:\winpe\amd64 C:\winpe\winpe_amd64.iso
WinPE
Now that you’ve got a copy of WinPE, copy it to the netboot server.
net use q: \\netboot\install
xcopy /s c:\winpe\* q:\winpe
Also create some auto-start files for setup. The first is part of the WinPE system and tells it (generically) what to do after it starts up.
notepad q:\winpe\amd64\winpeshl.ini
[LaunchApps]
"install.bat"
The second is more specific and associated with the thing you are installing. We’ll mix and match these in the PXE menu later so we can install different things.
notepad q:\win11\install.bat
wpeinit
net use \\netboot
\\netboot\install\win11\setup.exe
pause
Win 11
You also need to obtain the latest ISO and extract the contents.
- https://massgrave.dev/windows_ltsc_links
- Double-click on the ISO
- Copy contents to q:\win11
Wimboot
Back on the netboot server, customize the WINDOWS section of your autoexec.ipxe like this.
:WINDOWS
dhcp
imgfree
set winpe http://netboot/winpe/amd64
set source http://netboot/win11
kernel wimboot
initrd ${winpe}/media/sources/boot.wim boot.wim
initrd ${winpe}/media/Boot/BCD BCD
initrd ${winpe}/media/Boot/boot.sdi boot.sdi
initrd ${winpe}/winpeshl.ini winpeshl.ini
initrd ${source}/install.bat install.bat
boot || goto MAIN
You can add other installs by copying this block and changing the :WINDOWS header and source variable.
Next Steps
Add some more installation sources and take a look at the Windows zero touch install.
Troubleshooting
System error 53 has occurred. The network path was not found
A given client may be unable to connect to the SMB service at all, or it may fail after connecting once. It’s possible that the client doesn’t have an IP yet. This seems to have something to do with timing and I haven’t found the cause, but I suspect it’s security related. You can wait and it resolves itself.
You can also comment out the winpeshl.ini line and you’ll boot to a command prompt that will let you troubleshoot. Sometimes you just don’t have an IP yet from the DHCP server, and you can edit the install.bat file to add a sleep or other things. See the zero touch install page for some more ideas.
Access is denied
This may be related to the executable bit. If you’ve copied from the ISO they should be set. But if after that you’ve changed anything you could have lost the x bit from setup.exe. It’s hard to know what’s supposed to be set once it’s gone, so you may want to recopy the files.
5.2 - Server Core
Installation Notes
If you’re deploying Windows servers, Server Core is best practice1. Install from USB and it will offer that as a choice - it’s fairly painless. But these instances are designed to be remote-managed so you’ll need to perform a few post-install tasks to help with that.
Server Post-Installation Tasks
Set a Manual IP Address
The IP is DHCP by default and that’s fine if you create a reservation at the DHCP server or just use DNS. If you require a manual address, however:
# Access the PowerShell interface (you can use the server console if desired)
# Identify the desired interface's index number. You'll see multiple per adapter for IP4 and 6 but the interface index will repeat.
Get-NetIPInterface
# Set a manual address, netmask and gateway using that index (12 in this example)
New-NetIPaddress -InterfaceIndex 12 -IPAddress 192.168.0.2 -PrefixLength 24 -DefaultGateway 192.168.0.1
# Set DNS
Set-DNSClientServerAddress -InterfaceIndex 12 -ServerAddresses 192.168.0.1
Allow Pings
This is normally a useful feature, though it depends on your security needs.
Set-NetFirewallRule -Name FPS-ICMP4-ERQ-In -Enabled True
Allow Computer Management
Server core allows ‘Remote Management’ by default2. That is specifically the Server Manager application that ships with Windows Server versions and is included with the Remote Server Admin Tools on Windows 10 professional3 or better. For more detailed work you’ll need to use the Computer Management feature as well. If you’re all part of AD, this is reported to Just Work(TM). If not, you’ll need to allow several ports for SMB and RPC.
# Port 445
Set-NetFirewallRule -Name FPS-SMB-In-TCP -Enabled True
# Port 135
Set-NetFirewallRule -Name WMI-RPCSS-In-TCP -Enabled True
You may also need to enable these rules:
FPS-NB_Name-In-UDP
NETDIS-LLMNR-In-UDP
Configuration
Remote Management Client
If you’re using Windows 10/11, install it on a workstation by going to System -> Optional features -> View features and entering Server Manager in the search box to select and install.
With AD
When you’re all in the same Domain then everything just works (TM). Or so I’ve read.
Without AD
If you’re not using Active Directory, you’ll have to do a few extra steps before using the app.
Trust The Server
Tell your workstation you trust the remote server you are about to manage4 (yes, it seems backwards). Use either the hostname or IP address depending on how you’re planning to connect - i.e. if you didn’t set up DNS, use IPs. Start an admin powershell and enter:
Set-Item wsman:\localhost\Client\TrustedHosts 192.168.5.1 -Concatenate -Force
Add The Server
Start up Server Manager and select Manage -> Add Servers -> DNS and search for the IP or DNS name. Pay attention to the server’s name that it detects. If DNS happens to resolve the IP address you put in, as server-1.local for example, you’ll need to repeat the above TrustedHosts command with that specific name.
Manage As…
You may notice that after adding the server, the app tries to connect and fails. You’ll need to right-click it, select Manage As…, enter credentials in the form of server-1\Administrator, and select Remember me to have this persist. Here you’ll need to use the actual server name and not the IP. If unsure, you can get this on the server with the hostname command.
Starting Performance Counters
The server you added should now say that its performance counters are not started. Right-click it and you can select to start them. The server should now show up as Online and you can perform some basic tasks.
Server Manager is the default management tool and newer servers allow remote management by default. The client needs a few things, however.
- Set DNS so you can resolve by names
- Configure Trusted Hosts
On the system where you start the Server Manager app - usually where you are sitting - ensure you can resolve the remote host via DNS. You may want to edit your hosts file if not.
notepad c:\Windows\System32\drivers\etc\hosts
You can now add the remote server.
Manage -> Add Servers -> DNS -> Search Box (enter the other servers hostname) -> Magnifying Glass -> Select the server -> Right Arrow Icon -> OK
(You may need to select Manage As on it)
Allow Computer Management
You can right-click on a remote server and select Computer Management after doing this
MISC
Set-NetFirewallProfile -Profile Domain, Public, Private -Enabled False
- https://learn.microsoft.com/en-us/windows-server/get-started/install-options-server-core-desktop-experience
- https://learn.microsoft.com/en-us/windows-server/administration/server-core/server-core-sconfig#configure-remote-management
- https://www.microsoft.com/en-us/download/details.aspx?id=45520
- https://learn.microsoft.com/en-us/windows-server/administration/server-manager/configure-remote-management-in-server-manager#to-enable-server-manager-remote-management-by-using-the-windows-interface
5.3 - Virtualization
In the beginning, users time-shared CPUs and virtualization was without form and void. And IBM said “Let there be System/370”. This was in the 70’s and involved men with crew-cuts, horn-rimmed glasses and pocket protectors. And ties.
Today, you can still do full virtualization. Everything is emulated down to the hardware and every system has its own kernel and device drivers. Most of the public cloud started out this way at the dawn of the new millennium. It was the way. VMWare was the early player in this area and popularized it on x86 hardware where everyone was using 5% of their pizzabox servers.
The newer way is containerization. There is just one kernel and it keeps groups of processes separate from each other. This is possible because Linux implemented kernel namespaces around 2008 - mostly work by IBM, suitably enough. The program used to work with this is named LXC and you’d use commands like sudo lxc-create --template download --name u1 --dist ubuntu --release jammy --arch amd64. Other systems such as LXD and Docker (originally) are layered on top to provide more management.
Twenty some years later, what used to be a hot market is now a commodity that’s essentially given away for free. VMWare was acquired by Broadcom, who is focused on the value-extraction phase of its lifecycle, and the cloud seems decidedly headed toward containers because of their better efficiency and agility.
5.3.1 - Incus
Incus is a container manager, forked from Canonical’s LXD manager. It combines all the virtues of upstream LXD (containers + VMs) with the advantages of community-driven additions. You have access to the containers provided by the OCI (Open Container Initiative) as well as being able to create VMs. It is used at the command line and includes a web interface.
Installation
Simply install a base OS on your server and add a few commands. You can install from your distro’s repo, but zabbly (the sponsor) is a bit newer.
As per https://github.com/zabbly/incus
sudo mkdir -p /etc/apt/keyrings/
sudo wget -O /etc/apt/keyrings/zabbly.asc https://pkgs.zabbly.com/key.asc
sudo sh -c 'cat <<EOF > /etc/apt/sources.list.d/zabbly-incus-stable.sources
Enabled: yes
Types: deb
URIs: https://pkgs.zabbly.com/incus/stable
Suites: $(. /etc/os-release && echo ${VERSION_CODENAME})
Components: main
Architectures: $(dpkg --print-architecture)
Signed-By: /etc/apt/keyrings/zabbly.asc
EOF'
sudo apt update
sudo apt install -y incus incus-ui-canonical
Configuration
sudo adduser YOUR-USERNAME incus-admin
incus admin init
You’re fine to accept the defaults, though if you’re planning on a cluster, consult https://linuxcontainers.org/incus/docs/main/howto/cluster_form/#cluster-form
Managing Networks
Incus uses managed networks. It creates a private bridged network by default with DHCP, DNS and NAT services. You can create others and it adds services similarly. You don’t plug instances in; rather, you create a profile that references the network and configure the instance with that profile.
If you’re testing DHCP though, such as when working with netboot, you must create a network without those services. That must be done at the command line with the IP spaces set to none. You can then use that in a profile
incus network create test ipv4.address=none ipv6.address=none
incus profile copy default isolated
You can proceed to the GUI for the rest.
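As a quick sketch of day-to-day use, launching a test container with that profile might look like this (the image alias and instance name are just examples):
incus launch images:debian/12 test1 --profile isolated
incus list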
Operation
Windows 11 VM Creation
This requires access to the TPM module and an example at the command line is extracted from https://discussion.scottibyte.com/t/windows-11-incus-virtual-machine/362.
After repacking the installation ISO you can also create the VM through the GUI and add:
incus config device add win11vm vtpm tpm path=/dev/tpm0
Agent
sudo apt install lxd-agent
Notes
LXD is widely admired, but Canonical’s decision to move it to in-house-only led the lead developer and elements of the community to fork.
5.4 - Zero Touch Install
The simplest way to zero-touch install Windows is with a web-generated answer file. Go to a site like schneegans and just create it. This removes the need for complex MDT, WDS, SCCM, etc. systems for normal deployments.
Create An Answer File
Visit schneegans, select the behavior you’d like, and download the file. Use it in one of the following ways:
USB
After creating the USB installer, copy the file (autounattend.xml) to the root of the USB drive (or one of these locations) and setup will automatically detect it.
Netboot
For a netboot install, copy the file to the sources folder of the Windows files.
scp autounattend.xml netboot:/var/www/html/win11/sources
Additionally, some scripting elements of the install don’t support UNC paths so we must map a drive. Back in the Windows netboot page, we created an install.bat to start the installation. Let’s modify that like so
vi /var/www/html/win11/install.bat
wpeinit
SET SERVER=netboot
:NET
net use q: \\%SERVER%\install
REM If there was a problem with the net use command,
REM ping, pause and loop back to try again
IF %ERRORLEVEL% NEQ 0 (
ping %SERVER%
pause
GOTO NET
) ELSE (
q:
cd win11
setup.exe
)
Add Packages
The installer can also add 3rd-party software packages via commands in the Run custom scripts section. The system will need to be online to pull from the network, so we'll run them at the initial log-in. And since some versions of Windows block anonymous SMB, we'll use HTTP.
Add Package Sources
On the netboot server, create an apps folder for your files and download packages there.
mkdir /var/www/html/apps; cd /var/www/html/apps
wget https://get.videolan.org/vlc/3.0.9.2/win64/vlc-3.0.9.2-win64.msi
wget https://statics.teams.cdn.office.net/production-windows-x64/enterprise/webview2/lkg/MSTeams-x64.msix
Add to Autounattend.xml
It’s easiest to add this in the web form rather than try and edit the XML file. Go to this section and add a line like this one to the third block of custom scripts. It must run at initial user login as the network isn’t available before that.
Navigate to the block that says:
Scripts to run when the first user logs on after Windows has been installed
For MSI Files
These are handled as .cmd files, as in field 1.
msiexec /package http://netboot/apps/GoogleChromeStandaloneEnterprise64.msi /quiet
msiexec /package http://netboot/apps/vlc-3.0.9.2-win64.msi /quiet
For MSIX Files
These are handled as .ps1 files as in field 2.
Add-AppPackage -path http://netboot/apps/MSTeams-x64.msix
Notes
Windows Product Keys https://gist.github.com/rvrsh3ll/0810c6ed60e44cf7932e4fbae25880df
6 - Security
6.1 - CrowdSec
6.1.1 - Installation
Overview
CrowdSec has two main parts; detection and interdiction.
Detection is handled by the main CrowdSec binary. You tell it what files to keep an eye on, how to parse those files, and what something ‘bad’ looks like. It then keeps a list of IPs that have done bad things.
Interdiction is handled by any number of plugins called ‘bouncers’, so named because they block access or kick out bad IPs. They run independently and keep an eye on the list, to do things like edit the firewall to block access for a bad IP.
There is also the ‘crowd’ part. The CrowdSec binary downloads IPs of known bad-actors from the cloud for your bouncers to keep out and submits alerts from your systems.
Installation
With Debian, you can simply add the repo via their script and install with a couple lines.
curl -s https://packagecloud.io/install/repositories/crowdsec/crowdsec/script.deb.sh | sudo bash
sudo apt install crowdsec
sudo apt install crowdsec-firewall-bouncer-nftables
This installs both the detection (crowdsec) and the interdiction (crowdsec-firewall-bouncer) parts. Assuming everything went well, crowdsec will check in with the cloud, download a baseline list of known bad-actors, the firewall-bouncer will set up a basic drop list in the firewall, and crowdsec will start watching your syslog for intrusion attempts.
# Check out the very long drop list
sudo nft list ruleset | less
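You can also confirm that the pieces registered with each other; the bouncer should show up against the local API and the Central API connection should report as valid.

# Bouncers registered with the local API
sudo cscli bouncers list
# Status of the Central API (the 'crowd' part)
sudo cscli capi status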
Configuration
CrowdSec comes pre-configured to watch for ssh brute-force attacks. If you have specific services to watch you can add those as described below.
Add a Service
You probably want to watch a specific service, like a web server. Take a look at https://hub.crowdsec.net/ to see all the available components. For example, browse the collections and search for caddy. The more info link will show you how to install the collection:
sudo cscli collections list -a
sudo cscli collections install crowdsecurity/caddy
Tell CrowdSec where Caddy’s log files are.
sudo tee -a /etc/crowdsec/acquis.yaml << EOF
---
filenames:
- /var/log/caddy/*.log
labels:
  type: caddy
---
EOF
Reload crowdsec for these changes to take effect
sudo systemctl reload crowdsec
Operation
DataFlow
CrowdSec works by pulling in data from the Acquisition files, Parsing the events, comparing to Scenarios, and then Deciding if action should be taken.
Acquisition of data from log files is based on entries in the acquis.yaml file, and the events are given a label as defined in that file.
Those events feed the Parsers. There are a handful by default, but only the ones specifically interested in a given label will see it. They look for keywords like ‘FAILED LOGIN’ and then extract the IP.
Successfully parsed lines are fed to the Scenarios to decide if what happened matters. The scenarios look for things like 10 FAILED LOGINs in 1 min. This separates the accidental bad password entry from a brute force attempt.
Matching a scenario gets the IP added to the Decision List, i.e. the list of bad IPs. These have a configurable expiration, so that if you really do guess wrong 10 times in a row, you're not banned forever.
The bouncers use this list to take action, like a firewall block, and will unblock you after the expiration.
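If you want to see that loop in action, you can push a throw-away decision through it by hand. A small sketch using a documentation IP (any address that isn't whitelisted will do):

# Add a temporary manual ban that expires on its own
sudo cscli decisions add --ip 203.0.113.50 --duration 10m
# The firewall bouncer should pick it up within a few seconds
sudo nft list ruleset | grep 203.0.113.50
# Remove it early rather than waiting for the expiration
sudo cscli decisions delete --ip 203.0.113.50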
Collections
Parsers and Scenarios work best when they work together so they are usually distributed together as a Collection. You can have collections of collections as well. For example, the base installation comes with the linux collection that includes a few parsers and the sshd collection.
To see what Collections, Parsers and Scenarios are running, use the cscli command line interface.
sudo cscli collections list
sudo cscli collections inspect crowdsecurity/linux
sudo cscli collections inspect crowdsecurity/sshd
Inspecting a collection will tell you which parsers and scenarios it contains, as well as some metrics. To learn more about a collection and its components, you can check out its page:
https://hub.crowdsec.net/author/crowdsecurity/collections/linux
The metrics are a bit confusing until you learn that the ‘Unparsed’ column doesn’t mean unparsed so much as it means a non-event. These are just normal logfile lines that don’t have one of the keywords the parser was looking for, like ‘LOGIN FAIL’.
Status
Is anyone currently attacking you? The decisions list shows you any current bad actors and the alerts list shows you a summary of past decisions. If you are just getting started this is probably none, but if you’re open to the internet this will grow quickly.
sudo cscli decisions list
sudo cscli alerts list
But you are getting events from the cloud, and you can check those with the -a option. You'll notice that every 2 hours the community-blocklist is updated.
sudo cscli alerts list -a
After this collection has been running for a while, you'll start to see these kinds of alerts
sudo cscli alerts list
╭────┬───────────────────┬───────────────────────────────────────────┬─────────┬────────────────────────┬───────────┬─────────────────────────────────────────╮
│ ID │ value │ reason │ country │ as │ decisions │ created_at │
├────┼───────────────────┼───────────────────────────────────────────┼─────────┼────────────────────────┼───────────┼─────────────────────────────────────────┤
│ 27 │ Ip:18.220.128.229 │ crowdsecurity/http-bad-user-agent │ US │ 16509 AMAZON-02 │ ban:1 │ 2023-03-02 13:12:27.948429492 +0000 UTC │
│ 26 │ Ip:18.220.128.229 │ crowdsecurity/http-path-traversal-probing │ US │ 16509 AMAZON-02 │ ban:1 │ 2023-03-02 13:12:27.979479713 +0000 UTC │
│ 25 │ Ip:18.220.128.229 │ crowdsecurity/http-probing │ US │ 16509 AMAZON-02 │ ban:1 │ 2023-03-02 13:12:27.9460075 +0000 UTC │
│ 24 │ Ip:18.220.128.229 │ crowdsecurity/http-sensitive-files │ US │ 16509 AMAZON-02 │ ban:1 │ 2023-03-02 13:12:27.945759433 +0000 UTC │
│ 16 │ Ip:159.223.78.147 │ crowdsecurity/http-probing │ SG │ 14061 DIGITALOCEAN-ASN │ ban:1 │ 2023-03-01 23:03:06.818512212 +0000 UTC │
│ 15 │ Ip:159.223.78.147 │ crowdsecurity/http-sensitive-files │ SG │ 14061 DIGITALOCEAN-ASN │ ban:1 │ 2023-03-01 23:03:05.814690037 +0000 UTC │
╰────┴───────────────────┴───────────────────────────────────────────┴─────────┴────────────────────────┴───────────┴─────────────────────────────────────────╯
You may even need to unblock yourself
sudo cscli decisions list
sudo cscli decision delete --id XXXXXXX
Next Steps
You're now taking advantage of the crowd part of CrowdSec and have added your own service. If you don't have any alerts though, you may be wondering how well it's actually working.
Take a look at the detailed activity if you want to look more closely at what’s going on.
6.1.2 - Detailed Activity
Inspecting Metrics
Data comes in through the parsers. To see what they are doing, let’s take a look at the Acquisition and Parser metrics.
sudo cscli metrics
Most of the ‘Acquisition Metrics’ lines will be read and unparsed. This is because normal events are dropped. It only considers lines parsed if they were passed on to a scenario. The ‘bucket’ column refers to event scenarios and is also blank as there were no parsed lines to hand off.
Acquisition Metrics:
╭────────────────────────┬────────────┬──────────────┬────────────────┬────────────────────────╮
│ Source │ Lines read │ Lines parsed │ Lines unparsed │ Lines poured to bucket │
├────────────────────────┼────────────┼──────────────┼────────────────┼────────────────────────┤
│ file:/var/log/auth.log │ 216 │ - │ 216 │ - │
│ file:/var/log/syslog │ 143 │ - │ 143 │ - │
╰────────────────────────┴────────────┴──────────────┴────────────────┴────────────────────────╯
The ‘Parser Metrics’ will show the individual parsers - but not all of them. Only parsers that have at least one ‘hit’ are shown. In this example, only the syslog parser shows up. It’s a low-level parser that doesn’t look for matches, so every line is a hit.
Parser Metrics:
╭─────────────────────────────────┬──────┬────────┬──────────╮
│ Parsers │ Hits │ Parsed │ Unparsed │
├─────────────────────────────────┼──────┼────────┼──────────┤
│ child-crowdsecurity/syslog-logs │ 359 │ 359 │ - │
│ crowdsecurity/syslog-logs │ 359 │ 359 │ - │
╰─────────────────────────────────┴──────┴────────┴──────────╯
However, try a couple of failed SSH login attempts and you'll see them, and how they feed up to the Acquisition Metrics.
Acquisition Metrics:
╭────────────────────────┬────────────┬──────────────┬────────────────┬────────────────────────╮
│ Source │ Lines read │ Lines parsed │ Lines unparsed │ Lines poured to bucket │
├────────────────────────┼────────────┼──────────────┼────────────────┼────────────────────────┤
│ file:/var/log/auth.log │ 242 │ 3 │ 239 │ - │
│ file:/var/log/syslog │ 195 │ - │ 195 │ - │
╰────────────────────────┴────────────┴──────────────┴────────────────┴────────────────────────╯
Parser Metrics:
╭─────────────────────────────────┬──────┬────────┬──────────╮
│ Parsers │ Hits │ Parsed │ Unparsed │
├─────────────────────────────────┼──────┼────────┼──────────┤
│ child-crowdsecurity/sshd-logs │ 61 │ 3 │ 58 │
│ child-crowdsecurity/syslog-logs │ 442 │ 442 │ - │
│ crowdsecurity/dateparse-enrich │ 3 │ 3 │ - │
│ crowdsecurity/geoip-enrich │ 3 │ 3 │ - │
│ crowdsecurity/sshd-logs │ 8 │ 3 │ 5 │
│ crowdsecurity/syslog-logs │ 442 │ 442 │ - │
│ crowdsecurity/whitelists │ 3 │ 3 │ - │
╰─────────────────────────────────┴──────┴────────┴──────────╯
'Lines poured to bucket', however, is still empty. That means the scenarios decided it wasn't a hack attempt. With SSH timeouts it's actually hard to trigger one without a tool. Plus, you may notice the 'whitelists' parser was triggered. Private IP ranges are whitelisted by default so you can't lock yourself out from inside.
Let’s ask crowdsec to explain what’s going on
Detailed Parsing
To see which parsers got involved and what they did, you can ask.
sudo cscli explain --file /var/log/auth.log --type syslog
Here's an ssh example of a failed login. The numbers, such as (+9 ~1), mean that the parser added 9 elements it parsed from the raw event and updated 1. Notice the whitelists parser at the end. It's catching this event and dropping it, hence the 'parser failure'.
line: Mar 1 14:08:11 www sshd[199701]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=192.168.1.16 user=allen
├ s00-raw
| └ 🟢 crowdsecurity/syslog-logs (first_parser)
├ s01-parse
| └ 🟢 crowdsecurity/sshd-logs (+9 ~1)
├ s02-enrich
| ├ 🟢 crowdsecurity/dateparse-enrich (+2 ~1)
| ├ 🟢 crowdsecurity/geoip-enrich (+9)
| └ 🟢 crowdsecurity/whitelists (~2 [whitelisted])
└-------- parser failure 🔴
Why exactly did it get whitelisted? Let’s ask for a verbose report.
sudo cscli explain -v --file /var/log/auth.log --type syslog
line: Mar 1 14:08:11 www sshd[199701]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=192.168.1.16 user=someGuy
├ s00-raw
| └ 🟢 crowdsecurity/syslog-logs (first_parser)
├ s01-parse
| └ 🟢 crowdsecurity/sshd-logs (+9 ~1)
| └ update evt.Stage : s01-parse -> s02-enrich
| └ create evt.Parsed.sshd_client_ip : 192.168.1.16
| └ create evt.Parsed.uid : 0
| └ create evt.Parsed.euid : 0
| └ create evt.Parsed.pam_type : unix
| └ create evt.Parsed.sshd_invalid_user : someGuy
| └ create evt.Meta.service : ssh
| └ create evt.Meta.source_ip : 192.168.1.16
| └ create evt.Meta.target_user : someGuy
| └ create evt.Meta.log_type : ssh_failed-auth
├ s02-enrich
| ├ 🟢 crowdsecurity/dateparse-enrich (+2 ~1)
| ├ create evt.Enriched.MarshaledTime : 2023-03-01T14:08:11Z
| ├ update evt.MarshaledTime : -> 2023-03-01T14:08:11Z
| ├ create evt.Meta.timestamp : 2023-03-01T14:08:11Z
| ├ 🟢 crowdsecurity/geoip-enrich (+9)
| ├ create evt.Enriched.Longitude : 0.000000
| ├ create evt.Enriched.ASNNumber : 0
| ├ create evt.Enriched.ASNOrg :
| ├ create evt.Enriched.ASNumber : 0
| ├ create evt.Enriched.IsInEU : false
| ├ create evt.Enriched.IsoCode :
| ├ create evt.Enriched.Latitude : 0.000000
| ├ create evt.Meta.IsInEU : false
| ├ create evt.Meta.ASNNumber : 0
| └ 🟢 crowdsecurity/whitelists (~2 [whitelisted])
| └ update evt.Whitelisted : %!s(bool=false) -> true
| └ update evt.WhitelistReason : -> private ipv4/ipv6 ip/ranges
└-------- parser failure 🔴
This shows the actual data and at the bottom, parser crowdsecurity/whitelists has updated the property ’evt.Whitelisted’ to true and gave it a reason. That property appears to be a built-in that flags events to be dropped.
If you want to change the ranges, you can edit the logic by editing the yaml file. A sudo cscli hub list will show you which file that is. Add or remove entries from the lists it checks the 'ip' and 'cidr' values against. Any match causes whitelisted to become true.
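For reference, the stock file looks roughly like this (check the actual file on your system, as the defaults can change between releases):

name: crowdsecurity/whitelists
description: "Whitelist events from private ipv4 addresses"
whitelist:
  reason: "private ipv4/ipv6 ip/ranges"
  ip:
    - "127.0.0.1"
    - "::1"
  cidr:
    - "192.168.0.0/16"
    - "10.0.0.0/8"
    - "172.16.0.0/12"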
False Positives
You may see a high percentage of 'Lines poured to bucket' relative to 'Lines read', like in this example where almost all are. Some lines trigger more than one scenario, which is why the 'bucket' count can be greater than the number of 'parsed' lines.
Acquisition Metrics:
╭────────────────────────────────┬────────────┬──────────────┬────────────────┬────────────────────────╮
│ Source │ Lines read │ Lines parsed │ Lines unparsed │ Lines poured to bucket │
├────────────────────────────────┼────────────┼──────────────┼────────────────┼────────────────────────┤
│ file:/var/log/auth.log │ 69 │ - │ 69 │ - │
│ file:/var/log/caddy/access.log │ 21 │ 21 │ - │ 32 │
│ file:/var/log/syslog │ 2 │ - │ 2 │ - │
╰────────────────────────────────┴────────────┴──────────────┴────────────────┴────────────────────────╯
Sometimes that's OK, as not all scenarios are designed to take instant action. The 'http-crawl-non_statics' scenario had 17 events and was considering action against 2 IPs, but never 'Overflowed', i.e. took action. The http-probing scenario did, however, and one of the two IPs had action taken against it.
Bucket Metrics:
╭──────────────────────────────────────┬───────────────┬───────────┬──────────────┬────────┬─────────╮
│ Bucket │ Current Count │ Overflows │ Instantiated │ Poured │ Expired │
├──────────────────────────────────────┼───────────────┼───────────┼──────────────┼────────┼─────────┤
│ crowdsecurity/http-crawl-non_statics │ - │ - │ 2 │ 17 │ 2 │
│ crowdsecurity/http-probing │ - │ 1 │ 2 │ 15 │ 1 │
╰──────────────────────────────────────┴───────────────┴───────────┴──────────────┴────────┴─────────╯
You can ask crowdsec to explain what's going on with a -v and see that clients are asking for things that don't exist.
├ s00-raw
| ├ 🟢 crowdsecurity/non-syslog (first_parser)
| └ 🔴 crowdsecurity/syslog-logs
├ s01-parse
| └ 🟢 crowdsecurity/caddy-logs (+19 ~2)
| └ update evt.Stage : s01-parse -> s02-enrich
| └ create evt.Parsed.request : /0/icon/Forman,%20M.L.%20
| ...
| └ create evt.Meta.http_status : 404
| ...
├-------- parser success 🟢
├ Scenarios
├ 🟢 crowdsecurity/http-crawl-non_statics
└ 🟢 crowdsecurity/http-probing
If you look at the rules (sudo cscli hub list) for http-probing, you'll see it looks for 404s (file not found). If you get more than 10 in 10 seconds, it 'overflows' and the IP gets banned.
Whitelist
The trouble is, some web apps generate a lot of 404s as they probe for page elements that may or may not exist, and that alone is enough to trigger bans. In this case, we must whitelist the application with an expression that checks whether it was an icon request, like above.
sudo vi /etc/crowdsec/parsers/s02-enrich/some-app-whitelist.yaml
name: crowdsecurity/whitelists
description: "Whitelist 404s for icon requests"
whitelist:
  reason: "icon request"
  expression:
    - evt.Parsed.request startsWith '/0/icon/'
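After adding the file, reload CrowdSec and confirm the new whitelist is loaded; re-running the earlier explain command should now show matching requests ending at the whitelist rather than pouring into the http-probing bucket.

sudo systemctl reload crowdsec
# The custom whitelist should now show up alongside the stock one
sudo cscli parsers list | grep whitelist
sudo cscli explain -v --file /var/log/caddy/access.log --type caddy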
6.1.3 - Custom Parser
When checking out the detailed metrics you may find that log entries aren’t being parsed. Maybe the log format has changed or you’re logging additional data the author didn’t anticipate. The best thing is to add your own parser.
Types of Parsers
There are several types of parsers, and they are used in stages. Some are designed to work with the raw log entries while others take pre-parsed data and add to or enrich it. This way you can do branching, and not every parser needs to know how to read a syslog message.
Their Local Path will tell you what stage they kick in at. Use sudo cscli parsers list to display the details. s00-raw works with the 'raw' files while s01 and s02 work further down the pipeline. Currently, you can only create s00 and s01 level parsers.
Integrating with Scenarios
Useful parsers supply data that Scenarios are interested in. You can create a parser that watches the system logs for ‘FOOBAR’ entries, extracts the ‘FOOBAR-LEVEL`, and passes it on. But if nothing is looking for ‘FOOBARs’ then nothing will happen.
Let's say you've added the Caddy collection. It's pulled in a bunch of Scenarios you can view with sudo cscli scenarios list. If you look at one of the associated files you'll see a filter section where they look for 'evt.Meta.http_path' and 'evt.Parsed.verb'. They are all different though, so how do you know what data to supply?
Your best bet is to take an existing parser and modify it.
Examples
Note - CrowdSec is pretty awesome, and after talking in their Discord they had already accommodated both these scenarios within a release cycle or two. So these two examples are solved. I'm sure you'll find new ones, though ;-)
A Web Example
Let’s say that you’ve installed the Caddy collection, but you’ve noticed basic auth login failures don’t trigger the parser. So let’s add a new file and edit it.
sudo cp /etc/crowdsec/parsers/s01-parse/caddy-logs.yaml /etc/crowdsec/parsers/s01-parse/caddy-logs-custom.yaml
You'll notice two top-level sections where the parsing happens, nodes and statics, with some grok pattern matching going on.
Nodes allow you to try multiple patterns; if any match, the whole section is considered successful. I.e. if the log could have either the standard HTTPDATE or a CUSTOMDATE, as long as it has one of them the matching can move on. Statics just goes down the list extracting data. If any fail, the whole event is considered a fail and dropped as unparseable.
All the parsed data gets attached to the event as 'evt.Parsed.something', and some of the statics move it to the evt values the Scenarios will be looking for. Caddy logs are JSON formatted, and so basically already parsed, and this example makes use of the JsonExtract method quite a bit.
# We added the caddy logs in the acquis.yaml file with the label 'caddy' and so we use that as our filter here
filter: "evt.Parsed.program startsWith 'caddy'"
onsuccess: next_stage
# debug: true
name: caddy-logs-custom
description: "Parse custom caddy logs"
pattern_syntax:
  CUSTOMDATE: '%{DAY:day}, %{MONTHDAY:monthday} %{MONTH:month} %{YEAR:year} %{TIME:time} %{WORD:tz}'
nodes:
  - nodes:
      - grok:
          pattern: '%{NOTSPACE} %{NOTSPACE} %{NOTSPACE} \[%{HTTPDATE:timestamp}\]%{DATA}'
          expression: JsonExtract(evt.Line.Raw, "common_log")
          statics:
            - target: evt.StrTime
              expression: evt.Parsed.timestamp
      - grok:
          pattern: "%{CUSTOMDATE:timestamp}"
          expression: JsonExtract(evt.Line.Raw, "resp_headers.Date[0]")
          statics:
            - target: evt.StrTime
              expression: evt.Parsed.day + " " + evt.Parsed.month + " " + evt.Parsed.monthday + " " + evt.Parsed.time + ".000000" + " " + evt.Parsed.year
      - grok:
          pattern: '%{IPORHOST:remote_addr}:%{NUMBER}'
          expression: JsonExtract(evt.Line.Raw, "request.remote_addr")
      - grok:
          pattern: '%{IPORHOST:remote_ip}'
          expression: JsonExtract(evt.Line.Raw, "request.remote_ip")
      - grok:
          pattern: '\["%{NOTDQUOTE:http_user_agent}\"]'
          expression: JsonExtract(evt.Line.Raw, "request.headers.User-Agent")
statics:
  - meta: log_type
    value: http_access-log
  - meta: service
    value: http
  - meta: source_ip
    expression: evt.Parsed.remote_addr
  - meta: source_ip
    expression: evt.Parsed.remote_ip
  - meta: http_status
    expression: JsonExtract(evt.Line.Raw, "status")
  - meta: http_path
    expression: JsonExtract(evt.Line.Raw, "request.uri")
  - target: evt.Parsed.request #Add for http-logs enricher
    expression: JsonExtract(evt.Line.Raw, "request.uri")
  - parsed: verb
    expression: JsonExtract(evt.Line.Raw, "request.method")
  - meta: http_verb
    expression: JsonExtract(evt.Line.Raw, "request.method")
  - meta: http_user_agent
    expression: evt.Parsed.http_user_agent
  - meta: target_fqdn
    expression: JsonExtract(evt.Line.Raw, "request.host")
  - meta: sub_type
    expression: "JsonExtract(evt.Line.Raw, 'status') == '401' && JsonExtract(evt.Line.Raw, 'request.headers.Authorization[0]') startsWith 'Basic ' ? 'auth_fail' : ''"
The very last line is where a status 401 is checked. It looks for a 401 and a request carrying Basic auth. However, this misses events where someone asks for a protected resource and the server responds telling them Basic is needed, i.e. when a bot is poking at URLs on your server and ignoring the prompts to log in. You can look at the log entries more easily with this command to follow the log and decode it while you recreate failed attempts.
sudo tail -f /var/log/caddy/access.log | jq
To change this, update the expression to also check the response header with an additional ? (or) condition.
expression: "JsonExtract(evt.Line.Raw, 'status') == '401' && JsonExtract(evt.Line.Raw, 'request.headers.Authorization[0]') startsWith 'Basic ' ? 'auth_fail' : JsonExtract(evt.Line.Raw, 'status') == '401' && JsonExtract(evt.Line.Raw, 'resp_headers.Www-Authenticate[0]') startsWith 'Basic ' ? 'auth_fail' : ''"
Syslog Example
Let's say you're using dropbear and failed logins are not being picked up by the ssh parser.
To see what's going on, use the crowdsec command line interface. The shell command is cscli, and you can ask it about its metrics to see how many lines it's parsed and if any of them are suspicious. Since we just restarted, you may not have any syslog lines yet, so let's add some and check.
ssh [email protected]
logger "This is an innocuous message"
cscli metrics
INFO[28-06-2022 02:41:33 PM] Acquisition Metrics:
+------------------------+------------+--------------+----------------+------------------------+
| SOURCE | LINES READ | LINES PARSED | LINES UNPARSED | LINES POURED TO BUCKET |
+------------------------+------------+--------------+----------------+------------------------+
| file:/var/log/messages | 1 | - | 1 | - |
+------------------------+------------+--------------+----------------+------------------------+
Notice that the line we just read is unparsed and that’s OK. That just means it wasn’t an entry the parser cared about. Let’s see if it responds to an actual failed login.
dbclient some.remote.host
# Enter some bad passwords and then exit with a Ctrl-C. Remember, localhost attempts are whitelisted so you must be remote.
[email protected]'s password:
[email protected]'s password:
cscli metrics
INFO[28-06-2022 02:49:51 PM] Acquisition Metrics:
+------------------------+------------+--------------+----------------+------------------------+
| SOURCE | LINES READ | LINES PARSED | LINES UNPARSED | LINES POURED TO BUCKET |
+------------------------+------------+--------------+----------------+------------------------+
| file:/var/log/messages | 7 | - | 7 | - |
+------------------------+------------+--------------+----------------+------------------------+
Well, no luck. We will need to adjust the parser
sudo cp /etc/crowdsec/parsers/s01-parse/sshd-logs.yaml /etc/crowdsec/parsers/s01-parse/sshd-logs-custom.yaml
Take a look at the logfile and copy an example line over to https://grokdebugger.com/. Use a pattern like
Bad PAM password attempt for '%{DATA:user}' from %{IP:source_ip}:%{INT:port}
Assuming you get the pattern worked out, you can then add a section to the bottom of the custom log file you created.
  - grok:
      name: "SSHD_AUTH_FAIL"
      pattern: "Login attempt for nonexistent user from %{IP:source_ip}:%{INT:port}"
      apply_on: message
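Restart CrowdSec and test the new node by feeding it a single synthetic line (the log line below is an invented dropbear-style example, not a real capture):

sudo systemctl restart crowdsec
# Explain one line instead of a whole file
sudo cscli explain --log "Jun 28 14:55:01 host dropbear[1234]: Login attempt for nonexistent user from 203.0.113.77:41234" --type syslog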
6.1.4 - On Alpine
Install
There are some packages available, but (as of 2022) they are a bit behind and don't include the config and service files. So let's download the latest binaries from CrowdSec and create our own.
Download the current release
Note: Download the static versions. Alpine uses a different libc (musl) than most other distros.
cd /tmp
wget https://github.com/crowdsecurity/crowdsec/releases/latest/download/crowdsec-release-static.tgz
wget https://github.com/crowdsecurity/cs-firewall-bouncer/releases/latest/download/crowdsec-firewall-bouncer.tgz
tar xzf crowdsec-firewall*
tar xzf crowdsec-release*
rm *.tgz
Install Crowdsec and Register with The Central API
You cannot use the wizard normally, as it expects systemd and doesn't support OpenRC. Follow the Binary Install steps from CrowdSec's binary instructions.
sudo apk add bash newt envsubst
cd /tmp/crowdsec-v*
# Docker mode skips configuring systemd
sudo ./wizard.sh --docker-mode
sudo cscli hub update
sudo cscli machines add -a
sudo cscli capi register
# A collection is just a bunch of parsers and scenarios bundled together for convenience
sudo cscli collections install crowdsecurity/linux
Install The Firewall Bouncer
We need a netfilter tool so install nftables. If you already have iptables installed you can skip this step and set FW_BACKEND to that below when generating the API keys.
sudo apk add nftables
Now we install the firewall bouncer. There is no static build of the firewall bouncer yet from CrowdSec, but you can get one from Alpine testing (if you don’t want to compile it yourself)
# Change from 'edge' to other versions as needed
echo "http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories
apk update
apk add cs-firewall-bouncer
Now configure the bouncer. We will once again do this manually because there is no support for non-systemd Linuxes in the install script. But cribbing from their install script, we see we can:
cd /tmp/crowdsec-firewall*
BIN_PATH_INSTALLED="/usr/local/bin/crowdsec-firewall-bouncer"
BIN_PATH="./crowdsec-firewall-bouncer"
sudo install -v -m 755 -D "${BIN_PATH}" "${BIN_PATH_INSTALLED}"
CONFIG_DIR="/etc/crowdsec/bouncers/"
sudo mkdir -p "${CONFIG_DIR}"
sudo install -m 0600 "./config/crowdsec-firewall-bouncer.yaml" "${CONFIG_DIR}crowdsec-firewall-bouncer.yaml"
Generate The API Keys
Note: If you used the APK, just do the first two lines to get the API_KEY (echo $API_KEY) and manually edit the file (vim /etc/crowdsec/bouncers/crowdsec-firewall-bouncer.yaml)
cd /tmp/crowdsec-firewall*
CONFIG_DIR="/etc/crowdsec/bouncers/"
SUFFIX=`tr -dc A-Za-z0-9 </dev/urandom | head -c 8`
API_KEY=`sudo cscli bouncers add cs-firewall-bouncer-${SUFFIX} -o raw`
FW_BACKEND="nftables"
API_KEY=${API_KEY} BACKEND=${FW_BACKEND} envsubst < ./config/crowdsec-firewall-bouncer.yaml | sudo install -m 0600 /dev/stdin "${CONFIG_DIR}crowdsec-firewall-bouncer.yaml"
Create RC Service Files
sudo touch /etc/init.d/crowdsec
sudo chmod +x /etc/init.d/crowdsec
sudo rc-update add crowdsec
sudo vim /etc/init.d/crowdsec
#!/sbin/openrc-run
command=/usr/local/bin/crowdsec
command_background=true
pidfile="/run/${RC_SVCNAME}.pid"
depend() {
    need localmount
    need net
}
Note: If you used the package from Alpine testing above it came with a service file. Just rc-update add cs-firewall-bouncer
and skip this next step.
sudo touch /etc/init.d/cs-firewall-bouncer
sudo chmod +x /etc/init.d/cs-firewall-bouncer
sudo rc-update add cs-firewall-bouncer
sudo vim /etc/init.d/cs-firewall-bouncer
#!/sbin/openrc-run
command=/usr/local/bin/crowdsec-firewall-bouncer
command_args="-c /etc/crowdsec/bouncers/crowdsec-firewall-bouncer.yaml"
pidfile="/run/${RC_SVCNAME}.pid"
command_background=true
depend() {
    after firewall
}
Start The Services and Observe The Results
Start up the services and view the logs to see that everything started properly
sudo service crowdsec start
sudo service cs-firewall-bouncer status
sudo tail /var/log/crowdsec.log
sudo tail /var/log/crowdsec-firewall-bouncer.log
# The firewall bouncer should tell you about how it's inserting decisions it got from the hub
sudo cat /var/log/crowdsec-firewall-bouncer.log
time="28-06-2022 13:10:05" level=info msg="backend type : nftables"
time="28-06-2022 13:10:05" level=info msg="nftables initiated"
time="28-06-2022 13:10:05" level=info msg="Processing new and deleted decisions . . ."
time="28-06-2022 14:35:35" level=info msg="100 decisions added"
time="28-06-2022 14:35:45" level=info msg="1150 decisions added"
...
...
# If you are curious about what it's blocking
sudo nft list table crowdsec
...
7 - Storage
7.1 - Seafile
TODO - seafile 11 is in beta and mysql is required.
Seafile is a cloud storage system, similar to google drive. It stands out for being simpler and faster than it’s peers. It’s also open source.
Preparation
You'll need a linux server. We use Debian 12 in this example, and the instructions are based on Seafile's SQLite instructions, updated for the new OS. This includes working around some cffi build issues and using a python virtual environment so the apt and pip packages play nice.
# The main requirements
sudo apt install -y memcached libmemcached-dev pwgen sqlite3
sudo systemctl enable --now memcached
# Python specific things
sudo apt install -y python3 python3-setuptools python3-pip
sudo apt install python3-wheel python3-django python3-django-captcha python3-future python3-willow python3-pylibmc python3-jinja2 python3-psd-tools python3-pycryptodome python3-cffi
# cffi build requirements
sudo apt install -y build-essential libssl-dev libffi-dev python-dev-is-python3
# Install the service account and create a python virtual environment for them
sudo apt install python3-venv
sudo useradd --home-dir /opt/seafile --system --comment "Seafile Service Account" --create-home seafile
sudo -i -u seafile
python3 -m venv .venv
source .venv/bin/activate
# Install the rest of the packages from pip
pip3 install --timeout=3600 \
wheel django django-pylibmc django-simple-captcha future \
Pillow pylibmc captcha jinja2 psd-tools pycryptodome cffi
Installation
It comes with two services. Seafile, the file sync server, and Seahub, a web interface and editor.
For a small team, you can install a lightweight instance of Seafile using a single host and sqlite.
Note: There is a seafile repo, but it may be [client] only. TODO test this
As per the install [instructions] this will create several folders in seafile’s home directory and a symlink to the binaries in a version specific directory for easy upgrades.
# Continue as the seafile user - the python venv should still be in effect. If not, source as before
# Download and extract the binary
wget -P /tmp https://s3.eu-central-1.amazonaws.com/download.seadrive.org/seafile-server_10.0.1_x86-64.tar.gz
tar -xzf /tmp/seafile-server_10.0.1_x86-64.tar.gz -C /opt/seafile/
rm /tmp/seafile*
# Run the setup script
cd /opt/seafile/sea*
./setup-seafile.sh
# Start seafile and seahub to answer some setup questions
./seafile.sh start
./seahub.sh start
./seahub.sh stop
./seafile.sh stop
Create systemd service files1 for the two services. (as a sudo capable user)
sudo tee /etc/systemd/system/seafile.service << EOF
[Unit]
Description=Seafile
After=network.target
[Service]
Type=forking
ExecStart=/opt/seafile/seafile-server-latest/seafile.sh start
ExecStop=/opt/seafile/seafile-server-latest/seafile.sh stop
LimitNOFILE=infinity
User=seafile
Group=seafile
[Install]
WantedBy=multi-user.target
EOF
Note: The ExecStart below is a bit cumbersome, but it saves modifying the vendor’s start script. Only the Seahub service seems to need the virtual env, though you can give both services the same treatment if you wish.
sudo tee /etc/systemd/system/seahub.service << EOF
[Unit]
Description=Seafile hub
After=network.target seafile.service
[Service]
Type=forking
ExecStart=/bin/bash -c 'source /opt/seafile/.venv/bin/activate && /opt/seafile/seafile-server-latest/seahub.sh start'
ExecStop=/bin/bash -c 'source /opt/seafile/.venv/bin/activate && /opt/seafile/seafile-server-latest/seahub.sh stop'
User=seafile
Group=seafile
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable --now seafile.service
sudo systemctl enable --now seahub.service
Seafile and Seahub should have started without error, though by default you can only access them from localhost.
If you run into problems here, make sure to start Seafile first. Experiment with sourcing the activation file as the seafile user and running the start script directly.
Add logrotation
sudo tee /etc/logrotate.d/seafile << EOF
/opt/seafile/logs/seafile.log
/opt/seafile/logs/seahub.log
/opt/seafile/logs/file_updates_sender.log
/opt/seafile/logs/repo_old_file_auto_del_scan.log
/opt/seafile/logs/seahub_email_sender.log
/opt/seafile/logs/work_weixin_notice_sender.log
/opt/seafile/logs/index.log
/opt/seafile/logs/content_scan.log
/opt/seafile/logs/fileserver-access.log
/opt/seafile/logs/fileserver-error.log
/opt/seafile/logs/fileserver.log
{
daily
missingok
rotate 7
# compress
# delaycompress
dateext
dateformat .%Y-%m-%d
notifempty
# create 644 root root
sharedscripts
postrotate
if [ -f /opt/seafile/pids/seaf-server.pid ]; then
kill -USR1 `cat /opt/seafile/pids/seaf-server.pid`
fi
if [ -f /opt/seafile/pids/fileserver.pid ]; then
kill -USR1 `cat /opt/seafile/pids/fileserver.pid`
fi
if [ -f /opt/seafile/pids/seahub.pid ]; then
kill -HUP `cat /opt/seafile/pids/seahub.pid`
fi
find /opt/seafile/logs/ -mtime +7 -name "*.log*" -exec rm -f {} \;
endscript
}
EOF
Configuration
Seahub (the web UI) by default is bound to localhost only. Change that to all addresses so you can access it from other systems.
sudo sed -i 's/^bind.*/bind = "0.0.0.0:8000"/' /opt/seafile/conf/gunicorn.conf.py
If you're not proxying already, check the seahub settings. You may need to add the correct internal name and port for initial access. You should add the file server root as well so you don't have to add it in the GUI later.
vi /opt/seafile/conf/seahub_settings.py
SERVICE_URL = "http://seafile.some.lan:8000/"
FILE_SERVER_ROOT = "http://seafile.some.lan:8082"
Add a connection to the memcache server
sudo tee -a /opt/seafile/conf/seahub_settings.py << EOF
CACHES = {
    'default': {
        'BACKEND': 'django_pylibmc.memcached.PyLibMCCache',
        'LOCATION': '127.0.0.1:11211',
    },
}
EOF
And restart for the changes to take effect
sudo systemctl restart seahub
You should now be able to login at http://some.server:8000/ with the credentials you created during the command line setup. If the web GUI works, but you can’t download files or the markdown editor doesn’t work as expected, check the FILE_SERVER_ROOT and look in the GUI’s System Admin section at those settings.
NFS Mount
Large amounts of data are best handled by a dedicated storage system, and those are usually mounted over the network via NFS or a similar protocol. Seafile data should be stored in such a system, but you cannot mount the entire Seafile data folder over the network as it includes SQLite data, and SQLite recommends2 against that. Nor can you mount each subdirectory separately, as they rely upon internal links that must be on the same filesystem.
The solution is to mount a network share in an alternate location and symlink the relevant parts of the Seafile data directory to it.
sudo mount nfs.server:/exports/seafile /mnt/seafile
sudo systemctl stop seahub
sudo systemctl stop seafile
sudo mv /opt/seafile/seafile-data/httptemp \
/opt/seafile/seafile-data/storage \
/opt/seafile/seafile-data/tmpfiles \
/mnt/seafile/
sudo ln -s /mnt/seafile/httptemp /opt/seafile/seafile-data/
sudo ln -s /mnt/seafile/storage /opt/seafile/seafile-data/
sudo ln -s /mnt/seafile/tmpfiles /opt/seafile/seafile-data/
sudo chown -R seafile:seafile /mnt/seafile
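To make the mount persistent across reboots, add it to /etc/fstab. A sketch using the example server and paths above; adjust the export path and NFS version to your environment.

# /etc/fstab
nfs.server:/exports/seafile  /mnt/seafile  nfs  defaults,noatime,vers=4.1  0  0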
Proxy
TODO: Say something about why Caddy, give the proxy file, then cover HTTP/3 (enabling UDP 443 and seeing it in the logs with Firefox). No special server config is needed.
https://caddy.community/t/caddy-v2-and-seafile-server-on-a-root-server/9188/2
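Until that's fleshed out, here is a minimal Caddyfile sketch based on that thread and the placeholder names and ports used above (seafile.some.lan, Seahub on 8000, the file server on 8082). With this layout, FILE_SERVER_ROOT becomes https://seafile.some.lan/seafhttp.

seafile.some.lan {
    # File server traffic goes to port 8082 with the /seafhttp prefix stripped
    handle_path /seafhttp/* {
        reverse_proxy localhost:8082
    }
    # Everything else goes to Seahub
    reverse_proxy localhost:8000
}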
Note the corresponding change needed in the GUI (or FILE_SERVER_ROOT) for the 8082 file server once it's behind the proxy.
https://www.seafile.com/en/download/#server
1. https://manual.seafile.com/deploy/start_seafile_at_system_bootup/
2. https://www.sqlite.org/faq.html#q5

[client]: https://help.seafile.com/syncing_client/install_linux_client/
[instructions]: https://manual.seafile.com/deploy/using_sqlite/
7.2 - TrueNAS
7.2.1 - Disk Replacement
Locate the failed drive.
zpool status
It will show something like
NAME STATE READ WRITE CKSUM
pool01 DEGRADED 0 0 0
raidz3-0 ONLINE 0 0 0
44fca0d1-f343-48e6-9a43-c71463551aa4 ONLINE 0 0 0
7ca5e989-51a5-4f1b-a81e-982d9a05ac04 ONLINE 0 0 0
8fd249a0-c8c6-47bb-8787-3e246300c62d ONLINE 0 0 0
573c1117-27d4-430c-b57c-858a75b4ca35 ONLINE 0 0 0
29b7c608-72ae-4ec2-830b-0e23925ac0b1 ONLINE 0 0 0
293acdbe-6be5-4fa7-945a-e9481b09c0fa ONLINE 0 0 0
437bac45-433b-48e3-bc70-ae1c82e8155b ONLINE 0 0 0
a5ca09a7-3f3f-4135-a2d9-71290fd79160 ONLINE 3 2 0
raidz3-1 DEGRADED 0 0 0
spare-0 DEGRADED 0 0 0
65f61699-e2fc-4a36-86dd-b0fa6a774798 FAULTED 53 0 0 too many errors
9d794dfd-2ef6-432d-8252-0c93e79509dc ONLINE 0 0 0
e27f31e8-a1a4-47dc-ac01-4a6c99b6e5d0 ONLINE 0 0 0
aff60721-21ae-42bf-b077-1937aeafaab2 ONLINE 0 0 0
714da3e5-ca9c-43d0-a0f3-c0fa693a5b02 ONLINE 0 0 0
df89869a-4445-47f9-afa9-3b9cce3b1530 ONLINE 0 0 0
29748037-bbd5-4f2d-8878-4fa2b81d9ec3 ONLINE 0 0 0
1ff396ec-dec7-45dd-9172-de31e5f6fca7 ONLINE 0 0 0
Off-line the drive.
zpool offline pool01 65f61699-e2fc-4a36-86dd-b0fa6a774798
Get the serial number
hdparm -I /dev/disk/by-partuuid/65f61699-e2fc-4a36-86dd-b0fa6a774798 | grep Serial
The output will be something like
Serial Number: ZC1168HE
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Identify the bay location
sas3ircu 0 display | grep -B 10 ZC1168HE
The output will look like
Device is a Hard disk
Enclosure # : 2
Slot # : 17
Turn on the bay indicator
sas3ircu 0 locate 2:17 ON
Physically replace the disk
Check the logs for the new disk’s name
dmesg
The output will indicate the device id, such as ‘sdal’ in the below example
[16325935.447081] sd 0:0:45:0: Power-on or device reset occurred
[16325935.447962] sd 0:0:45:0: Attached scsi generic sg20 type 0
[16325935.451271] end_device-0:0:28: add: handle(0x001c), sas_addr(0x500304801810f321)
[16325935.454768] sd 0:0:45:0: [sdal] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[16325935.477576] sd 0:0:45:0: [sdal] Write Protect is off
[16325935.479913] sd 0:0:45:0: [sdal] Mode Sense: 9b 00 10 08
[16325935.482100] sd 0:0:45:0: [sdal] Write cache: enabled, read cache: enabled, supports DPO and FUA
[16325935.664995] sd 0:0:45:0: [sdal] Attached SCSI disk
Turn off the slot light
sas3ircu 0 locate 2:17 OFF
Use the GUI to replace the disk. (Use the GUI over the command line to ensure it's set up consistently with the other disks.)
Storage --> Pool Gear Icon (at right) --> Status
(The removed disk should be listed by its UUID)
Disk Menu (three dots) --> Replace --> (disk from dmesg above) --> Force --> Replace Disk
After resilvering has finished, check the spare’s ID at the bottom and then detach it so it goes back to spare
zpool detach pool01 9d794dfd-2ef6-432d-8252-0c93e79509dc
Notes:
Note: The GUI takes several steps to prepare the disk and adds a partition to the pool, not the whole disk. Using the CLI to replace the disk is 'strongly advised against', though if you must, you can recreate that process at the command line, as adapted from https://www.truenas.com/community/resources/creating-a-degraded-pool.100/
gpart and glabel are not present on TrueNAS Scale, so you would have to adapt this to another tool
gpart create -s gpt /dev/da18
gpart add -i 1 -b 128 -t freebsd-swap -s 2g /dev/da18
gpart add -i 2 -t freebsd-zfs /dev/da18
zpool replace pool01 65f61699-e2fc-4a36-86dd-b0fa6a774798
To turn off all slot lights
for X in {0..23};do sas3ircu 0 locate 2:$X OFF;done
for X in {0..11};do sas3ircu 0 locate 3:$X OFF;done