Gluster
Gluster is a distributed file system that supports both replicated and dispersed data.
Supporting dispersed data is a differentiating feature. Only a few systems can distribute data in an erasure-coded or RAID-like fashion, making efficient use of space while still providing redundancy. Have 5 cluster members? Add one 'parity bit' for just a 20% overhead and you can lose a host. Add more parity if you like, at a similar incremental cost. Other systems require you to duplicate your data entirely, a 50% hit.
It's also generally perceived as less complex than competitors like Ceph, as it has fewer moving parts and is focused on file storage. And since it uses native filesystems, you can always access your data directly. Red Hat has ceased its corporate sponsorship, but the project is still quite active.
So if you just need file storage and you have a lot of data, use Gluster.
1 - Gluster on XCP-NG
Let's set up a distributed and dispersed example cluster. We'll use XCP-NG for this. This is similar to an erasure-coded Ceph pool.
Preparation
We use three hosts, each connected to a common network. With three we can disperse data enough to take one host at a time out of service. We use 4 disks on each host in this example, but any number will work as long as every host has the same number.
Network
Hostname Resolution
Gluster requires the hosts be resolvable by hostname. Verify all the hosts can ping each other by name. You may want to create a hosts file and copy it to all three to help.
If you have free ports on each server, consider using the second interface for storage, or a mesh network for better performance.
# Normal management and or guest network
192.168.1.1 xcp-ng-01.lan
192.168.1.2 xcp-ng-02.lan
192.168.1.3 xcp-ng-03.lan
# Storage network in a different subnet (if you have a second interface)
192.168.10.1 xcp-ng-01.storage.lan
192.168.10.2 xcp-ng-02.storage.lan
192.168.10.3 xcp-ng-03.storage.lan
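A quick loop to confirm name resolution before going further; this just assumes the hostnames from the example file above:
# Run from each host - every name should resolve and answer
for h in xcp-ng-0{1..3}.storage.lan; do
  ping -c 1 "$h" >/dev/null && echo "$h ok" || echo "$h FAILED"
done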
Firewall Rules
Gluster requires a few rules: one for the daemon itself and one per 'brick' (drive) on the server. Alternatively, you can simply give the cluster members carte blanche access. We'll show both approaches here. Add these to all cluster members.
vi /etc/sysconfig/iptables
# Note that the last line in the existing file is a REJECT. Make sure to insert these new rules BEFORE that line.
-A RH-Firewall-1-INPUT -p tcp -s xcp-ng-01.storage.lan -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -s xcp-ng-02.storage.lan -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -s xcp-ng-03.storage.lan -j ACCEPT
# Possibly for clients
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 24007:24008 -s client-01.storage.lan -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 49152:49156 -s client-01.storage.lan -j ACCEPT
OR
vi /etc/sysconfig/iptables
# The gluster daemon needs ports 24007 and 24008
# Individual bricks need ports starting at 49152. Add an additional port per brick.
# Here we have 49152-49155 open for 4 bricks.
# TODO - test this command
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp -s xcp-ng-01.storage.lan --dport 24007:24008 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp -s xcp-ng-01.storage.lan --dport 49152:49155 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp -s xcp-ng-02.storage.lan --dport 24007:24008 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp -s xcp-ng-02.storage.lan --dport 49152:49155 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp -s xcp-ng-03.storage.lan --dport 24007:24008 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp -s xcp-ng-03.storage.lan --dport 49152:49155 -j ACCEPT
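Whichever set of rules you pick, reload the firewall afterwards. On a stock XCP-NG dom0 the iptables service reads /etc/sysconfig/iptables, so something like this should work; verify the chain name matches your install:
# Reload the ruleset and check that the new entries are in place
systemctl restart iptables
iptables -L RH-Firewall-1-INPUT -n --line-numbers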
Disk
Gluster works with filesystems. This is convenient because if all else fails, you still have files on disks you can access. XFS is well regarded among Gluster admins, so we'll use that.
# Install the xfs programs
yum install -y xfsprogs
# Wipe the disks before using, then format the whole disk. Repeat for each disk
wipefs -a /dev/sda
mkfs.xfs /dev/sda
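Since the same two commands run on every disk, a small loop saves typing. This assumes the four data disks really are sda through sdd, as in the mount commands below; double-check with lsblk first so you don't wipe the wrong drive:
lsblk                       # confirm which disks are the data disks
for d in sda sdb sdc sdd; do
  wipefs -a /dev/$d
  mkfs.xfs -f /dev/$d       # -f overwrites any existing filesystem signature
done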
Let's mount those disks. The convention is to put them in /data, organized by volume. We'll use 'volume01' later in the config, so let's use that here as well.
On each server
# For 4 disks - Note, gluster likes to call them 'bricks'
mkdir -p /data/glusterfs/volume01/brick0{1,2,3,4}
mount /dev/sda /data/glusterfs/volume01/brick01
mount /dev/sdb /data/glusterfs/volume01/brick02
mount /dev/sdc /data/glusterfs/volume01/brick03
mount /dev/sdd /data/glusterfs/volume01/brick04
Add the appropriate config to your /etc/fstab so they mount at boot.
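A minimal sketch of those fstab entries, assuming the same device-to-brick mapping as above. Entries keyed on UUID= (taken from blkid) are more robust than /dev/sdX names, which can change between boots:
# /etc/fstab
/dev/sda  /data/glusterfs/volume01/brick01  xfs  defaults,noatime,nofail  0 0
/dev/sdb  /data/glusterfs/volume01/brick02  xfs  defaults,noatime,nofail  0 0
/dev/sdc  /data/glusterfs/volume01/brick03  xfs  defaults,noatime,nofail  0 0
/dev/sdd  /data/glusterfs/volume01/brick04  xfs  defaults,noatime,nofail  0 0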
Installation
A Note About Versions
XCP-NG is CentOS 7 based and provides GlusterFS v8 in its repo. That version went EOL in 2021. You can add the CentOS Storage Special Interest Group repo to get v9, but no currently supported version is installable.
# Not recommended
yum install centos-release-gluster --enablerepo=epel,base,updates,extras
# On each host
yum install -y glusterfs-server
systemctl enable --now glusterd
# On the first host
gluster peer probe xcp-ng-02.storage.lan
gluster peer probe xcp-ng-03.storage.lan
gluster pool list
UUID Hostname State
a103d6a5-367b-4807-be93-497b06cf1614 xcp-ng-02.storage.lan Connected
10bc7918-364d-4e4d-aa16-85c1c879963a xcp-ng-03.storage.lan Connected
d00ea7e3-ed94-49ed-b56d-e9ca4327cb82 localhost Connected
# Note - localhost will always show up for the host you're running the command on
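If any peer shows up as something other than Connected, gluster peer status gives a little more detail per peer:
# Run on any host; each peer should report "State: Peer in Cluster (Connected)"
gluster peer status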
Configuration
Gluster talks about data as being distributed and dispersed.
Distributed
# Distribute data amongst 3 servers, each with a single brick
gluster volume create MyVolume server1:/brick1 server2:/brick1 server3:/brick1
Any time you have more than one brick, the data is distributed. That can be across different disks on the same host, or across different hosts. There is no redundancy, however, and any loss of a disk is a loss of data.
Disperse
# Disperse data amongst 3 bricks, each on a different server
gluster volume create MyVolume disperse server1:/brick1 server2:/brick1 server3:/brick1
Dispersed is how you build redundancy across servers. Any one of these servers or bricks can fail and the data is safe.
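With three bricks gluster will pick a redundancy of 1 on its own, but you can spell it out. A sketch of the same volume with the redundancy count made explicit:
# 1 of the 3 bricks holds parity, so any single brick (or server) can be lost
gluster volume create MyVolume disperse 3 redundancy 1 \
  server1:/brick1 server2:/brick1 server3:/brick1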
# Disperse data amongst six bricks, but with some on the same server. Problem!
gluster volume create MyVolume disperse \
server1:/brick1 server2:/brick1 server3:/brick1 \
server1:/brick2 server2:/brick2 server3:/brick2
If you try to disperse your data across multiple bricks on the same server, you'll run into the problem of sub-optimal parity and see the error message:
Multiple bricks of a disperse volume are present on the same server. This setup is not optimal. Bricks should be on different nodes to have best fault tolerant configuration
Distributed-Disperse
# Disperse data into 3-brick subvolumes before distributing
gluster volume create MyVolume disperse 3 \
server1:/brick1 server2:/brick1 server3:/brick1 \
server1:/brick2 server2:/brick2 server3:/brick2
By specifying disperse COUNT you tell gluster to create a subvolume every COUNT bricks. In the above example, it's every three bricks, so two subvolumes get created from the six bricks. This ensures the parity is optimally handled as the data is distributed.
You can also take advantage of bash shell expansion, as below. Each line expands to one three-brick subvolume (one brick per host), repeated for each of the 4 bricks it will be distributed across.
gluster volume create volume01 disperse 3 \
xcp-ng-0{1..3}.storage.lan:/data/glusterfs/volume01/brick01/brick \
xcp-ng-0{1..3}.storage.lan:/data/glusterfs/volume01/brick02/brick \
xcp-ng-0{1..3}.storage.lan:/data/glusterfs/volume01/brick03/brick \
xcp-ng-0{1..3}.storage.lan:/data/glusterfs/volume01/brick04/brick
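The volume then has to be started before anything can mount it, and it's worth checking how the bricks were grouped into subvolumes:
gluster volume start volume01
gluster volume info volume01     # should show a 4 x (2 + 1) = 12 distributed-disperse layout
gluster volume status volume01   # confirms every brick process is online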
Operation
Mounting and Optimizing Volumes
# Mount the volume from any of the peers
mount -t glusterfs xcp-ng-01.storage.lan:/volume01 /mnt
# Cache file metadata on the client to cut down on round trips
gluster volume set volume01 group metadata-cache
# Speed up directory listings
gluster volume set volume01 performance.readdir-ahead on
gluster volume set volume01 performance.parallel-readdir on
# Cache lookups of files that don't exist (helps create-heavy workloads)
gluster volume set volume01 group nl-cache
gluster volume set volume01 nl-cache-positive-entry on
Adding to XCP-NG
# Mount the volume and create a subdirectory for XCP-NG to use as the SR
mount -t glusterfs xcp-ng-01.lan:/volume01/media.2 /root/mnt2/
mkdir mnt2/xcp-ng
# Create the shared SR, listing the other two peers as backup servers
xe sr-create content-type=user type=glusterfs name-label=GlusterSharedStorage shared=true \
device-config:server=xcp-ng-01.lan:/volume01/xcp-ng \
device-config:backupservers=xcp-ng-02.lan:xcp-ng-03.lan
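A quick check that the SR was actually created and is shared, using the name-label from above:
xe sr-list name-label=GlusterSharedStorage params=uuid,name-label,type,shared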
Scrub and Bitrot
Scrub is off by default. When you enable it, the scrub daemon begins "signing" files (calculating checksums); the filesystem's own parity isn't used. So if you enable it and immediately start a scrub, you will see many "Skipped files", as their checksums haven't been calculated yet.
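A sketch of enabling bitrot detection and running a scrub on the volume created earlier:
# Start the bitrot daemon; it signs files (stores checksums) in the background
gluster volume bitrot volume01 enable
# Kick off a scrub immediately instead of waiting for the scheduled run
gluster volume bitrot volume01 scrub ondemand
# Watch progress - recently written files show as skipped until they've been signed
gluster volume bitrot volume01 scrub status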
Client Installation
The FUSE client is recommended. The docs cover a .deb based install, but you can also install from the repo. On Debian:
sudo apt install lsb-release gnupg
OS=$(lsb_release --codename --short)
# Assuming the current version of gluster is 11
wget -O - https://download.gluster.org/pub/gluster/glusterfs/11/rsa.pub | sudo apt-key add -
echo deb [arch=amd64] https://download.gluster.org/pub/gluster/glusterfs/11/LATEST/Debian/${OS}/amd64/apt ${OS} main | sudo tee /etc/apt/sources.list.d/gluster.list
sudo apt update; sudo apt install glusterfs-client
You need quite a few options in the fstab for this to mount reliably at boot:
192.168.3.235:/volume01 /mnt glusterfs nofail,x-systemd.automount,x-systemd.requires=network-online.target,x-systemd.device-timeout=10 0 0
How to reboot a node
You may find that your filesystem pauses while a node reboots. Take a look at the network ping timeout and see if setting it lower helps.
https://unix.stackexchange.com/questions/452939/glusterfs-graceful-reboot-of-brick
gluster volume set volume01 network.ping-timeout 5
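Roughly, a graceful reboot of one node then looks like this; the stop script ships with glusterfs-server, though the exact path can vary by package:
# On the node going down
systemctl stop glusterd
/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh   # stops brick and self-heal processes
reboot
# Once it's back, confirm the bricks are online and let self-heal catch up
gluster volume status volume01
gluster volume heal volume01 info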
Using notes from https://www.youtube.com/watch?v=TByeZBT4hfQ