ZFS import fix

This is a follow-up to my previous post, ZFS pool not importing upon reboot. There, I documented how my storage pool would not automatically import after a reboot. The takeaway was that it seemed to be an occasional blip, and that rather than digging further, I would wait and see if it ever happened again.

Well, during the very next patch-and-reboot cycle, I had the same issue. After researching some more (1 and 2), the fix was relatively easy (a consolidated sketch follows the list):

  • systemctl status zfs-import-cache.service to make sure it's enabled and running
  • systemctl disable zfs-import@mypool.service for every pool; previously, it was enabled in Proxmox
  • zpool set cachefile=/etc/zfs/zpool.cache mypool for every pool; previously, this value was unset in Proxmox
  • update-initramfs -k all -u
  • reboot now
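
Put together as a sketch, assuming a single pool named mypool (repeat the per-pool steps for each pool on your system):

systemctl enable --now zfs-import-cache.service   # ensure the cache-based import service is enabled and started
systemctl disable zfs-import@mypool.service       # stop the per-pool device scan at boot
zpool set cachefile=/etc/zfs/zpool.cache mypool   # record the pool in the cache file
update-initramfs -k all -u                        # rebuild the initramfs so it ships the fresh cache file
reboot now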

Here is a brief explanation (with a quick verification sketch after the list):

  • ZFS should use zfs-import-cache.service to automatically import pools at boot; as the name suggests, it reads a cache file instead of scanning the actual hard drives, which may or may not be available that early in the boot process.
  • zfs-import@mypool.service is the service that imports a pool by scanning the hard drives directly. Disable it for each pool. (Why was it enabled in Proxmox?)
  • manually set the "cachefile" property for each pool. (Again, why was it empty?)
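
A quick way to verify the end state (mypool again being a placeholder pool name):

zpool get cachefile mypool                       # should report /etc/zfs/zpool.cache rather than "-" or "none"
systemctl is-enabled zfs-import-cache.service    # should print "enabled"
systemctl is-enabled zfs-import@mypool.service   # should print "disabled"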

Since I installed Proxmox 7 on this machine three years ago and only started to experience ZFS import issues in recent months, I suspect that a recent update changed the order in which services are loaded during boot. As a result, the bad default exposed itself.

I know that Proxmox has its quirks when it comes to HA stuff, but this ZFS implementation is another knock on its reputation (for me). If I ever have to do it from scratch, I'll probably go with stock Ubuntu with KVM/QEMU and ZFS.

ZFS pool not importing upon reboot

Background

My main hypervisor is a Dell R720 server running Proxmox. It has 8 spinning hard drives making up a ZFS pool called r720_storage_pool. There is also a high-performance VM pool that runs on NVMe SSDs, and a boot pool created by Proxmox. Every month, I upgrade Proxmox and reboot to apply the new kernel. It had been running mostly maintenance-free for a few years, until yesterday, after I routinely rebooted it.

Before I jump to the actual issue, it would be helpful to lay out some details about how the current stack works:

  • the ZFS pool r720_storage_pool has some encrypted datasets, whose key is stored in the boot pool and loaded upon reboot. The process does not require user intervention, and the pool is automatically imported upon reboot and mounted under /r720_storage_pool
  • based on the ArchWiki SFTP chroot article, I set up a bind mount in /etc/fstab so that the OpenSSH server can serve the pool via SFTP (a sketch of the matching sshd_config block follows this list):
    /r720_storage_pool/encrypted/media /srv/ssh/media none bind,defaults,nofail,x-systemd.requires=zfs-mount.service 0 0
  • I also created dedicated users for SFTP/SSHFS purposes only. Their entry in /etc/passwd is as follows:
    media:x:1001:1000::/srv/ssh/media:/usr/sbin/nologin
  • the VMs (in this case, a Docker host) mount the SFTP chroot jail at boot, conveniently defined in /etc/fstab:
    media@proxmox.local.lan:/ /home/ewon/media fuse.sshfs defaults,delay_connect,_netdev,allow_other,default_permissions,uid=1000,gid=1000,IdentityFile=/home/ewon/.ssh/id_ed25519 0 0
  • Docker containers consume the ZFS storage backend through bind-mounted Docker volumes, defined in the docker-compose file, for example:
    ...
    volumes:
      - /home/ewon/media:/data/photo
      - /home/ewon/media:/data/video
    ...
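
For reference, the sshd_config side of this kind of SFTP chroot looks roughly like the following; it is a simplified sketch based on the ArchWiki approach, and the Match rule and chroot path shown here are illustrative:

# /etc/ssh/sshd_config (excerpt, sketch only)
Match User media
    ChrootDirectory %h            # chroot into the user's root-owned home, /srv/ssh/media
    ForceCommand internal-sftp    # SFTP only, no shell
    AllowTcpForwarding no
    X11Forwarding no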

Incident

After rebooting Proxmox, I noticed some services were not available. I went to the Docker VM (which runs on Proxmox) and found out that all the containers that use r720_storage_pool had failed to start.

I've had some trouble in the past when rebooting the hypervisor, due to a race condition between the Docker VM and the SFTP server on Proxmox. Since then, I added a start delay on the Docker host and the issue never happened again. This time, however, it was different.

Investigation

I ssh'ed into docker.local.lan and noticed that the SSHFS shares were mounted correctly, but there was no content in the directory. "Oh no!" This can't be good.

Following up the chain, I ssh'ed into proxmox.local.lan and checked the ZFS pools. zpool status would not show r720_storage_pool. I started sweating.

A manual zpool import -a would not import the pool, either. I rushed down to the server rack; all 8 drives were still blinking and humming. "OK", my drives are still there, nobody stole them, the cats didn't piss on them (story for another day). Did the disk controller give up? Back at the terminal, I checked /dev/disk/by-id, and thank goodness all of my sdX devices still showed up.

Next, I needed to manually import the pool and make it available again (a consolidated sketch follows the steps):

  1. zpool import -d /dev/disk/by-id took a good few seconds to run, and my pool showed up again!
  2. zpool status -v shows the pool with 0 errors, very healthy.
  3. zfs load-key -r r720_storage_pool/media/encrypted loads the encryption key file
  4. zfs get keystatus r720_storage_pool/media/encrypted (optionally) checks the key status
  5. zfs mount -a mounts all the datasets again, just in case
  6. mount -a re-applies the bind mounts under /srv/ssh, so the directory won't show up as empty
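
As a single block, the recovery looks something like this (a sketch only; it assumes the pool imports cleanly by name and the keys load from their configured keylocation):

zpool import -d /dev/disk/by-id r720_storage_pool     # import using the stable by-id device paths
zpool status -v r720_storage_pool                     # verify pool health
zfs load-key -r r720_storage_pool/media/encrypted     # load the encryption key(s)
zfs get keystatus r720_storage_pool/media/encrypted   # should report "available"
zfs mount -a                                          # mount all datasets
mount -a                                              # re-apply the /etc/fstab bind mounts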

Lastly, on the Docker VM, I re-mounted all the SSHFS resources and started the Docker containers. Crisis mode over, for now. Wipe forehead.

Root cause

As soon as I put my beans together, I started to wonder why it happened in the first place and how to avoid it in the future.

I always start by checking for failed systemd units:

root@r720:~# systemctl --failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
0 loaded units listed.

Nothing to see here, move along.

After some online searching, I found folks talking about a corrupted zfs-import-cache file, like here and here. The problem for me was not a failed zfs-import-cache.service; in fact, the service seemed to run just fine:

● zfs-import-cache.service - Import ZFS pools by cache file
     Loaded: loaded (/lib/systemd/system/zfs-import-cache.service; enabled; preset: enabled)
     Active: active (exited) since Sat 2024-05-04 20:37:25 EDT; 17h ago
       Docs: man:zpool(8)
    Process: 1935 ExecStart=/sbin/zpool import -c /etc/zfs/zpool.cache -aN $ZPOOL_IMPORT_OPTS (code=exited, status=0/SUCCESS)
   Main PID: 1935 (code=exited, status=0/SUCCESS)
        CPU: 15ms

May 04 20:37:25 r720 systemd[1]: Starting zfs-import-cache.service - Import ZFS pools by cache file...
May 04 20:37:25 r720 zpool[1935]: no pools available to import
May 04 20:37:25 r720 systemd[1]: Finished zfs-import-cache.service - Import ZFS pools by cache file.

However, the line no pools available to import caught my attention. Digging a little deeper, this line is the exception rather than the norm compared to recent server reboots:

root@r720:~# journalctl -u zfs-import-cache.service

-- Boot 11f348a818b2439598566752a8d2cdbc --
Feb 04 11:05:24 r720 systemd[1]: Starting zfs-import-cache.service - Import ZFS pools by cache file...
Feb 04 11:05:31 r720 systemd[1]: Finished zfs-import-cache.service - Import ZFS pools by cache file.
-- Boot 1ca32d8475854b98b65153f2a801dd15 --
Mar 03 21:45:27 r720 systemd[1]: Starting zfs-import-cache.service - Import ZFS pools by cache file...
Mar 03 21:45:34 r720 systemd[1]: Finished zfs-import-cache.service - Import ZFS pools by cache file.
-- Boot b71ade2b29ce4b999c337c65636238c7 --
Apr 07 22:13:45 r720 systemd[1]: Starting zfs-import-cache.service - Import ZFS pools by cache file...
Apr 07 22:13:51 r720 systemd[1]: Finished zfs-import-cache.service - Import ZFS pools by cache file.
-- Boot e921bbd16046429d8fdb03e6f35a0d88 --
May 04 20:37:25 r720 systemd[1]: Starting zfs-import-cache.service - Import ZFS pools by cache file...
May 04 20:37:25 r720 zpool[1935]: no pools available to import
May 04 20:37:25 r720 systemd[1]: Finished zfs-import-cache.service - Import ZFS pools by cache file.

I started to check and compare the raw journalctl messages between the current boot and the previous boot, with a focus on the keywords zfs and r720_storage_pool.
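
Something along these lines does the job (-b 0 is the current boot, -b -1 the previous one):

journalctl -b 0 | grep -Ei 'zfs|r720_storage_pool'    # current boot
journalctl -b -1 | grep -Ei 'zfs|r720_storage_pool'   # previous boot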

In previous boots, zfs-zed.service was able to import the pool, as can be seen in the following three lines. The same cannot be said for the current boot, at least not before I manually imported the pool.

Apr 07 22:13:54 r720 zed[2783]: eid=11 class=pool_import pool='r720_storage_pool'
Apr 07 22:13:54 r720 zed[2786]: eid=10 class=config_sync pool='r720_storage_pool'
Apr 07 22:13:54 r720 zed[2794]: eid=15 class=config_sync pool='r720_storage_pool'

At this point, without going too deep down the rabbit hole, it is reasonably safe to conclude that the disks are not ready when Proxmox's ZFS daemon tries to import the pool. As to why that is the case, I can't tell. I do, however, believe the following factors can contribute, and may even have contributed, to the issue:

  • aging hard drives are slow for the operating system to initialize
  • Proxmox 8.2 brings some changes (most likely a heavier workload for the system) that delay disk initialization
  • ZFS got upgraded and runs a bit faster, hence starts importing pools before the devices are ready
  • It's Saturday night and nobody wants to work?!

Final words

For now, there is no clear answer as to what action to take to prevent this from happening again. Maybe the issue is just a one-off, or maybe it will keep happening. I decided to do nothing for now and keep an eye out in the future.

Perhaps I could edit zfs-import-cache.service to add some delay, like this post suggests, but I don't like the idea of adding ad-hoc fixes for an unconfirmed issue. I've seen far too many overreacting sysadmins (or their managers) in my previous jobs. Their band-aids end up staying in production for so long that nobody remembers why they were there in the first place, and everyone is afraid to tear them down. Fortunately, this is my homelab and I have 100% say in this.
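
For reference, that delay approach would be a systemd drop-in roughly like the following (created with systemctl edit zfs-import-cache.service; the 10 seconds is an arbitrary figure, and I am not applying it):

# /etc/systemd/system/zfs-import-cache.service.d/override.conf
[Service]
ExecStartPre=/bin/sleep 10

followed by a systemctl daemon-reload.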

The ultimate "right" solution is to replace my aging hard drives with something newer and faster, maybe even enterprise SSDs. But I'm satisfied with the current speed and capacity, so why bother? I plan to run the current batch of hard drives into the ground, even though the greybeards all suggest otherwise.

Nextcloud upgrade woes

I have been self-hosting a Nextcloud instance for almost two years. It is a LAMP stack in a Proxmox LXC container. The container's operating system is Debian 11, with PHP 7.4.

Up until Nextcloud 25, everything was fine. I always used the web updater for minor and major Nextcloud upgrades. It wasn't always smooth sailing (sometimes I needed to drop into the command line to do some post-upgrade steps), but generally speaking things worked as intended.

A few months ago, I heard Nextcloud 26 would drop support for PHP 7.4, which meant Debian 11 would not be able to upgrade to Nextcloud 26. That's fine, because Debian 12 was just around the corner. I could rock 25 until Debian 12 came out in the summer.

Fast forward to yesterday: I decided to upgrade my LXC container to Debian 12 and Nextcloud from 25 to 27, since both projects had just released major upgrades within the last week. How exciting! Strangely enough, the Nextcloud web interface, under "Administration settings", didn't even offer version 26 or 27.

I thought "Fine, I will upgrade Debian first and then use Nextcloud web updater". Turns out, Debian upgrade went very smoothly; all php packages were bumped from 7.2 to 8.2; reboot, done. However, Nextcloud cannot be opened, the web interface says something like "This version of Nextcloud is not compatible with PHP>=8.2. You are currently running 8.2.7". I start to grind my teeth as Nextcloud throws me into this hoop. "Fine, I will manually upgrade".

Following the How to upgrade guide, I downloaded latest.zip from the Nextcloud website and started the (painful) process (a consolidated sketch follows the list):

  • turn maintenance mode on
  • unzip the file
  • copy everything except config and data into the document root at /var/www/nextcloud
  • make sure user, group and permissions are correct
  • add "apc.enable_cli = 1" to the PHP CLI config because of this bug
  • sudo -u www-data php occ upgrade
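
Boiled down to commands, it looks something like this (a sketch only; it assumes the zip was unpacked to /tmp/nextcloud, the document root is /var/www/nextcloud, and the PHP CLI ini file to edit depends on how php-apcu is packaged):

sudo -u www-data php /var/www/nextcloud/occ maintenance:mode --on
rsync -a --exclude=config --exclude=data /tmp/nextcloud/ /var/www/nextcloud/   # copy the new release over
chown -R www-data:www-data /var/www/nextcloud                                  # fix ownership and permissions
# add "apc.enable_cli = 1" to the PHP CLI configuration, then:
sudo -u www-data php /var/www/nextcloud/occ upgrade
sudo -u www-data php /var/www/nextcloud/occ maintenance:mode --off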

Of course it didn't work. I went to the web interface to see why, and it said "Updates between multiple major versions are unsupported". You could hear me grinding my teeth from across the street.

Finally, after a lot of faffing, I downloaded Nextcloud 26.0.2 and successfully upgraded. However, that was not the end of the misery. As per usual, a major upgrade always needs some cleaning up. I got half a dozen warnings under "Administration settings": the PHP memory_limit, a file hash mismatch, a failed cron job, and so on. They are not difficult to fix, just hella annoying.

Just thinking about the 26-to-27 upgrade, which will put me through (some of) the rigmarole again, makes me tired already. This process is stressful and tedious, especially for something you only need to do every half a year. It periodically reminds me of the bad old days of system administration. Maybe I should've opted for the Docker container deployment, I don't know.

On the flip side, thank goodness I have ZFS snapshots of the container and the data directory. Should something go wrong, I can always roll back.
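
The snapshot-and-rollback dance is simple; with a hypothetical dataset name (the real one is wherever Proxmox put the container's volume), it looks like:

zfs snapshot rpool/data/subvol-101-disk-0@pre-upgrade   # taken before touching anything
zfs rollback rpool/data/subvol-101-disk-0@pre-upgrade   # undo everything if the upgrade goes sideways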

Setting up ZoneMinder NVR

For the past year or so, I had been using BlueIris as my network video recorder (NVR) solution on a dedicated HP ProDesk computer running Windows 10. BlueIris has proven to be a solid piece of software. Although I only use a fraction of its features, I still think it's worth the cost (yes, it's paid, proprietary and Windows-only). I had no plan to replace it any time soon.

That is, at least until yesterday, when the HP computer just died. It turns out my cat had sneaked into the basement a few weeks ago and urinated on my server rack, and the liquid got into the case and slowly corroded the motherboard. The desktop was lying horizontally on the top shelf of my rack and caught most of the liquid, saving all the equipment beneath it. Pretty heroic, in a sense.

I took a picture of the dead motherboard.

Well, it's time to pick a new NVR solution (preferably open source) and chug along with life. I had previously tried Shinobi and concluded that the web interface was not to my liking. I had also tried ZoneMinder on an i5 computer back then, but it was simply not powerful enough to drive motion detection on 3+ cameras.

Understanding the pain points

After running security camera systems for more than a year, I have figured out my use case and exactly what I need from an NVR system. I simply need:

  • 24/7 low-res recording on all cameras
  • Low maintenance effort
  • Low CPU usage and disk I/O (ideally)

What I thought I needed but actually don't:

  • GPU encoding/decoding (pass-through is good for me)
  • Motion detection (cats running around all the time, tons of false triggers)
  • High-res footage (nice to have, but low-res is good enough)
  • Multiple copies of backups (footage from weeks ago is not critically important)

With that in mind, I decided to give ZoneMinder another go. This time, with a minimum viable configuration.

Deciding the hosting infrastructure

With the HP desktop down, I only have three servers left: a Dell PowerEdge R720 as the main hypervisor, with 8 hard drives in a ZFS mirror vdev configuration; an HP ProLiant DL360p Gen8 as a testing server; and a dedicated backup server running TrueNAS. I'd better not mess around with the backup server, and I don't want to occupy the testing server with mission-critical tasks. The only logical choice is the Dell R720.

ZoneMinder is a PHP web application running on top of a LAMP stack, so it should have no problem running in an LXC container. In fact, this YouTube video shows exactly the same setup. External storage is something we have to configure, though, as discussed later in this post.

Installation

Follow the most recent guide on the ZoneMinder wiki for either Ubuntu or Debian systems: add the apt source list, then install the dependencies and ZoneMinder itself. Fire up a web browser and go to x.x.x.x/zm, or put a proxy in front of it and go to my-zoneminder-url.tld/zm, and you are greeted with a setup wizard. At the time of writing, the most recent stable version is v1.36.12 on Debian 11 Bullseye.

Before running the wizard, I would add external storage to the LXC container. By default, ZoneMinder stores "events" under /var/cache/zoneminder/events. This directory is owned by the www-data user and group. In Proxmox, there is a neat trick to bind mount a path on the host system into the guest container. In this case, I created a ZFS dataset /tank/encrypted/zoneminder and bind mounted it to /var/cache/zoneminder/events in the container. I had to add the following line to the container config file (/etc/pve/lxc/Container_ID.conf):

mp0: /tank/encrypted/zoneminder,mp=/var/cache/zoneminder/events

Reboot the container; the external storage will show up.
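
The same bind mount can also be added with Proxmox's pct tool instead of editing the file by hand (101 here is a placeholder container ID):

pct set 101 -mp0 /tank/encrypted/zoneminder,mp=/var/cache/zoneminder/events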

We also need to fix the permissions of the storage directory. By default, the www-data user and group have UID and GID 33, which maps to 100033 on the host system. We could simply chown -R 100033:100033 /tank/encrypted/zoneminder on the host system and call it a day. But since I already have a ZFS dataset with 100033 UID/GID bind mounted to the Nextcloud container, I went the extra mile and changed the UID/GID of the www-data user/group in the ZoneMinder container to some number other than 33. The logic is that if either container is compromised, its www-data user/group cannot touch the other container's bind-mounted storage path on the host system. It may sound overly cautious, but reducing the attack surface is always good practice.
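
That boils down to something like the following (1033 is an arbitrary example UID/GID; unprivileged containers add 100000 to it on the host, and anything the old UID 33 owned inside the container needs re-chowning):

# inside the ZoneMinder container
usermod -u 1033 www-data
groupmod -g 1033 www-data
chown -R www-data:www-data /var/cache/zoneminder   # re-own ZoneMinder's directories (repeat for other paths as needed)

# on the Proxmox host: 100000 + 1033 = 101033
chown -R 101033:101033 /tank/encrypted/zoneminder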

Adding Reolink Cameras

I have a total of five Reolink 5MP cameras. This wiki page and this Reddit post show how to add Reolink cameras to ZoneMinder. If in doubt, use ONVIF auto-detection. I find probing much easier than messing around with RTSP/RTMP feed URLs.

Each camera has a sub stream and a main stream. I record the sub streams 24/7 and use the main streams purely for live monitoring.

For me, the sub feed URL is:

rtsp://admin:password@IP:554/h264Preview_01_sub

The main feed URL is:

rtsp://admin:password@IP:554/h264Preview_01_main

Of course, substitute the username and password based on your setup. The default username for Reolink cameras is "admin"; the default password is empty.
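
Before wiring a URL into ZoneMinder, it's worth pointing ffprobe (part of ffmpeg) at it as a sanity check; the credentials and IP below are placeholders:

ffprobe "rtsp://admin:password@IP:554/h264Preview_01_sub"   # should print the stream's codec, resolution and frame rate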

Wrap-up

I won't go into details on how to configure ZoneMinder, because I know everyone's needs are different. However, I suggest at least adding a filter rule to purge events when the disk gets full.

Keep an eye on CPU and RAM usage, as ZoneMinder relies on the CPU to do its processing and can get hungry in terms of resources. Also pay attention to the usage of the tmpfs at /dev/shm if you have motion detection configured, as it can fill up quickly. The size of the tmpfs can be increased in VMs but not in LXC containers.
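
In a VM, that would be an /etc/fstab entry along these lines (the 50% size is only an example, not a recommendation):

tmpfs /dev/shm tmpfs defaults,size=50% 0 0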

That's all I can think of right now. My configuration is very minimal. It works for me but may not suit everyone. I am still exploring ZoneMinder myself and may revisit this topic later on.