Nebula mesh VPN still disappointing after 4 years

Background

My homelab network design follows a simple hub-and-spoke model: mobile clients route all traffic back home through a WireGuard tunnel. In the WireGuard client, I specify OPNsense as the DNS server (with dnsmasq as the recursive resolver). Every device, no matter where it is, gets ad blocking and custom domain name resolution.
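
As a sketch, the client side of that hub-and-spoke setup looks something like this (the addresses, hostname and keys are placeholders for illustration, not my real config):

```ini
[Interface]
PrivateKey = <client-private-key>
Address = 10.0.8.2/32
# Point DNS at OPNsense so every query goes through the tunnel
DNS = 10.0.8.1

[Peer]
PublicKey = <router-public-key>
Endpoint = vpn.example.com:51820
# 0.0.0.0/0 routes *all* traffic back home, not just the LAN subnets
AllowedIPs = 0.0.0.0/0
PersistentKeepalive = 25
```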

This setup has served me well for years. With power and internet disruptions at home becoming more frequent—and the possibility of relocating in the future—I’ve begun considering a more robust solution: one that doesn't make my home router a single point of failure.

For the next iteration of my homelab network, I decided to go with an overlay mesh VPN. I know, I know, everyone is using Tailscale these days, but I'd like an open source solution with a self-hosted control plane (or discovery node, or whatever a given project calls it).

I decided to revisit a cool project called nebula. I briefly tried it in 2021, before I first deployed a VPN for my homelab, but it didn't meet my basic requirements. It had the following limitations back then:

  • didn't exist in the repositories of Linux distributions; had to be manually installed (not a big deal)
  • didn't have built-in DNS support (nice to have but meh)
  • didn't have relay support (can be problematic for devices behind CGNAT)
  • didn't have iOS/Android mobile client (a huge show-stopper)

Now in 2025, with v1.9.6, all of the above pain points have seemingly been addressed. I was excited to give it another GO (pun intended) 😂

Implementation

Following the nebula documentation to set it up is pretty straightforward. The package now exists in Debian, Ubuntu and pretty much all Linux/BSD repositories.

All I needed to do was install the package on all nodes, set up a CA, generate and distribute certificates, and push config.yml to every node (except mobile clients, which still do not support the full config).
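
The CA and certificate steps can be sketched with nebula-cert; the names, groups and overlay IPs below are hypothetical:

```shell
# Create the CA (produces ca.key and ca.crt; keep ca.key somewhere safe)
nebula-cert ca -name "homelab"

# Sign a certificate for each node, binding it to an overlay IP
nebula-cert sign -name "lighthouse1" -ip "192.168.100.1/24"
nebula-cert sign -name "laptop" -ip "192.168.100.10/24" -groups "clients"

# Each node then needs ca.crt, its own <name>.crt and <name>.key,
# plus a config.yml pointing at the lighthouse
```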

Differences between nebula and Tailscale

One major difference between nebula and Tailscale is that almost all configuration (including firewall settings) is applied on each node, not on the lighthouse (control plane). Nebula also leaves certificate management to the administrator.

For a nebula site with a handful of servers/clients, this is still manageable by hand. But once the number grows, or you have complex firewall requirements, or you just want to follow best practice by rotating certificates more frequently, keeping each node up to date becomes a nightmare.

From what I can tell, nebula was and still is catered to large-scale enterprise use cases. It scales great with automation, but for a homelab environment it is positioned quite awkwardly. I don't want to, and simply don't have time to, run a private CA just for it. The best I can do is probably write an Ansible playbook that uses a Jinja2 template to generate config.yml for each host. Then again, since each node is slightly different, Ansible won't really save time; it's merely a way to manage configuration centrally.
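
If I went down that road, the playbook would be something like the sketch below; the paths, group name and handler are made up for illustration:

```yaml
# playbook.yml — render a per-host config.yml from a Jinja2 template
- hosts: nebula_nodes
  tasks:
    - name: Render nebula config
      template:
        src: templates/config.yml.j2
        dest: /etc/nebula/config.yml
      notify: restart nebula

  handlers:
    - name: restart nebula
      service:
        name: nebula
        state: restarted
```

Per-host differences (overlay IP, firewall rules, lighthouse flag) would live in host_vars, which is exactly the kind of bookkeeping I was hoping to avoid.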

It takes a relatively large amount of work to maintain a relatively small setup.

Limitations as of v1.9.6

Now, will it truly live up to its promised features? Well, let’s find out.

Packages

As I mentioned before, this is not an issue anymore. That said, the package-supplied systemd unit "nebula@.service" still lacks documentation.
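
From reading the unit file itself, the instance name appears to map to a config file under /etc/nebula, so usage would be along these lines (unverified against any docs, since there aren't any; check the ExecStart line of the shipped unit to confirm the path on your distribution):

```shell
# Inspect the template unit to see how %i is expanded
systemctl cat nebula@.service

# If the unit reads /etc/nebula/<instance>.yml, this starts it
# for /etc/nebula/config.yml
systemctl enable --now nebula@config.service
```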

DNS

The DNS feature the lighthouse provides is very primitive: it only dynamically resolves node names (not even the lighthouses themselves; see the documented limitations).

Also, config.yml doesn't allow overriding the operating system's DNS resolver. This is less of an issue on Linux/BSD or even Windows/Mac, but it can absolutely be a show-stopper on mobile clients.
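
For reference, this is roughly all the DNS support there is: a lighthouse can serve its node records on a host/port of your choosing, and nothing pushes that resolver to the clients. The snippet is based on the example config; exact keys may differ between versions:

```yaml
# config.yml on the lighthouse only
lighthouse:
  am_lighthouse: true
  serve_dns: true
  dns:
    # Which address/port the lighthouse's DNS responder listens on
    host: 0.0.0.0
    port: 53
```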

On iOS, since I cannot customize the DNS server (well, I can do it per Wi-Fi network, but not for the cellular data connection), I must rely on the VPN client to override DNS. The WireGuard iOS client does this, so does Tailscale, and so does every major commercial VPN client. BUT THIS IS NOT POSSIBLE WITH NEBULA!!! (issue#9)

iOS client

On my servers and Linux desktop, things are pretty smooth once set up. Not so much on iOS. First off, you cannot import configuration, period. It has to be entered manually. Configuration options are very limited compared to the desktop versions, and several that are important to me are simply not supported.

You cannot upload an already-signed device certificate (issue#20). You have to export the device-generated public key (not a CSR), sign it with the CA, then upload the resulting device cert. Why does it have to be so quirky?

Oh, and last but not least, you cannot make the VPN connection stay always-on (issue#49), ugh!

Conclusion

After spending two nights testing nebula v1.9.6 and its corresponding iOS client, I have to give it a hard pass, again, after 4 years.

It seems like this project is not aimed at small organisations and hobbyists. Maybe performance at very large scale is its selling point? I don't know, but it doesn't win me over as a potential user.

With the concept of overlay mesh VPN becoming so ubiquitous in 2025, I find it baffling that nebula still lacks many basic features compared to Tailscale/Headscale, ZeroTier, NetBird, you name it.

Anyway, moving forward, I will test Headscale and report back after some time.

Strange issue on OPNsense with Unbound DNS

I'm documenting a (quite possibly the strangest) issue I have ever experienced in IT. At the time of writing, I have managed to fix it, but I still don't know why it happened in the first place.

On a peaceful Saturday afternoon, out of nowhere, the internet stopped working. In panic mode (I wish I were exaggerating), I rushed down to my computer and started troubleshooting. "Thankfully", it was only a DNS issue, as ping 1.1.1.1 still worked.

A quick bit of background on my router setup: I use OPNsense with Unbound DNS as the recursive DNS server for my entire LAN. A pretty normal setup; in fact, it's the default.

Restarting the Unbound DNS service did not help; rebooting OPNsense did not help; power-cycling the ISP ONT box didn't help, either.

As the whole family needed internet access ASAP, I did a quick fix by turning off the Unbound DNS service entirely and letting OPNsense hand out an upstream public DNS server instead.

(A quick note: only Proxmox did not pick up the newly advertised public DNS server, as its network settings are static; the other client devices received the change shortly.)

In the evening, I had some time to troubleshoot further. I disabled DNSBL, domain overrides and the other fancy features I had enabled in Unbound, making it almost a vanilla install. Still no luck. I noticed that clients were unable to get answers from the DNS server (gateway/VLAN interface/network address, it's all the same thing) over port 53; however, the logs still showed nothing useful.
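
A quick way to see the symptom from a client; the address is a placeholder for the gateway:

```shell
# ICMP works, DNS doesn't — compare the two
ping -c 1 192.168.1.1
dig @192.168.1.1 example.com +timeout=2
```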

In a final attempt, I pulled up the OPNsense documentation and started examining my config line by line. I noticed the Access Lists (ACL) default action was set to "refuse". After I changed it back to the default "accept", DNS immediately started working.
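
In raw unbound.conf terms (OPNsense generates this from the GUI setting), the difference boils down to something like the following; the subnet is a placeholder for my LAN:

```
server:
  # "refuse" answers every query with REFUSED; "allow" actually resolves
  access-control: 192.168.1.0/24 allow
  access-control: 0.0.0.0/0 refuse
```

This also explains why ping kept working and the logs looked clean: the refusal happens inside Unbound's access control, before any resolution is attempted.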

OK, problem solved, but questions started popping up. Why had the ACL changed? Nobody had touched the box in months, and the automated upgrades happened weeks ago. What gives?

Since I rebooted OPNsense shortly after the incident, I lost the logs; and since I had only configured the system to keep 5 backups, I burned through them quickly while troubleshooting and overwrote the config from before the reboot.

Searching the internet didn't give me much of an answer, either. Nobody else seems to have had Unbound's ACL suddenly flip from accept to refuse. The real cause of this issue may never be known.

After this, I will consider keeping logs for longer and maybe even look into remote logging options. Increasing the number of automatic backups (from 5 to, say, 20) is also a good idea.