Mast Insomnia

I run Omarchy on a Framework 13 laptop – a beautiful Arch Linux distribution built on Hyprland. Over the past few months I’ve heavily customized it: a dynamic menu system, dual VPN modules in Waybar (PIA + PiVPN), NAS auto-mounting scripts, a keybinding cheatsheet generator, and more.

Most of these customizations worked great. But my PiVPN connection had a persistent problem: it would die after 5-15 minutes. My “fix” was a cron-based watchdog that restarted the OpenVPN service whenever it detected trouble. Except the watchdog had its own problems. It was a band-aid on a band-aid.

I finally sat down to properly audit every script and config I’d written. What I found was humbling.

The Audit

I went through about 20 scripts across Hyprland configs, Waybar modules, VPN management, NAS mounting, and more. Most of it was solid – the NAS smart-mount script uses /proc/self/mountinfo instead of mountpoint to avoid hanging on dead NFS, the terminal MOTD adapts based on window count and terminal width, the dynamic menu patcher injects custom items into Omarchy’s stock menu at runtime.

But then I found the bugs.

Bug #1: The Watchdog Was the Problem

My PiVPN watchdog ran every minute via root’s crontab. Its job: if the VPN service is running but the tunnel is down, restart it. Simple enough. Except the code looked like this:

if systemctl is-active --quiet openvpn-client@pivpn.service; then
    logger "PiVPN watchdog: tun0 missing, restarting"
    systemctl restart openvpn-client@pivpn.service
fi

See the bug? It checks if the service is active, then unconditionally restarts it. It never checks if tun0 is actually missing. The watchdog was killing a perfectly healthy VPN connection every 60 seconds.

The journal logs confirmed it – 30+ minutes of entries at perfect one-minute intervals:

15:10:00 - watchdog: "tun0 missing, restarting"
15:11:00 - watchdog: "tun0 missing, restarting"
15:12:00 - watchdog: "tun0 missing, restarting"
...

Every single “disconnect” in my logs was SIGTERM[hard,] – an external process killing the VPN. The connection was never dying on its own. The watchdog was the disease pretending to be the cure.
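
One way to put a number on that pattern is to count the hard kills in the journal. This is only a minimal sketch: count_hard_kills is a hypothetical helper name, and it assumes the log text arrives on stdin, for example piped from journalctl for the unit used in this post.

```shell
#!/bin/bash
# Hypothetical helper (not from the original scripts): count log lines
# that record a hard SIGTERM. Reads log text on stdin, prints the count.
count_hard_kills() {
  # -F matches the brackets in "SIGTERM[hard,]" literally; -c counts lines.
  # grep exits non-zero when nothing matches, so "|| true" keeps a zero
  # count from looking like a failure.
  grep -cF 'SIGTERM[hard,]' || true
}

# Assumed real-world use, left commented out so the sketch has no side effects:
#   journalctl -u openvpn-client@pivpn.service --since today | count_hard_kills
```

If the count roughly matches the number of cron ticks in the same window, the kills are coming from the scheduler rather than from the network.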

Bug #2: Dead Code Inside a Certificate Block

This one was wild. My OpenVPN client config had options like persist-key, persist-tun, custom keepalive timers, and explicit-exit-notify – all the right things for a stable connection. But I’d placed them inside the <ca>...</ca> inline certificate block:

<ca>

# Survive network hiccups better
persist-key
persist-tun
keepalive 10 30
ping 10
ping-restart 60
explicit-exit-notify 2

-----BEGIN CERTIFICATE-----
XXXxxx...

OpenVPN’s parser stops processing config directives when it hits <ca> and doesn’t resume until </ca>. Everything between those tags is treated as certificate data – not configuration. Those options were never active.

For months, I thought I had persist-tun keeping my tunnel alive across reconnects. I didn’t. The tunnel was being destroyed every time, which made the watchdog’s job even harder. The config looked right, but the parser never saw it.
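
A config can be scanned for this class of mistake. The sketch below is a heuristic, not a reimplementation of OpenVPN's parser: find_swallowed_options is a hypothetical helper that reports any non-blank, non-comment line sitting inside an inline block but outside the PEM BEGIN/END markers.

```shell
#!/bin/bash
# Hypothetical helper: print "<line-number>: <text>" for every line that
# looks like a directive but sits inside an inline block such as <ca>.
# Usage: find_swallowed_options /path/to/client.conf
find_swallowed_options() {
  awk '
    /^<\/[a-z0-9-]+>/ { inside = 0; next }           # closing tag, e.g. </ca>
    /^<[a-z0-9-]+>/   { inside = 1; pem = 0; next }  # opening tag, e.g. <ca>
    !inside           { next }                       # normal config, ignore
    /^-----BEGIN /    { pem = 1; next }              # start of PEM payload
    /^-----END /      { pem = 0; next }              # end of PEM payload
    pem               { next }                       # base64 body, ignore
    /^[ \t]*$/        { next }                       # blank line
    /^[ \t]*#/        { next }                       # comment
    { printf "%d: %s\n", NR, $0 }                    # stray text in the block
  ' "$1"
}
```

Any output is worth a second look; an empty result means no stray lines were found in the inline blocks.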

Bug #3: WiFi Power Save Was Killing UDP

I dug deeper: what was causing the original disconnect problem, before the watchdog was ever added?

$ iw dev wlan0 get power_save
Power save: on

The Framework 13’s Intel WiFi was periodically putting the radio to sleep to save battery. This drops UDP packets silently. My PiVPN runs on UDP port 1194, with keepalive pings every 15 seconds. If the radio sleeps through a few ping cycles, OpenVPN declares the connection dead.

This was the root cause all along. The WiFi drops packets, OpenVPN thinks the server is gone, the tunnel dies, and then the broken watchdog makes everything worse by restarting the service in an infinite loop.
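
The power-save state can also be checked from a script, so it can be logged next to VPN events. A minimal sketch, assuming the iw output format shown above; power_save_state is a hypothetical helper name and wlan0 is the interface from this post.

```shell
#!/bin/bash
# Hypothetical helper: read `iw dev <iface> get power_save` output on
# stdin and print just the state ("on" or "off"). Prints nothing if the
# expected "Power save:" line is absent.
power_save_state() {
  awk -F': ' '/^Power save:/ { print $2; exit }'
}

# Assumed real-world use, left commented out:
#   iw dev wlan0 get power_save | power_save_state
```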

The Fixes

WiFi Power Save

Created a udev rule at /etc/udev/rules.d/81-wifi-powersave.rules that disables power save whenever a WiFi interface comes up:

ACTION=="add", SUBSYSTEM=="net", KERNEL=="wlan*", \
  RUN+="/usr/bin/iw dev %k set power_save off"

This fires automatically on boot, resume from sleep, and any time the WiFi adapter reconnects. The battery impact is negligible.

Watchdog Rewrite

The fixed watchdog does two things the original didn’t:

  1. Actually checks for tun0 before restarting
  2. Skips the check if the service started less than 30 seconds ago – after a restart, tun0 takes 1-2 seconds to come back up. Without this grace period, the next cron tick sees tun0 missing during initialization and restarts again, creating an infinite loop.

#!/bin/bash

# PiVPN watchdog - restarts openvpn if tun0 disappears.
# Runs via root crontab every 5 minutes.
# Skips check if service started < 30s ago (avoids race
# during init).

if systemctl is-active --quiet \
    openvpn-client@pivpn.service; then
    active_ts=$(systemctl show \
      openvpn-client@pivpn.service \
      --property=ActiveEnterTimestamp --value)
    if [ -n "$active_ts" ]; then
        active_epoch=$(date -d "$active_ts" +%s \
          2>/dev/null || echo 0)
        now_epoch=$(date +%s)
        age=$(( now_epoch - active_epoch ))
        if [ "$age" -lt 30 ]; then
            exit 0
        fi
    fi

    if ! ip link show tun0 &>/dev/null; then
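        # Two arguments instead of a backslash-continued string, so the
        # indentation does not end up inside the log message.
        logger "PiVPN watchdog: tun0 missing" \
          "(service up ${age:-?}s), restarting"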
        systemctl restart \
          openvpn-client@pivpn.service
    fi
fi

Also reduced the cron frequency from every 1 minute to every 5 minutes. With the root causes fixed, the watchdog is just a safety net now.
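
The decision logic can be pulled out into a pure function and exercised without systemd or a real tunnel. This is a sketch under assumptions, not the script that runs from cron: watchdog_decision and its argument order are hypothetical, and the caller supplies the timestamps and whether tun0 exists.

```shell
#!/bin/bash
# Hypothetical helper, not part of the watchdog script itself.
# Usage: watchdog_decision <service_start_epoch> <now_epoch> <tun_present: 0|1> [grace_seconds]
# Prints "restart" or "skip".
watchdog_decision() {
  local started=$1 now=$2 tun_present=$3 grace=${4:-30}
  local age=$(( now - started ))
  if (( age < grace )); then
    echo skip        # still inside the startup grace period
  elif (( tun_present == 1 )); then
    echo skip        # tunnel interface exists, nothing to do
  else
    echo restart     # service is old enough and the tunnel is gone
  fi
}
```

The original bug corresponds to skipping the second branch entirely: every tick for an active service fell through to the restart.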

OpenVPN Config Cleanup

Four changes to /etc/openvpn/client/pivpn.conf:

Rescued the dead options. Moved persist-key, persist-tun, ping, ping-restart, and explicit-exit-notify from inside <ca> to before it, where the parser actually reads them.

Removed AES-256-CBC fallback. The connection always negotiates AES-256-GCM (confirmed in journal logs). Having CBC as a fallback was disabling Data Channel Offload (DCO), forcing the VPN into userspace instead of kernel mode.

Added a pull-filter. The PiVPN server pushes block-outside-dns, a Windows-only option that logs an error on every Linux connection. pull-filter ignore "block-outside-dns" suppresses it cleanly.

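Cleaned up keepalive conflicts. The config had three overlapping keepalive settings (keepalive 10 30, ping 10, ping-restart 60), plus the server pushing its own values. On a client, keepalive 10 30 is shorthand for ping 10 plus ping-restart 30, so it duplicated one explicit directive and contradicted the other. Removed the keepalive macro and kept the explicit ping/ping-restart directives.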

The final config section looks like this – clean and intentional:

data-ciphers AES-256-GCM
auth SHA256
auth-nocache
verb 3

# Suppress Windows-only option pushed by server
pull-filter ignore "block-outside-dns"

# Survive network hiccups
persist-key
persist-tun
ping 10
ping-restart 60
explicit-exit-notify 2

<ca>
-----BEGIN CERTIFICATE-----
...

Bonus: NFS Unmount Consistency

While going through my scripts, I also caught an inconsistency. My NAS smart-mount script carefully uses /proc/self/mountinfo to check mounts – because mountpoint -q talks to the filesystem and can hang indefinitely on dead NFS. But the unmount script was using mountpoint -q:

# Old way -- can hang on dead NFS
for m in "${MOUNTS[@]}"; do
  mountpoint -q "$m" && sudo umount "$m" || true
done

Fixed to use the same safe approach, plus lazy unmount so the umount itself can’t hang either:

# Safe way -- reads procfs, never hangs
is_mounted() {
  awk -v t="$1" \
    '$5==t {found=1} END{exit(found?0:1)}' \
    /proc/self/mountinfo
}

for m in "${MOUNTS[@]}"; do
  is_mounted "$m" && sudo umount -l "$m" || true
done
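
One caveat with the procfs approach: the kernel writes spaces in mount points as the octal escape \040 in mountinfo, so a path containing a space will not match as typed. The variant below is a sketch that handles spaces only; is_mounted_in is a hypothetical name, and the optional second argument (a mountinfo-style file) exists purely so the function can be tested against sample data.

```shell
#!/bin/bash
# Hypothetical variant of is_mounted.
# Usage: is_mounted_in <mount_point> [mountinfo_file]
# Returns 0 if the mount point is listed, 1 otherwise.
is_mounted_in() {
  # Escape spaces the same way the kernel does in mountinfo.
  local target="${1// /\\040}"
  # Pass the path through the environment: awk -v would interpret the
  # backslash escape and turn \040 back into a space.
  TARGET="$target" awk '
    $5 == ENVIRON["TARGET"] { found = 1 }
    END { exit(found ? 0 : 1) }
  ' "${2:-/proc/self/mountinfo}"
}
```

Called with one argument it reads the live /proc/self/mountinfo, so it can stand in for is_mounted in the loop above.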

The Result

Before the fixes: VPN restarted every 60 seconds, indefinitely.

After the fixes: 16 hours of continuous uptime on the first night. Zero watchdog interventions. Clean reconnects when switching networks.

The explicit-exit-notify option – finally active after being rescued from the <ca> block – now tells the server when the client disconnects cleanly:

SIGTERM received, sending exit notification to peer

Instead of the old hard kill:

SIGTERM[hard,] received, process exiting

Takeaways

Read your logs. The journal told the whole story. Every disconnect was an external SIGTERM, not a timeout. I just never looked closely enough at the timestamps.

Check your config parser’s quirks. OpenVPN’s inline blocks (<ca>, <cert>, <key>, <tls-crypt>) swallow everything between the tags. If your options aren’t taking effect, make sure they’re not accidentally inside an inline block. The config will look perfectly correct and silently do nothing.

WiFi power save and UDP VPNs don’t mix. If you’re running OpenVPN or WireGuard on a laptop, disable WiFi power save. The battery savings are negligible; the connection stability improvement is dramatic.

Audit your workarounds. When you add a watchdog or a cron job to fix a problem, make sure the fix isn’t introducing new problems. My watchdog was restarting a healthy connection 1,440 times per day.

The simplest explanation is usually right. I spent months assuming PiVPN was inherently unstable over residential connections. Turns out the WiFi radio was just taking naps.

Running Omarchy on a Framework 13. PiVPN on a Raspberry Pi. Both highly recommended.