There is a gap between setting up a homelab and running one. The setup is documented everywhere. The running — the day-to-day operational decisions, the failures, the fixes, the things you wish someone had told you — less so. This is some of that.

The Storage Migration

The Proxmox host started with the default local-lvm setup. It works fine when you are running a handful of containers, but local-lvm does not support snapshots for LXC containers, which matters when you want an AI agent taking safety snapshots before executing remediation actions.

The migration to ZFS on tank-storage was the right call, but it was not trivial. Moving 20+ LXC containers and VMs from local-lvm to a ZFS pool while keeping everything running required careful sequencing — shut down, migrate disk, update configuration, verify boot, repeat. The local-lvm data pool is now empty. Everything lives on ZFS, which gives proper snapshot support, compression, and a more coherent storage story.
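As a rough sketch of that sequence for a single container, the steps could be generated as a dry-run command list and reviewed before anything is executed. This is illustrative rather than the exact procedure used: it assumes the ZFS storage ID in Proxmox is `tank-storage`, that only the root filesystem volume needs moving, and that the helper name `migration_plan` is hypothetical. VMs would need the equivalent `qm` commands, which are not shown.

```python
from typing import List


def migration_plan(vmid: int, volume: str = "rootfs",
                   target: str = "tank-storage") -> List[List[str]]:
    """Return the ordered commands for moving one container volume.

    Nothing is executed here; the caller decides whether to run each step.
    The storage ID and volume key are assumptions for illustration.
    """
    ct = str(vmid)
    return [
        ["pct", "shutdown", ct],                    # stop the container cleanly
        ["pct", "move-volume", ct, volume, target], # copy the volume to the new storage
        ["pct", "start", ct],                       # boot from the new storage
        ["pct", "status", ct],                      # confirm it came back up
    ]


if __name__ == "__main__":
    # Dry run: print the plan for one container instead of running it.
    for step in migration_plan(110):
        print(" ".join(step))
```

Working through one container at a time like this keeps the blast radius small: if a container fails to boot after the move, only that one needs attention before the next is touched.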

The one catch with ZFS on this setup: containers with NAS bind mounts (mp0-mp3) cannot be snapshotted due to a Proxmox limitation. That is a known constraint managed by simply not relying on snapshots for those specific containers.
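One way to know in advance which containers fall into that category is to scan the container configuration for mount points whose source is a host path rather than a storage volume. The sketch below assumes the text comes from `pct config <vmid>` and that bind mounts appear as `mpN: /host/path,mp=/container/path`; the function name and the sample configuration are made up for illustration.

```python
import re
from typing import List

# Matches lines such as "mp0: /mnt/nas/media,mp=/media"
MP_LINE = re.compile(r"^(mp\d+):\s*(\S+)")


def bind_mounts(config_text: str) -> List[str]:
    """Return the mount point keys (mp0, mp1, ...) whose source is a host path."""
    found = []
    for line in config_text.splitlines():
        match = MP_LINE.match(line.strip())
        # A source starting with "/" is a host path (bind or device mount),
        # as opposed to a "storage:volume" reference.
        if match and match.group(2).startswith("/"):
            found.append(match.group(1))
    return found


# Hypothetical configuration text, not taken from a real container.
SAMPLE_CONFIG = """\
arch: amd64
hostname: media
memory: 2048
mp0: /mnt/nas/media,mp=/media
mp1: tank-storage:subvol-110-disk-1,mp=/data,size=8G
rootfs: tank-storage:subvol-110-disk-0,size=16G
"""

if __name__ == "__main__":
    print(bind_mounts(SAMPLE_CONFIG))
```

An agent could run a check like this before attempting a safety snapshot and skip the snapshot step for any container where the list is not empty.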

Kernel and Root Disk Management

The Proxmox root disk runs at around 86% utilisation — tighter than ideal but stable after a cleanup run in March that cleared ISOs, journal logs, and old templates and recovered 16GB of space. The disk is monitored and flagged if it approaches 90%.
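A minimal sketch of that kind of check, using only the Python standard library, might look like the following. The 90% figure comes from the threshold above; the function names are hypothetical, and the percentage is computed as used over total, which can differ slightly from the figure `df` reports.

```python
import shutil


def usage_percent(path: str = "/") -> float:
    """Return the percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def classify(percent: float, flag_at: float = 90.0) -> str:
    """Return "flag" once usage reaches the threshold, otherwise "ok"."""
    return "flag" if percent >= flag_at else "ok"


if __name__ == "__main__":
    pct = usage_percent("/")
    print(f"root disk at {pct:.1f}%: {classify(pct)}")
```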

The kernel was updated to 6.8.12-20-pve in late March 2026. Kernel updates on a production homelab require a judgment call — stability versus security and feature improvements. The pve-manager 8.3.5 update went cleanly with no container or VM disruptions.

There is an outstanding GRUB bootloader warning that needs investigating in person. Not urgent, not ignored: noted, tracked, and scheduled for the next physical access session. It has to wait because my contract is based in Leicester, and I am only ‘home’ a few days per month.

Memory Management

RAM is the perpetual constraint. The host has 31GB in total, with approximately 21GB in use under normal conditions. Frigate, the NVR service, was consuming 1.7GB and causing swap pressure, so it has been stopped until the RAM is upgraded to 64GB. (Sometime in the distant future…) The upgrade is on the roadmap and is currently priced at around £600 for DDR5, but then I consider that the AMD AI Max systems are £1300 with 96GB or more of RAM, soldered or not!

SABnzbd was experiencing OOM kills that took a while to diagnose. The fix was straightforward once identified — increasing CT110’s memory allocation to 2GB resolved the issue completely.
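For reference, a change like this maps onto a single `pct set` call, which takes the memory limit in MiB. The small helper below is a hypothetical convenience that builds the command from a value in GB; it only constructs the argument list and does not run anything.

```python
from typing import List


def set_memory_command(vmid: int, memory_gb: float) -> List[str]:
    """Build the `pct set` command that sets a container's memory limit."""
    memory_mib = int(memory_gb * 1024)  # pct expects the value in MiB
    return ["pct", "set", str(vmid), "--memory", str(memory_mib)]


if __name__ == "__main__":
    print(" ".join(set_memory_command(110, 2)))
```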

Operational Discipline

What makes a homelab useful as a portfolio piece rather than just a hobby is the operational discipline around it. Every significant change is documented. Alert thresholds are tuned per-container to reduce noise — CT900 spikes during collection bursts, so its CPU warning threshold is set to 85% rather than the global 70%. Containers that are intentionally stopped are baselined as stopped rather than generating spurious alerts.
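The per-container tuning can be expressed as a small lookup with a global default. The sketch below is not the agent's actual code: the 70% and 85% figures and CT900 come from the text above, while the function name, the data shapes, and the ID used for an intentionally stopped container (999) are placeholders.

```python
from typing import Optional

GLOBAL_CPU_WARN = 70.0             # default CPU warning threshold, in percent
CPU_WARN_OVERRIDES = {900: 85.0}   # CT900 spikes during collection bursts
EXPECTED_STOPPED = {999}           # placeholder ID for a container baselined as stopped


def evaluate(vmid: int, status: str, cpu_percent: float) -> Optional[str]:
    """Return an alert message, or None when the container is within its baseline."""
    if status == "stopped":
        # A stopped container only alerts if it is not meant to be stopped.
        if vmid in EXPECTED_STOPPED:
            return None
        return f"CT{vmid} is stopped unexpectedly"
    limit = CPU_WARN_OVERRIDES.get(vmid, GLOBAL_CPU_WARN)
    if cpu_percent > limit:
        return f"CT{vmid} CPU at {cpu_percent:.0f}% exceeds {limit:.0f}%"
    return None


if __name__ == "__main__":
    print(evaluate(900, "running", 80.0))  # under the CT900 override
    print(evaluate(110, "running", 80.0))  # over the global default
    print(evaluate(999, "stopped", 0.0))   # intentionally stopped
```

Keeping the overrides in one place makes each exception visible and easy to justify, which is most of what tuning for noise amounts to.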

The monitoring agent checks 25+ containers every 15 minutes. In the time it has been running it has caught real issues: disk pressure building on containers I was not actively watching, memory trends indicating impending OOM conditions, and a VPS health anomaly that turned out to be a legitimate network issue rather than a false alarm.
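Spotting a memory trend before it becomes an OOM kill can be as simple as fitting a straight line through recent samples and asking when it crosses the container's limit. The following is one simple way to estimate that, not necessarily how the agent does it; the function name and the sample values are invented for illustration, and the 15-minute spacing matches the check interval above.

```python
from typing import Optional, Sequence


def minutes_until_limit(samples_mb: Sequence[float], limit_mb: float,
                        interval_min: float = 15.0) -> Optional[float]:
    """Estimate minutes from the latest sample until memory reaches the limit.

    Fits a least-squares line through evenly spaced samples. Returns None
    when there are too few samples or usage is flat or falling.
    """
    n = len(samples_mb)
    if n < 2:
        return None
    xs = [i * interval_min for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(samples_mb) / n
    denom = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(xs, samples_mb)) / denom
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    time_of_hit = (limit_mb - intercept) / slope
    # Clamp to zero if the fitted line has already crossed the limit.
    return max(0.0, time_of_hit - xs[-1])


if __name__ == "__main__":
    # Invented readings: usage climbing by 100MB per check against a 2048MB limit.
    print(minutes_until_limit([1000, 1100, 1200, 1300], 2048))
```

A linear fit is crude, since real memory usage is rarely a straight line, but it is enough to turn a slow climb into a warning with some lead time.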

That is what infrastructure monitoring is supposed to do. Not just alert when things break, but give enough visibility to see what is going to break before it does.