Homelab Kubernetes Platform (RKE2 + Rancher)
A Proxmox-hosted RKE2 cluster operated via Rancher, exposing internal services through ingress-nginx + Pi-hole DNS and increasingly managed via GitOps (Kustomize + Argo CD). Built to practice real-world platform ops: networking, ingress, storage topology, rollouts, backups, and runbook documentation.
Overview
I run a production-style homelab Kubernetes platform on Proxmox using RKE2, with Rancher as the primary operator interface. Internal services are exposed via ingress-nginx and resolved through Pi-hole (`*.homelab`) so apps have stable hostnames instead of NodePorts.
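A representative Ingress for this pattern looks like the sketch below; the app name, namespace, and port are placeholders, and the only real requirement is that Pi-hole resolves the `*.homelab` wildcard to the node (or VIP) fronting ingress-nginx:

```yaml
# Illustrative Ingress: "grafana" and the monitoring namespace are
# placeholders. Pi-hole resolves *.homelab to the ingress entry point,
# so each app gets a stable hostname instead of a NodePort.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
spec:
  ingressClassName: nginx
  rules:
    - host: grafana.homelab
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 3000
```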
Workloads and infrastructure are managed as code with Kustomize, and I’ve introduced Argo CD to move toward a true GitOps workflow: desired state lives in Git, changes are reviewed and reproducible, and cluster drift becomes visible and correctable.
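The Argo CD side of that workflow is a handful of Application objects pointing at Kustomize overlays in Git. A minimal sketch, assuming a hypothetical repo URL and overlay path:

```yaml
# Illustrative Argo CD Application; repoURL and path are placeholders.
# Argo CD detects the kustomization.yaml in the path and renders it.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: homelab-apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/homelab/k8s.git
    targetRevision: main
    path: overlays/homelab
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
    # Sync is kept manual here so changes stay deliberate; enabling
    # automated sync with selfHeal would turn drift detection into
    # automatic drift correction.
```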
Stateful services use PVC-backed persistence, with storage constraints (node-local affinity under local-path) explicitly documented. I validate platform patterns with small, real deployments (including a Train → Store → Serve “MLOps lab” workload) and capture runbooks and troubleshooting notes in an internal wiki.
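The storage constraint worth calling out: with the local-path provisioner and `WaitForFirstConsumer` binding, the PersistentVolume is created on whichever node first schedules the pod, and the workload is pinned to that node afterwards. A sketch (claim name and namespace are illustrative):

```yaml
# PVC against the local-path StorageClass. Because the StorageClass uses
# volumeBindingMode: WaitForFirstConsumer, binding is deferred until a
# pod consumes the claim; the resulting PV is node-local, so the
# workload gains node affinity to wherever it first landed.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mlops-models
  namespace: mlops-lab
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 5Gi
```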

What this demonstrates
- Practical Kubernetes operations: deployments, services, ingress, PVCs, probes, rollouts
- “Platform plumbing”: DNS + ingress as the stable interface for internal services
- Storage topology awareness: designing around node-local persistence (`local-path`, `WaitForFirstConsumer`)
- Rancher-first visibility while keeping infra as code
- GitOps foundations: Kustomize structure + Argo CD reconciliation, diffs, and controlled sync
- Operability: backups, verification steps, rollback thinking, documentation discipline
- Isolated testing and sandboxing (namespaces and scoped projects)
(I keep workloads small and reproducible; this isn't a "datacenter at home," it's a controlled learning environment.)
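The deployment/probe/rollout patterns in the first bullet come together in a single manifest; this sketch uses placeholder names, image, and port:

```yaml
# Illustrative Deployment combining probes with a conservative rollout.
# The app name "serve", the image, and port 8080 are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: serve
  namespace: mlops-lab
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # keep full capacity while new pods come up
      maxSurge: 1
  selector:
    matchLabels:
      app: serve
  template:
    metadata:
      labels:
        app: serve
    spec:
      containers:
        - name: serve
          image: registry.example.com/serve:0.1.0
          ports:
            - containerPort: 8080
          readinessProbe:        # gates traffic until the app responds
            httpGet:
              path: /healthz
              port: 8080
          livenessProbe:         # restarts a wedged container
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
```

With `maxUnavailable: 0`, a rollout only removes an old pod once a new one passes its readiness probe, which is the "rollback thinking" applied at the workload level.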
Operational Practices
Reliability & recovery
- Proxmox snapshots before risky changes
- Clear "rollback" mindset when upgrading
- Backups for configuration and critical data (plus restore testing)
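On the cluster side, RKE2 can schedule etcd snapshots from its server config; a sketch with illustrative schedule and retention values:

```yaml
# /etc/rancher/rke2/config.yaml on server nodes; the cron expression
# and retention count here are illustrative, not my actual values.
etcd-snapshot-schedule-cron: "0 */6 * * *"
etcd-snapshot-retention: 10
# Restore is a deliberate, documented operation (rke2 server
# --cluster-reset with a snapshot path), which is why restore
# testing belongs alongside the backups themselves.
```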
Observability
- Track host + VM resource usage (CPU/RAM/disk)
- Use dashboards as "truth surfaces" for debugging instead of guessing