Homelab Kubernetes Platform (RKE2 + Rancher)
A 4-node Proxmox cluster running RKE2 Kubernetes with GitOps via Argo CD, a three-layer backup system (Proxmox + Velero + etcd), Grafana 12 observability, encrypted secrets (SOPS/age), MCP server integration for AI-assisted ops, and SSH-hardened infrastructure, operated as a production-grade learning environment.
Overview
I run a production-style homelab Kubernetes platform on Proxmox using RKE2, with Rancher as the primary operator interface. Internal services are exposed via ingress-nginx and resolved through Pi-hole (*.homelab) so apps have stable hostnames instead of NodePorts.
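
As a rough illustration of the hostname pattern, here is a minimal sketch that builds an Ingress manifest for one app. The app name `grafana`, port `3000`, the `default` namespace, and the `nginx` ingress class are assumptions for the example, not values taken from this cluster. The Pi-hole side (resolving *.homelab to the ingress controller) is not shown.

```python
import json


def ingress_for(app, service_port, domain="homelab", namespace="default"):
    """Build a networking.k8s.io/v1 Ingress routing <app>.<domain> to a Service named <app>."""
    host = f"{app}.{domain}"
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": app, "namespace": namespace},
        "spec": {
            "ingressClassName": "nginx",  # assumed class name for ingress-nginx
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": app, "port": {"number": service_port}}},
                }]},
            }],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so the output can be piped to `kubectl apply -f -`.
    print(json.dumps(ingress_for("grafana", 3000), indent=2))
```

The point of the pattern is that the hostname, not a node IP and NodePort, is what clients depend on, so the backing Service can move or change ports without breaking bookmarks.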
Workloads and infrastructure are managed as code with Kustomize, and I’ve introduced Argo CD to move toward a true GitOps workflow: desired state lives in Git, changes are reviewed and reproducible, and cluster drift becomes visible and correctable.
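
To make "drift becomes visible" concrete, here is a simplified sketch of the idea: compare the fields declared in Git against the live object and report the ones that differ. This is not Argo CD's actual diff algorithm, and the manifests in the usage example are hypothetical. Fields that exist only on the live object (status, server-populated defaults) are deliberately ignored.

```python
def find_drift(desired, live, path=""):
    """Return (path, desired_value, live_value) for each field set in `desired` that differs in `live`."""
    drift = []
    for key, want in desired.items():
        here = f"{path}.{key}" if path else key
        have = live.get(key) if isinstance(live, dict) else None
        if isinstance(want, dict):
            # Recurse into nested mappings; a missing branch is treated as empty.
            drift.extend(find_drift(want, have if isinstance(have, dict) else {}, here))
        elif want != have:
            drift.append((here, want, have))
    return drift


if __name__ == "__main__":
    desired = {"spec": {"replicas": 2, "template": {"image": "example/app:1.0"}}}
    live = {"spec": {"replicas": 3, "template": {"image": "example/app:1.0"}},
            "status": {"readyReplicas": 3}}
    for field, want, have in find_drift(desired, live):
        print(f"{field}: git={want!r} cluster={have!r}")
```

In this example someone scaled the workload by hand, so the only reported difference is `spec.replicas`. Correcting it means either syncing the cluster back to Git or committing the new value.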
Stateful services use PVC-backed persistence, with storage constraints (node-local affinity under local-path) explicitly documented. I validate platform patterns with small, real deployments (including a Train → Store → Serve “MLOps lab” workload) and capture runbooks and troubleshooting notes in an internal wiki.
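
The node-local constraint can be sketched as a small model: once a local-path volume is provisioned, it lives on one node, and any pod that mounts it can only run there. The node and claim names below are hypothetical, and the function is a toy model of the constraint, not the Kubernetes scheduler.

```python
def schedulable_nodes(pod_claims, bound_node_by_claim, all_nodes):
    """Return the nodes a pod can run on, given where its already-bound local volumes live."""
    allowed = set(all_nodes)
    for claim in pod_claims:
        node = bound_node_by_claim.get(claim)
        if node is not None:
            # A bound local volume pins the pod to the node that holds the data.
            allowed &= {node}
        # An unbound claim adds no constraint yet: with WaitForFirstConsumer,
        # the volume is provisioned on whichever node the pod is scheduled to.
    return sorted(allowed)


if __name__ == "__main__":
    nodes = ["node-1", "node-2", "node-3"]
    bound = {"db-data": "node-2"}
    print(schedulable_nodes(["db-data"], bound, nodes))             # pinned to node-2
    print(schedulable_nodes(["new-cache"], bound, nodes))           # any node, not bound yet
    print(schedulable_nodes(["db-data", "logs"],
                            {"db-data": "node-2", "logs": "node-3"}, nodes))  # no node fits
```

The last case is the one worth documenting: two volumes bound on different nodes leave the pod with nowhere to run, and losing a node takes its local volumes with it until the node returns or the data is restored from backup.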

What this demonstrates
- Practical Kubernetes operations: deployments, services, ingress, PVCs, probes, rollouts
- DNS + ingress as the stable interface for internal services
- Storage topology awareness: designing around node-local persistence (local-path, WaitForFirstConsumer)
- GitOps foundations: Kustomize structure + Argo CD reconciliation, diffs, and controlled sync
- Three-layer backup strategy (Proxmox + Velero + etcd)
- Encrypted secrets management with SOPS/age
- SSH-hardened infrastructure with scoped access controls
- Grafana 12 observability with Prometheus metrics
- MCP server integration for AI-assisted cluster operations
Operational Practices
Reliability & recovery
- Proxmox snapshots before risky changes
- Velero + Kopia for Kubernetes workload backup and restore
- etcd snapshots synced to NAS on schedule
- Documented rollback procedures
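
A scheduled sync is only useful if something notices when it stops. As a minimal sketch, a freshness check over the snapshot directory on the NAS could look like the following. The mount path and the 24-hour threshold are assumptions for illustration, not the values used here.

```python
import time
from pathlib import Path


def newest_snapshot_age_hours(snapshot_dir, now=None):
    """Return the age in hours of the most recently modified file, or None if there are no files."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(snapshot_dir).iterdir() if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600.0


def is_stale(snapshot_dir, max_age_hours=24.0, now=None):
    """True if the directory holds no snapshot or the newest one is older than the threshold."""
    age = newest_snapshot_age_hours(snapshot_dir, now)
    return age is None or age > max_age_hours


if __name__ == "__main__":
    nas_dir = "/mnt/nas/etcd-snapshots"  # hypothetical mount point
    if Path(nas_dir).is_dir():
        print("stale" if is_stale(nas_dir) else "fresh")
    else:
        print(f"{nas_dir} not mounted")
```

An empty directory counts as stale on purpose, since "no snapshots at all" is the failure this check most needs to catch.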
Observability
- Grafana 12 dashboards for cluster health and resource usage
- Prometheus metrics across all workloads
- Custom cost attribution dashboards
Security
- Key-only SSH authentication across all nodes
- Scoped kubeconfigs for different access levels
- Secrets encrypted at rest with SOPS/age
- MCP servers scoped to read-only at every layer
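
At the Kubernetes layer, "read-only" can be checked mechanically: every RBAC rule granted to the account should carry only the verbs get, list, and watch. The sketch below flags any rule that grants more. The rules in the usage example are hypothetical and not copied from this cluster's roles.

```python
READ_ONLY_VERBS = {"get", "list", "watch"}


def write_capable_rules(rules):
    """Return the RBAC rules that grant any verb outside get/list/watch (including the '*' wildcard)."""
    return [rule for rule in rules
            if not set(rule.get("verbs", [])) <= READ_ONLY_VERBS]


if __name__ == "__main__":
    rules = [
        {"apiGroups": [""], "resources": ["pods", "services"], "verbs": ["get", "list", "watch"]},
        {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["get", "patch"]},
    ]
    for rule in write_capable_rules(rules):
        print("not read-only:", rule["resources"], rule["verbs"])
```

Read-only does not mean harmless: get or list on secrets still exposes their contents, so the resources a role covers matter as much as the verbs.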