
TuringPi
Tools
What worked
91% complete (Phases 1-7 done, Phase 8 pending): 23 plans in ~0.92 hours of total execution time. Claude Code produced 40 idempotent Ansible roles with FQCN module names and serial execution for etcd safety, plus a PITFALLS.md written up front that identified 7 critical risks, with prevention strategies, before a single role was written. K3s HA with 3 servers + 1 agent, embedded etcd, and true survival of a 1-node failure worked on the first full deploy. The 3-tier storage split (local-path for speed, Longhorn distributed with 2 replicas, NFS to a Synology NAS) handled 7 production apps (Home Assistant, AdGuard, Immich, Paperless, Penpot, Calibre, and Sonarr with VPN routing). FluxCD was introduced gradually, with prune: false at first.
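The idempotent, FQCN-style role pattern described above can be sketched as a minimal Ansible task file (the task content is illustrative, not copied from the actual roles):

```yaml
# Every module is addressed by its fully qualified collection name (FQCN),
# and each task describes a desired state, so re-runs are no-ops.
- name: Ensure chrony is installed
  ansible.builtin.apt:
    name: chrony
    state: present

- name: Ensure chrony is enabled and running
  ansible.builtin.systemd:
    name: chrony
    state: started
    enabled: true
```

Because every task is declarative, the same playbook can be re-applied after a partial failure without side effects.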
What broke
32GB of total RAM is a hard ceiling: memory-heavy apps are not viable, and every deploy requires resource tuning. Node 3's SATA link is PCIe Gen 2 x1 (500 MB/s), a single point of contention for write-intensive workloads. HAOS migration complexity: addons don't exist as standalone containers, so running Home Assistant in K3s instead of HAOS meant rebuilding the integration story. ARM64 image compatibility remains a constant friction; many images lack ARM64 variants. eMMC/SD write wear is mitigated with tmpfs and log rotation but remains a risk.
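The tmpfs-plus-log-rotation mitigation for eMMC/SD wear can look like the following journald drop-in (a sketch; the file path and size cap are assumptions, not taken from the actual roles):

```ini
# /etc/systemd/journald.conf.d/volatile.conf (illustrative)
[Journal]
Storage=volatile     ; keep the journal in tmpfs instead of flash
RuntimeMaxUse=64M    ; cap the in-memory journal so it cannot exhaust RAM
```

Logs still reach Loki over the network, so losing the local journal on reboot costs little.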
Roles
I set the 'never SSH to production' constraint: everything flows through Ansible or FluxCD, period. Hardware choices (4x CM4 8GB; Longhorn at 2 replicas rather than 3 to save storage; MetalLB Layer 2 because there is no BGP router at home) were mine. Claude Code wrote all 40 Ansible roles, the FluxCD Kustomizations, the Kyverno policies, and the backup infrastructure (Velero + Backblaze B2 + etcd snapshots + K3s token backup). Using SOPS + Age over Sealed Secrets was Claude's proposal, which I accepted for its simpler key management.
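The SOPS + Age setup typically reduces to one repository-level rule file; a minimal sketch, assuming the standard `.sops.yaml` convention (the path regex and public key are placeholders):

```yaml
# .sops.yaml — tells sops which files to encrypt and with which Age key.
# Only the data fields of Kubernetes Secrets are encrypted, so diffs on
# metadata stay readable in Git.
creation_rules:
  - path_regex: .*\.secret\.yaml$
    encrypted_regex: ^(data|stringData)$
    age: age1exampleplaceholderpublickey0000000000000000000000000000
```

Compared with Sealed Secrets, there is no in-cluster controller to keep alive: Flux decrypts with the private Age key at reconcile time.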
TuringPi (Homelab Kubernetes Cluster)
Overview
TuringPi is a production-grade, self-hosted Kubernetes homelab cluster on Turing Pi 2 hardware (4x Raspberry Pi CM4 modules, 8GB each). It runs containerized applications with full automation, high availability, and disaster recovery — entirely managed through configuration, no manual SSH operations.
Target users: Homelab enthusiast (sole operator) seeking self-hosted alternatives to cloud services.
Key Features
- K3s Kubernetes Cluster — 3 server nodes (HA with embedded etcd) + 1 agent node
- 7 Applications: Home Assistant Core, AdGuard Home, Immich, Paperless-ngx, Penpot, Calibre-web, Sonarr with VPN routing
- GitOps Pipeline — FluxCD watches GitHub repo, auto-reconciles cluster state
- 3-Tier Storage: Local-path (fast), Longhorn (distributed, 2-replica), NFS to Synology NAS (shared)
- Networking: MetalLB load balancer, Traefik ingress, cert-manager TLS, Tailscale remote access
- Monitoring: kube-prometheus-stack (Prometheus/Grafana), Loki log aggregation, Alertmanager
- Backup: Velero with Backblaze B2 cloud backend
- Security: SOPS/Age encryption, NetworkPolicies, Kyverno policy enforcement
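Workloads pick a tier by StorageClass; a minimal sketch of a claim on the distributed tier (the PVC name and size are assumptions, `longhorn` is Longhorn's default StorageClass name):

```yaml
# A PVC bound to the Longhorn 2-replica tier; swapping storageClassName
# to local-path or the NFS class moves the app between tiers.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: immich-library
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: longhorn   # distributed, survives one node failure
  resources:
    requests:
      storage: 20Gi
```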
Architecture
Tech Stack
| Layer | Technology |
|---|---|
| Kubernetes | K3s v1.34.3+k3s3 (ARM64, embedded etcd) |
| OS | Ubuntu Server 24.04 LTS (arm64) |
| GitOps | FluxCD v2.5+ |
| Ingress | Traefik v3.6+ |
| Load Balancer | MetalLB v0.14+ (Layer 2) |
| TLS | cert-manager v1.16+ (Let's Encrypt) |
| Secrets | SOPS v3.9+ with Age v1.2+ |
| VPN | Tailscale + WireGuard |
| Storage | Longhorn v1.7+, NFS CSI, local-path |
| Monitoring | kube-prometheus-stack, Loki, Grafana Alloy |
| Provisioning | Ansible 2.15+ (40 roles) |
Hardware
| Node | Role | Peripherals |
|---|---|---|
| Node 1 | K3s init server | GPIO, HDMI, mini PCIe, DSI |
| Node 2 | K3s server (HA) | mini PCIe, M.2 |
| Node 3 | K3s server + SATA | 2x SATA III, M.2 (only node with SATA) |
| Node 4 | K3s agent (worker) | 4x USB 3.0, M.2 |
Ansible Automation
40 roles organized into 7 phases:
ansible/
├── site.yml # Master playbook
├── roles/
│ ├── common/ # Hostname, timezone, packages, NTP, swap
│ ├── networking/ # Static IPs, /etc/hosts
│ ├── security/ # Users, SSH hardening (port 2222), firewall
│ ├── tailscale/ # VPN mesh network
│ ├── k3s-server/ # HA cluster with embedded etcd
│ ├── k3s-agent/ # Worker node join
│ ├── longhorn-install/ # Distributed storage
│ ├── nfs-csi-install/ # Synology NAS integration
│ ├── velero-install/ # Backup controller
│ ├── metallb-install/ # LoadBalancer
│ ├── traefik-install/ # Ingress
│ ├── cert-manager/ # Auto HTTPS
│ ├── fluxcd-bootstrap/ # GitOps + SOPS/Age
│ ├── fluxcd-kyverno/ # Policy enforcement
│ ├── core-services-*/ # Monitoring, DNS, dashboards
│ └── apps-*/ # Application deployments
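The master playbook's shape follows from the phase ordering above; a sketch using the role names from the tree (host group names are assumptions):

```yaml
# site.yml — phases run in dependency order; servers join one at a
# time so an etcd member is never restarted while quorum is fragile.
- hosts: all
  become: true
  roles: [common, networking, security, tailscale]

- hosts: k3s_servers
  become: true
  serial: 1                 # protect etcd quorum during rolling changes
  roles: [k3s-server]

- hosts: k3s_agents
  become: true
  roles: [k3s-agent]
```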
Development History
91% complete (Phases 1-7 done, Phase 8 pending), 23 plans in ~0.92 hours:
| Phase | Focus |
|---|---|
| 1 | Foundation: Ansible scaffold, common/networking/security/tailscale |
| 2 | K3s cluster: HA with 3 servers + 1 agent, embedded etcd |
| 3 | Storage: Longhorn, NFS CSI, Velero backups |
| 4 | Networking: MetalLB, Traefik, cert-manager, NetworkPolicies |
| 5 | GitOps: FluxCD, SOPS/Age, infrastructure HelmReleases, Kyverno |
| 6 | Core services: Prometheus/Grafana, AdGuard DNS, Portainer, Homepage |
| 7 | Applications: CloudNativePG, HA Core, Immich, Paperless, Penpot, Calibre, Sonarr |
| 8 (Pending) | MCP integration, disaster recovery docs, hardware swap guides |
Architectural Decisions
| Decision | Rationale |
|---|---|
| K3s over full K8s | Lightweight (100MB), ARM64, embedded etcd HA |
| FluxCD over ArgoCD | Lower resource footprint (critical for 8GB nodes) |
| Container HA over HAOS | HAOS takes entire machine; containers enable cluster integration |
| 3 servers + 1 agent | Odd number for etcd quorum, survives 1 failure |
| Longhorn 2 replicas (not 3) | Saves storage for 4-node cluster |
| SOPS + Age over Sealed Secrets | Simpler key management, no Kubernetes controller dependency |
| MetalLB Layer 2 | Home network lacks BGP router |
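The MetalLB Layer 2 decision amounts to two small resources; a sketch assuming a free slice of the home LAN (the address range is a placeholder):

```yaml
# MetalLB in L2 mode: one node answers ARP for each service IP, so no
# BGP-capable router is required.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: homelab-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.200-192.168.1.220
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: homelab-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - homelab-pool
```

The trade-off of L2 mode is that all traffic for a service IP lands on a single node at a time; with four nodes and home-scale traffic that is acceptable.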
Strengths
- Exceptional planning — PITFALLS.md identifies 7 critical risks with prevention strategies before building
- Hardware-agnostic — Variables can swap CM4 -> CM5 without code changes
- True HA — Survives 1 node failure with automatic failover
- Production-grade security — SSH hardening, NetworkPolicies, Tailscale, SOPS
- Ansible excellence — 40 idempotent roles, FQCN, serial execution for etcd safety
- Pragmatic GitOps — FluxCD introduced gradually, prune: false initially
- Comprehensive backup — Multiple layers including etcd snapshots, Velero, K3s token backup
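The gradual-GitOps pattern noted above looks like this as a Flux Kustomization (names and paths are assumptions):

```yaml
# prune: false means Flux will create and update resources but never
# delete ones that disappear from Git — safe while trust is being built.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/turingpi/apps
  prune: false              # flip to true once reconciliation is trusted
  sourceRef:
    kind: GitRepository
    name: flux-system
```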
Weaknesses & Risks
- 32GB total RAM — Memory-heavy apps not viable; requires careful resource tuning
- Node 3 SATA bottleneck — PCIe Gen 2 x1 (500MB/s); single point for write-intensive workloads
- HAOS migration complexity — Addons don't exist as standalone containers
- ARM64 image compatibility — Many images lack ARM64 variants
- eMMC/SD write wear — Mitigated with tmpfs and log rotation but still a risk
- Resource exhaustion — Single app without limits can cascade-fail the cluster
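The resource-exhaustion risk is exactly what the Kyverno policies guard against; a sketch of a policy requiring limits on every container (policy name and message are assumptions):

```yaml
# Rejects any Pod whose containers omit CPU or memory limits, so a
# single runaway app cannot cascade-fail the 32GB cluster.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-limits
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "CPU and memory limits are required."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"   # any non-empty value
                    cpu: "?*"
```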
Connection to Other Projects
- GoVejle — Could host GoVejle infrastructure components
- CNC — Could monitor the cluster's applications
- Roughneck — Potential deployment target for worker infrastructure