TuringPi

February 11, 2026

Tools

Claude Code, Ansible, K3s, FluxCD, Traefik, MetalLB, cert-manager, SOPS, Age, Longhorn, Prometheus, Grafana, Loki, Velero, Tailscale

What worked

The project is 91% complete (phases 1-7 done, phase 8 pending), with 23 plans executed in ~0.92 hours of total execution time. Claude Code produced 40 idempotent Ansible roles that use fully qualified collection names (FQCN) and serial execution for etcd safety. Before a single role was written, it also produced PITFALLS.md, which identifies 7 critical risks with prevention strategies. K3s HA with 3 servers + 1 agent and embedded etcd worked on the first full deploy and survives a single node failure. The 3-tier storage split (local-path for fast local volumes, Longhorn for distributed 2-replica volumes, NFS to a Synology NAS) handled 7 production apps (Home Assistant, AdGuard, Immich, Paperless, Penpot, Calibre, Sonarr with VPN routing). FluxCD was introduced gradually, starting with prune: false so that Flux would not delete cluster resources that were absent from Git.
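As an illustration of the gradual FluxCD rollout, a Kustomization with pruning disabled might look like the sketch below. The resource name, path, and interval are assumptions made for the example, not values taken from the actual repository.

```yaml
# Sketch only: name, path and interval are hypothetical.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps                      # hypothetical name
  namespace: flux-system
spec:
  interval: 10m                   # how often Flux reconciles this path
  sourceRef:
    kind: GitRepository
    name: flux-system             # default source created by flux bootstrap
  path: ./clusters/turingpi/apps  # hypothetical repository path
  prune: false                    # do not delete objects that disappear from Git
  wait: true                      # wait for applied resources to become ready
  timeout: 5m
```

With prune: false, removing a manifest from Git leaves the live object in place, which is the safer default while existing workloads are still being brought under GitOps. Switching to prune: true later makes Git the sole source of truth.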

What broke

32GB total RAM is a hard ceiling: memory-heavy apps are not viable, and every deploy requires resource tuning. Node 3's SATA sits on a PCIe Gen 2 x1 link (about 500MB/s), and because it is the only node with SATA it is a single bottleneck for write-intensive workloads. The HAOS migration was complex: add-ons don't exist as standalone containers, so running Home Assistant in K3s instead of HAOS meant rebuilding the integration story. ARM64 image compatibility remains a constant source of friction, because many images lack ARM64 variants. eMMC/SD write wear is mitigated with tmpfs and log rotation but is still a risk.

Roles

I set the 'never SSH to production' constraint: everything flows through Ansible or FluxCD, period. The hardware and topology choices were mine: 4x CM4 8GB, Longhorn with 2 replicas rather than 3 to save storage, and MetalLB in Layer 2 mode because there is no BGP router at home. Claude Code wrote all 40 Ansible roles, the FluxCD kustomizations, the Kyverno policies, and the backup infrastructure (Velero + Backblaze B2 + etcd snapshots + K3s token backup). Using SOPS + Age over Sealed Secrets was Claude's proposal, which I accepted for its simpler key management.
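To show why SOPS + Age keeps key management simple, here is a minimal sketch of a .sops.yaml rules file. The file-name pattern and the recipient value are placeholders chosen for the example; the project's real rules are not shown in these notes.

```yaml
# .sops.yaml (sketch): tells sops which files to encrypt and for whom.
creation_rules:
  # Hypothetical convention: only files ending in .sops.yaml are encrypted.
  - path_regex: .*\.sops\.ya?ml$
    # Encrypt only the Secret payload, leaving metadata readable in Git diffs.
    encrypted_regex: ^(data|stringData)$
    # Placeholder: replace with the public key printed by age-keygen.
    age: AGE_PUBLIC_KEY_PLACEHOLDER
```

Under these assumptions, a key pair is created once with age-keygen and a secret is encrypted with sops --encrypt --in-place on a matching file. The private key is the only thing that has to be stored outside Git, and no in-cluster controller is needed to encrypt.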

TuringPi (Homelab Kubernetes Cluster)

Overview

TuringPi is a production-grade, self-hosted Kubernetes homelab cluster on Turing Pi 2 hardware (4x Raspberry Pi CM4 modules, 8GB each). It runs containerized applications with full automation, high availability, and disaster recovery — entirely managed through configuration, no manual SSH operations.

Target users: Homelab enthusiast (sole operator) seeking self-hosted alternatives to cloud services.

Key Features

K3s Kubernetes Cluster: 3 server nodes (HA with embedded etcd) + 1 agent node
7 Applications: Home Assistant Core, AdGuard Home, Immich, Paperless-ngx, Penpot, Calibre-web, Sonarr with VPN routing
GitOps Pipeline: FluxCD watches the GitHub repo and auto-reconciles cluster state
3-Tier Storage: Local-path (fast), Longhorn (distributed, 2-replica), NFS to Synology NAS (shared)
Networking: MetalLB load balancer, Traefik ingress, cert-manager TLS, Tailscale remote access
Monitoring: kube-prometheus-stack (Prometheus/Grafana), Loki log aggregation, Alertmanager
Backup: Velero with Backblaze B2 cloud backend
Security: SOPS/Age encryption, NetworkPolicies, Kyverno policy enforcement
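The Longhorn tier of the storage split could be expressed as a StorageClass along these lines. This is a sketch: the class name and the timeout value are assumptions, and only the 2-replica setting comes from the notes.

```yaml
# Sketch of a 2-replica Longhorn StorageClass; the name is hypothetical.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-2r               # hypothetical class name
provisioner: driver.longhorn.io   # Longhorn CSI driver
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"           # two copies of each volume instead of Longhorn's default three
  staleReplicaTimeout: "2880"     # minutes; value borrowed from Longhorn's example class, an assumption here
```

A PersistentVolumeClaim would then select this tier through storageClassName, while latency-sensitive or shared data would go to the local-path or NFS classes instead.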

Architecture

Tech Stack

Layer	Technology
Kubernetes	K3s v1.34.3+k3s3 (ARM64, embedded etcd)
OS	Ubuntu Server 24.04 LTS (arm64)
GitOps	FluxCD v2.5+
Ingress	Traefik v3.6+
Load Balancer	MetalLB v0.14+ (Layer 2)
TLS	cert-manager v1.16+ (Let's Encrypt)
Secrets	SOPS v3.9+ with Age v1.2+
VPN	Tailscale + WireGuard
Storage	Longhorn v1.7+, NFS CSI, local-path
Monitoring	kube-prometheus-stack, Loki, Grafana Alloy
Provisioning	Ansible 2.15+ (40 roles)

Hardware

Node	Role	Peripherals
Node 1	K3s init server	GPIO, HDMI, mini PCIe, DSI
Node 2	K3s server (HA)	mini PCIe, M.2
Node 3	K3s server + SATA	2x SATA III, M.2 (only node with SATA)
Node 4	K3s agent (worker)	4x USB 3.0, M.2
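One way these four nodes might map onto an Ansible inventory is sketched below. The host names, group names, and the k3s_init variable are all hypothetical. The notes describe the node roles but not the inventory itself.

```yaml
# inventory sketch: host names, group names and variables are hypothetical.
all:
  children:
    k3s_cluster:
      children:
        k3s_server:
          hosts:
            node1:
              k3s_init: true   # hypothetical flag marking the server that initialises etcd
            node2:
            node3:
        k3s_agent:
          hosts:
            node4:
```

Keeping hardware-specific details in inventory variables rather than in role code is what would let a module swap happen without touching the roles.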

Ansible Automation

40 roles organized into 7 phases:

ansible/
├── site.yml              # Master playbook
├── roles/
│   ├── common/           # Hostname, timezone, packages, NTP, swap
│   ├── networking/       # Static IPs, /etc/hosts
│   ├── security/         # Users, SSH hardening (port 2222), firewall
│   ├── tailscale/        # VPN mesh network
│   ├── k3s-server/       # HA cluster with embedded etcd
│   ├── k3s-agent/        # Worker node join
│   ├── longhorn-install/ # Distributed storage
│   ├── nfs-csi-install/  # Synology NAS integration
│   ├── velero-install/   # Backup controller
│   ├── metallb-install/  # LoadBalancer
│   ├── traefik-install/  # Ingress
│   ├── cert-manager/     # Auto HTTPS
│   ├── fluxcd-bootstrap/ # GitOps + SOPS/Age
│   ├── fluxcd-kyverno/   # Policy enforcement
│   ├── core-services-*/  # Monitoring, DNS, dashboards
│   └── apps-*/           # Application deployments
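The serial execution used for etcd safety could look roughly like the following excerpt of a playbook. The role names match the tree above, but the group names, the readiness check, and the timeout are assumptions for this sketch.

```yaml
# Playbook sketch: group names, the port check and the timeout are assumptions.
- name: Install K3s servers one node at a time
  hosts: k3s_server        # hypothetical inventory group
  become: true
  serial: 1                # finish each server before starting the next, so etcd members join one by one
  roles:
    - role: k3s-server
  post_tasks:
    - name: Wait for the local K3s API port before moving on
      ansible.builtin.wait_for:
        port: 6443
        timeout: 300

- name: Join the agent once all servers are up
  hosts: k3s_agent         # hypothetical inventory group
  become: true
  roles:
    - role: k3s-agent
```

Modules are referenced by their fully qualified collection name (ansible.builtin.wait_for rather than wait_for), which is the FQCN convention the roles follow.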

Development History

91% complete (Phases 1-7 done, Phase 8 pending), 23 plans in ~0.92 hours of total execution time:

Phase	Focus
1	Foundation: Ansible scaffold, common/networking/security/tailscale
2	K3s cluster: HA with 3 servers + 1 agent, embedded etcd
3	Storage: Longhorn, NFS CSI, Velero backups
4	Networking: MetalLB, Traefik, cert-manager, NetworkPolicies
5	GitOps: FluxCD, SOPS/Age, infrastructure HelmReleases, Kyverno
6	Core services: Prometheus/Grafana, AdGuard DNS, Portainer, Homepage
7	Applications: CloudNativePG, HA Core, Immich, Paperless, Penpot, Calibre, Sonarr
8 (Pending)	MCP integration, disaster recovery docs, hardware swap guides

Architectural Decisions

Decision	Rationale
K3s over full K8s	Lightweight (100MB), ARM64, embedded etcd HA
FluxCD over ArgoCD	Lower resource footprint (critical for 8GB nodes)
Container HA over HAOS	HAOS takes entire machine; containers enable cluster integration
3 servers + 1 agent	Odd number for etcd quorum, survives 1 failure
Longhorn 2 replicas (not 3)	Saves storage for 4-node cluster
SOPS + Age over Sealed Secrets	Simpler key management, no Kubernetes controller dependency
MetalLB Layer 2	Home network lacks BGP router
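For the MetalLB Layer 2 decision, a minimal configuration might look like this. The pool name and the address range are invented for the example and would need to match an unused range on the home LAN.

```yaml
# Sketch: pool name and address range are hypothetical.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: home-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250   # hypothetical range outside the DHCP scope
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: home-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - home-pool                      # announce addresses from this pool via ARP/NDP
```

In Layer 2 mode one node answers ARP for each service IP, so no router configuration is required. The trade-off is that traffic for a given IP enters through a single node at a time.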

Strengths

Exceptional planning — PITFALLS.md identifies 7 critical risks with prevention strategies before building
Hardware-agnostic — Variables can swap CM4 -> CM5 without code changes
True HA — Survives 1 node failure with automatic failover
Production-grade security — SSH hardening, NetworkPolicies, Tailscale, SOPS
Ansible excellence — 40 idempotent roles, FQCN, serial execution for etcd safety
Pragmatic GitOps — FluxCD introduced gradually, prune: false initially
Comprehensive backup — Multiple layers including etcd snapshots, Velero, K3s token backup

Weaknesses & Risks

32GB total RAM — Memory-heavy apps not viable; requires careful resource tuning
Node 3 SATA bottleneck — PCIe Gen 2 x1 (500MB/s); single point for write-intensive workloads
HAOS migration complexity — Addons don't exist as standalone containers
ARM64 image compatibility — Many images lack ARM64 variants
eMMC/SD write wear — Mitigated with tmpfs and log rotation but still a risk
Resource exhaustion — Single app without limits can cascade-fail the cluster
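One common guard against the resource-exhaustion risk is a namespace-level LimitRange that gives every container default requests and limits. The sketch below is an assumption about how this could be done. The notes do not say whether the cluster uses LimitRange, Kyverno policies, or per-app values, and the namespace and numbers are illustrative.

```yaml
# Sketch: namespace and values are hypothetical and would need tuning for 8GB nodes.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: apps            # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:        # applied when a container declares no requests
        cpu: 100m
        memory: 128Mi
      default:               # applied when a container declares no limits
        cpu: 500m
        memory: 512Mi
```

With defaults in place, a container that ships without limits is still capped, so a single runaway app is OOM-killed on its own rather than starving the node.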

Connection to Other Projects

GoVejle — Could host GoVejle infrastructure components
CNC — Could monitor the cluster's applications
Roughneck — Potential deployment target for worker infrastructure

andrewlb notes