TuringPi
Tools
What worked
The v1.2 architecture simplification was the pivotal decision: switching from 3-server etcd HA to 1-server + 3-agent and replacing Longhorn with local-path-provisioner freed ~1GB RAM and eliminated distributed storage complexity that was overkill for a homelab. The 'never SSH to production' constraint forced everything through Ansible or FluxCD, which paid off when recovery bootstrap scripts could rebuild the cluster from scratch. The Wyoming voice pipeline (Whisper STT -> OpenClaw -> Piper TTS) working end-to-end through Home Assistant validated voice as a viable interface for household AI.
What broke
32GB total RAM is a hard ceiling — the simplification helped but every new service still requires careful resource tuning. ARM64 image compatibility remains persistent friction. The voice pipeline depends on the Mac Mini node being available with no HA fallback, which is exactly the kind of single point of failure I'd criticize in someone else's architecture. The Twilio bridge for Android Auto phone calls is still blocked on Docker build and E2E verification.
Roles
I set the 'never SSH to production' constraint and made the v1.2 architecture simplification call based on real-world experience: etcd quorum was solving a problem I didn't have. Claude Code wrote the recovery bootstrap scripts, kube-router CNI migration, per-service deployment playbooks, Twilio bridge server, Wyoming container configs, and the custom Home Assistant conversation agent.
TuringPi (Homelab Kubernetes Cluster)
Overview
TuringPi is a self-hosted Kubernetes homelab cluster on Turing Pi 2 hardware (4x Raspberry Pi CM4 modules, 8GB each). It runs containerized applications with full automation and disaster recovery — entirely managed through GitOps and Ansible with zero manual SSH operations.
Target user: A homelab enthusiast (sole operator) seeking self-hosted alternatives to cloud services.
What It Does
- K3s cluster (1 server + 3 agents) with FluxCD GitOps reconciliation from a GitHub repo
- 10+ applications: Home Assistant Core, AdGuard Home, Immich, Paperless-ngx, Penpot, Calibre-web, Sonarr with VPN routing, OpenClaw AI agent, Vikunja, Mosquitto MQTT
- Wyoming voice pipeline: Whisper STT, Piper TTS, and openWakeWord on the Mac Mini node, proxied through a custom Home Assistant conversation agent to OpenClaw
- Twilio bridge for hands-free phone call interaction via Android Auto (in progress)
- Full networking stack: MetalLB load balancer, Traefik ingress, cert-manager TLS, Tailscale VPN
- Monitoring: kube-prometheus-stack, Loki log aggregation, Alertmanager
- Backup: Velero with Backblaze B2 cloud backend
- Security: SOPS/Age encryption, NetworkPolicies, Kyverno policy enforcement
- 40 Ansible roles for idempotent provisioning with pre-flight validation
How It Fits Together
K3s runs on Ubuntu Server 24.04 (ARM64) across 4 CM4 nodes. FluxCD watches a GitHub repo and auto-reconciles cluster state. Ansible handles initial provisioning and recovery (40 roles across 7 phases). Secrets are encrypted with SOPS/Age, never stored in plaintext. Storage uses local-path-provisioner with a SATA drive on Node 3 (replaced Longhorn in v1.2 for lower memory footprint). The Mac Mini joins as an external node for compute-heavy workloads like the Wyoming voice containers.
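To make the FluxCD plus SOPS/Age arrangement concrete, here is a minimal sketch of a Flux Kustomization that reconciles a repo path and decrypts SOPS-encrypted secrets. The object name, repo path, source name, interval, and secret name are all hypothetical placeholders, not values from this cluster; only the manifest structure follows the Flux Kustomization API. The output is JSON, which kubectl accepts alongside YAML.

```python
import json


def flux_kustomization(name: str, path: str, source: str, sops_secret: str) -> dict:
    """Build a Flux Kustomization manifest with SOPS decryption enabled.

    Every argument value used below is a hypothetical placeholder.
    """
    return {
        "apiVersion": "kustomize.toolkit.fluxcd.io/v1",
        "kind": "Kustomization",
        "metadata": {"name": name, "namespace": "flux-system"},
        "spec": {
            "interval": "10m",  # assumed reconcile interval
            "path": path,       # directory in the Git repo to apply
            "prune": True,      # delete cluster objects removed from Git
            "sourceRef": {"kind": "GitRepository", "name": source},
            # Flux decrypts SOPS files using the Age key stored in this Secret
            "decryption": {"provider": "sops", "secretRef": {"name": sops_secret}},
        },
    }


manifest = flux_kustomization(
    name="apps",                      # hypothetical
    path="./clusters/turingpi/apps",  # hypothetical repo layout
    source="flux-system",             # hypothetical GitRepository name
    sops_secret="sops-age",           # hypothetical Secret holding the Age key
)
print(json.dumps(manifest, indent=2))
```

Keeping the decryption reference inside the Kustomization is what lets encrypted secrets live in the same Git repo as everything else.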
Architecture Decisions
- 1 server + 3 agents over 3-server etcd HA — Quorum was overkill for a homelab; simplification freed ~1GB RAM across the cluster
- local-path + SATA over Longhorn — Distributed storage's memory footprint was unjustifiable on 8GB ARM nodes
- K3s over full K8s — 100MB footprint, native ARM64 support
- FluxCD over ArgoCD — Lower resource footprint, critical for memory-constrained nodes
- Home Assistant Container over HAOS — HAOS takes an entire machine; containers enable proper cluster integration
- SOPS + Age over Sealed Secrets — Simpler key management, no Kubernetes controller dependency
- MetalLB Layer 2 — Home network lacks a BGP router
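As an illustration of the MetalLB Layer 2 choice, the sketch below builds the two objects a Layer 2 setup typically needs: an IPAddressPool and an L2Advertisement that announces it. The pool name and address range are hypothetical and would have to match a free range on the actual home LAN; the real manifests for this cluster are not shown in this writeup.

```python
import ipaddress
import json


def metallb_l2(pool_name: str, first: str, last: str) -> list[dict]:
    """Return an IPAddressPool and a matching L2Advertisement for MetalLB."""
    start, end = ipaddress.ip_address(first), ipaddress.ip_address(last)
    if start > end:
        raise ValueError("address range is reversed")

    pool = {
        "apiVersion": "metallb.io/v1beta1",
        "kind": "IPAddressPool",
        "metadata": {"name": pool_name, "namespace": "metallb-system"},
        "spec": {"addresses": [f"{first}-{last}"]},
    }
    # In Layer 2 mode MetalLB answers ARP for these addresses, so no BGP router is needed.
    advertisement = {
        "apiVersion": "metallb.io/v1beta1",
        "kind": "L2Advertisement",
        "metadata": {"name": f"{pool_name}-l2", "namespace": "metallb-system"},
        "spec": {"ipAddressPools": [pool_name]},
    }
    return [pool, advertisement]


# Hypothetical pool name and LAN range, for illustration only.
objects = metallb_l2("homelab-pool", "192.168.1.240", "192.168.1.250")
print(json.dumps(objects, indent=2))
```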
What Changed After Dogfooding
The biggest lesson was admitting that etcd HA was premature engineering. I was building for a failure mode (server quorum loss) that simply doesn't matter in a homelab where the whole board shares a single power supply. Dropping it freed real resources and reduced operational complexity. Similarly, Longhorn's distributed storage was solving for multi-node persistence I didn't need — a single SATA drive on one node works fine when Velero handles disaster recovery to B2 cloud.
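The quorum argument can be checked with a few lines of arithmetic. This is a sketch of the standard majority-quorum rule that etcd uses (a cluster of n members needs floor(n/2) + 1 of them alive), not code from the cluster itself; the function names are illustrative.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def survives(members: int, failed: int) -> bool:
    """True if the control plane still has quorum after `failed` members go down."""
    return members - failed >= quorum(members)


# Independent failure of one server: only the 3-server layout survives it.
print(survives(3, 1))  # True
print(survives(1, 1))  # False

# Shared power supply: every server on the board fails together,
# so 3 servers and 1 server end up in the same state.
print(survives(3, 3))  # False
print(survives(1, 1))  # False
```

Three servers only help against independent failures. When the dominant failure mode takes down the whole board at once, the extra members cost RAM without improving availability.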
The voice pipeline (v1.3) was an exercise in accepting dependency chains. The Wyoming services (Whisper STT, Piper TTS, openWakeWord) all run on the Mac Mini, so a single node failure kills voice. I chose to ship it anyway because the alternative was not shipping it at all, but it's a conscious debt.
Weaknesses & Open Questions
- 32GB total RAM — Memory-heavy apps not viable; every new service requires resource tuning
- Node 3 SATA bottleneck — PCIe Gen 2 x1 (~500MB/s theoretical); single point of contention for write-intensive workloads
- ARM64 image compatibility — Many container images lack ARM64 variants
- eMMC/SD write wear — Mitigated with tmpfs and log rotation but still a long-term risk
- Voice pipeline has no HA — Mac Mini going offline kills Wyoming containers with no fallback
- Resource exhaustion — A single app without limits can cascade-fail the cluster
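The resource-exhaustion risk above is the kind of thing a simple pre-merge check can catch. The sketch below scans a Deployment-shaped manifest for containers that declare no memory limit. It is an illustration only: the function name and the sample workload are hypothetical, and in this cluster the equivalent rule would more plausibly be enforced through Kyverno.

```python
def containers_missing_memory_limit(workload: dict) -> list[str]:
    """Return names of containers in a Deployment-like manifest with no memory limit."""
    pod_spec = workload.get("spec", {}).get("template", {}).get("spec", {})
    missing = []
    for container in pod_spec.get("containers", []):
        limits = (container.get("resources") or {}).get("limits") or {}
        if "memory" not in limits:
            missing.append(container["name"])
    return missing


# Hypothetical workload: one container is bounded, the other is not.
deployment = {
    "kind": "Deployment",
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "app", "resources": {"limits": {"memory": "512Mi"}}},
                    {"name": "sidecar"},  # no resources block at all
                ]
            }
        }
    },
}

print(containers_missing_memory_limit(deployment))  # ['sidecar']
```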
Ecosystem Role
TuringPi is the deployment target for OpenClaw (household AI agent running as a K3s pod with Prometheus metrics) and provides the infrastructure backbone for Home Assistant integration. CNC monitors OpenClaw via Prometheus scrape and a dedicated Grafana dashboard. The cluster could eventually host other portfolio services (GoVejle, Roughneck workloads) if RAM constraints ease with a hardware upgrade.