
CNC (Command and Control)
Tools
What worked
v1.0 shipped in 8 phases with all infrastructure git-managed — Grafana dashboards, alert rules, Ansible playbooks, and docker-compose all versioned from day one. Claude Code handled the fire-and-forget HTTP client pattern cleanly (monitored apps never block on the hub) and produced the stack-normalization dedup logic without hand-holding. 17 test files with per-test DB migration/cleanup emerged naturally from the prompts.
What broke
BullMQ and ioredis had a version mismatch that required an `as any` type cast — Claude surfaced the conflict but the library ecosystem wasn't ready. The admin auth coupling to Grafana is elegant when Grafana is up and broken when it isn't; I accepted it as a v1 tradeoff. Phase 10 (Job Gateway) still has GATE-02/03/04 loose ends because I moved to Roughneck before finishing the integration.
Roles
I defined the taxonomy — what counts as an error pattern, what the webhook event types should be, the three-app v1 scope (Etyde, GoVejle, plus the self-referential hub). Claude Code built the Fastify routes, the stack normalizer, and the Grafana provisioning. I drove the decision to use Ollama + Qwen3 32B locally instead of a SaaS classifier because zero-cloud-cost was a hard constraint.
Overview
CNC is a centralized monitoring, error aggregation, and improvement-capture system for the portfolio of AI-generated applications. It acts as an "attention multiplier" for a solo developer — automatically watching production systems, detecting outages, classifying errors with local LLM inference, and capturing improvement ideas without requiring active surveillance.
Core value: Automatically know when things break across all apps without watching dashboards all day.
Target users: Solo/small developer teams running multiple production applications.
Currently monitoring: Etyde and GoVejle.
Key Features
- Health monitoring — Real-time per-app status (up/down/unknown) with heartbeat staleness detection (180s threshold)
- Error aggregation — Deduplication by stack signature (normalized SHA-256), occurrence tracking
- Cross-app pattern detection — Same error category in 2+ apps signals structural problems
- LLM-powered taxonomy — Ollama (Qwen3 32B) classifies errors into categories
- Log aggregation — Loki-backed centralized logs queryable by app
- Grafana dashboards — Git-managed (zero manual UI), health overview, error timelines, uptime trends
- CLI tool — Status, errors, logs; exports error context as Claude Code prompts
- Improvement notes queue — Capture and track improvement ideas as Grafana annotations
- Job gateway API — Submission endpoint for Roughneck; enqueues to BullMQ
- Webhook system — HMAC-signed event delivery for `app.down`, `app.up`, `error.pattern`, `note.created`, `job.failed`
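The heartbeat staleness rule above (180s threshold) can be sketched as a pure function. This is an illustrative sketch, not CNC's actual code — the type names and the three-way status resolution are assumptions consistent with the up/down/unknown states listed:

```typescript
// Sketch of heartbeat staleness detection. An app whose last heartbeat is
// older than the threshold is "unknown" rather than "down", since silence
// is ambiguous; a fresh heartbeat carries the app's own reported health.
type AppStatus = "up" | "down" | "unknown";

const STALENESS_THRESHOLD_MS = 180_000; // 180s, per the feature list

interface Heartbeat {
  appId: string;
  receivedAt: number; // epoch millis
  healthy: boolean;   // what the app itself reported
}

function resolveStatus(last: Heartbeat | undefined, now: number): AppStatus {
  if (!last) return "unknown"; // never heard from this app
  if (now - last.receivedAt > STALENESS_THRESHOLD_MS) return "unknown"; // stale
  return last.healthy ? "up" : "down"; // fresh heartbeat wins
}
```

A heartbeat from 200s ago resolves to "unknown" regardless of what it reported, which is what distinguishes "the app said it was down" from "the app stopped talking".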
Architecture
Tech Stack
| Layer | Technology |
|---|---|
| Monorepo | pnpm 9.15.0 workspaces |
| API | Fastify 5 + Node.js 22 |
| Database | PostgreSQL 17 + Drizzle ORM |
| Queue | BullMQ 5.71 + Redis 7 (v2; v1 used pg-boss) |
| Proxy | Caddy 2.11 |
| Observability | Grafana 12.4 + Loki 3.6 + Prometheus |
| LLM | Ollama + Qwen3 32B (on M4 Mac Mini) |
| Provisioning | Ansible + Docker Compose |
| CI/CD | GitHub Actions |
Package Structure
```
packages/
  hub/       # Fastify API (10 DB tables, 9 route modules, BullMQ workers)
  client/    # npm package @lovettbarron/cnc (heartbeat loop, error reporting)
  cli/       # CLI tool (status, errors, logs, init)
infra/
  ansible/   # VPS provisioning playbook
  docker/    # Dockerfiles, init SQL
  grafana/   # Provisioned datasources, dashboards, alert rules
```
Key Patterns
- Fire-and-forget HTTP — Client never blocks on monitoring; hub being down doesn't affect monitored apps
- Dual auth model — x-api-key for monitored apps, Bearer + hashed lookup for Roughneck jobs, admin Bearer validated against Grafana
- Error dedup by stack signature — Normalize stack frames (remove :line:col), SHA-256 hash, unique index on (app_id, stack_signature)
- HMAC-SHA256 callback verification — Per-job secrets stored at enqueue time
- Grafana provisioning via file — Dashboards, datasources, alert rules all YAML/JSON in git; zero manual UI steps
- Heartbeat metadata snapshots — Optional JSON metadata with heartbeats for historical trending
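The stack-signature dedup pattern above can be sketched in a few lines. The regex and function names are illustrative assumptions, not CNC's actual normalizer — the point is that stripping volatile `:line:col` suffixes makes the same error hash identically across deploys:

```typescript
import { createHash } from "node:crypto";

// Strip :line:col from each frame so cosmetic code movement between
// deploys doesn't produce a "new" error.
function normalizeStack(stack: string): string {
  return stack
    .split("\n")
    .map((frame) => frame.trim().replace(/:\d+:\d+\)?$/, ""))
    .join("\n");
}

// SHA-256 of the normalized stack; paired with a unique index on
// (app_id, stack_signature), a duplicate insert becomes an
// occurrence-count bump instead of a new error row.
function stackSignature(stack: string): string {
  return createHash("sha256").update(normalizeStack(stack)).digest("hex");
}
```

Two stacks that differ only in line and column numbers yield the same 64-character hex signature, so the unique index does the deduplication at the database layer.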
Development History
v1.0 shipped March 16, 2026 (8 phases):
- Hub infrastructure, Fastify API, Grafana dashboards, client library, error aggregation, improvement notes, LLM worker, CLI tool
v2.0 (~95% complete, Phases 9-13):
- Redis + BullMQ migration (from pg-boss), job gateway API, worker cutover to Roughneck, historical trend dashboards, webhook event system
Strengths
- Zero cloud cost — Local LLM, Hetzner VPS at ~€4/month, no SaaS monitoring fees
- Fire-and-forget client — Apps never block on monitoring; hub outage is invisible
- Fully git-manageable — Grafana dashboards, alert rules, Ansible playbooks, docker-compose all versioned
- Comprehensive test coverage — 17 test files with DB migration/cleanup per test
- Intelligent dedup — Stack normalization handles code changes and source maps
- Webhook extensibility — HMAC-signed, exponential backoff, dead-letter table
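The HMAC-signed, exponential-backoff webhook delivery above can be sketched as follows. The header semantics, base delay, and cap are assumptions for illustration, not the values CNC actually uses:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sign the raw request body with the per-endpoint secret.
function signPayload(secret: string, body: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Receivers recompute the signature and compare in constant time to
// avoid leaking information through comparison timing.
function verifySignature(secret: string, body: string, signature: string): boolean {
  const expected = Buffer.from(signPayload(secret, body), "hex");
  const received = Buffer.from(signature, "hex");
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Delay before retry attempt n (0-based): 1s, 2s, 4s, ... capped at 5 min.
// A delivery that exhausts its retries would land in the dead-letter table.
function backoffDelayMs(attempt: number): number {
  return Math.min(1_000 * 2 ** attempt, 300_000);
}
```

Signing the raw body (rather than a parsed/re-serialized object) matters: any whitespace or key-order change between signer and verifier would break the comparison.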
Weaknesses & Risks
- Phase 10 (Job Gateway) incomplete — GATE-02/03/04 not fully integrated
- BullMQ/ioredis version mismatch — Worked around with an `as any` type cast
- Admin auth couples to Grafana — If Grafana is down, admin endpoints are unreachable
- Staleness check uses JS filter — Fine for 3-5 apps, inefficient at 100+
- Secret rotation not yet implemented — Callback secret lifecycle management is on the operational backlog
- Prometheus included but unused — Adds 200-500MB RAM overhead for zero value
Connection to Other Projects
- Etyde — Monitored app; sends heartbeats, errors, logs
- GoVejle — Monitored app; sends heartbeats, errors, logs
- Roughneck — CNC acts as job gateway; Roughneck workers consume jobs and callback with results
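The fire-and-forget client pattern the monitored apps rely on can be sketched in a few lines. The URL path, payload shape, and 2s timeout are illustrative assumptions; the `x-api-key` header matches the dual auth model described above:

```typescript
// Fire-and-forget heartbeat: the monitored app sends and moves on.
// A short timeout plus a swallowed rejection means a down or slow hub
// can never block or crash the caller.
function sendHeartbeat(hubUrl: string, apiKey: string, meta?: object): void {
  // Deliberately not awaited by callers; the promise chain ends here.
  fetch(`${hubUrl}/heartbeat`, {
    method: "POST",
    headers: { "content-type": "application/json", "x-api-key": apiKey },
    body: JSON.stringify({ at: Date.now(), meta }),
    signal: AbortSignal.timeout(2_000), // give up quickly (assumed value)
  }).catch(() => {
    // Monitoring must never take the app down: drop the error silently.
  });
}
```

Callers typically invoke this on an interval; because the function returns `void` and catches every rejection internally, a hub outage is invisible to the monitored app, which is the property the Strengths section claims.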