CNC (Command and Control)

March 15, 2026

Tools

Claude Code, TypeScript, Fastify, PostgreSQL, Drizzle ORM, BullMQ, Redis, Ollama, Qwen3 32B, Grafana, Loki, Ansible

What worked

v1.0 shipped in 8 phases with all infrastructure git-managed — Grafana dashboards, alert rules, Ansible playbooks, and docker-compose all versioned from day one. Claude Code handled the fire-and-forget HTTP client pattern cleanly (monitored apps never block on the hub) and produced the stack-normalization dedup logic without hand-holding. 17 test files with per-test DB migration/cleanup emerged naturally from the prompts.

What broke

BullMQ and ioredis had a version mismatch that required an `as any` type cast — Claude surfaced the conflict but the library ecosystem wasn't ready. The admin auth coupling to Grafana is elegant when Grafana is up and broken when it isn't; I accepted it as a v1 tradeoff. Phase 10 (Job Gateway) still has GATE-02/03/04 loose ends because I moved to Roughneck before finishing the integration.

Roles

I defined the taxonomy — what counts as an error pattern, what the webhook event types should be, the three-app v1 scope (Etyde, GoVejle, plus the self-referential hub). Claude Code built the Fastify routes, the stack normalizer, and the Grafana provisioning. I drove the decision to use Ollama + Qwen3 32B locally instead of a SaaS classifier because zero-cloud-cost was a hard constraint.

CNC (Command and Control)

Overview

CNC is a centralized monitoring, error aggregation, and improvement-capture system for the portfolio of AI-generated applications. It acts as an "attention multiplier" for a solo developer — automatically watching production systems, detecting outages, classifying errors with local LLM inference, and capturing improvement ideas without requiring active surveillance.

Core value: Automatically know when things break across all apps without watching dashboards all day.

Target users: Solo developers and small teams running multiple production applications.

Currently monitoring: Etyde and GoVejle.

Key Features

Health monitoring — Real-time per-app status (up/down/unknown) with heartbeat staleness detection (180s threshold)
Error aggregation — Deduplication by stack signature (normalized SHA-256), occurrence tracking
Cross-app pattern detection — Same error category in 2+ apps signals structural problems
LLM-powered taxonomy — Ollama (Qwen3 32B) classifies errors into categories
Log aggregation — Loki-backed centralized logs queryable by app
Grafana dashboards — Git-managed (zero manual UI), health overview, error timelines, uptime trends
CLI tool — Status, errors, logs; exports error context as Claude Code prompts
Improvement notes queue — Capture and track improvement ideas as Grafana annotations
Job gateway API — Submission endpoint for Roughneck; enqueues to BullMQ
Webhook system — HMAC-signed event delivery for app.down, app.up, error.pattern, note.created, job.failed
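
The cross-app pattern detection in the feature list can be illustrated with a small grouping function. This is a minimal sketch, not the hub's actual code: the `ClassifiedError` shape, the `findCrossAppPatterns` name, and the sample category labels are all hypothetical. Only the "same category in 2+ apps" rule comes from the list above.

```typescript
// Hypothetical shape: one row per deduplicated error, already classified by the LLM.
interface ClassifiedError {
  appId: string;
  category: string;
}

// Returns every category seen in at least `minApps` distinct apps,
// mapped to the sorted list of apps where it occurred.
function findCrossAppPatterns(
  errors: ClassifiedError[],
  minApps = 2,
): Map<string, string[]> {
  const appsByCategory = new Map<string, Set<string>>();
  for (const { appId, category } of errors) {
    const apps = appsByCategory.get(category) ?? new Set<string>();
    apps.add(appId);
    appsByCategory.set(category, apps);
  }

  const patterns = new Map<string, string[]>();
  appsByCategory.forEach((apps, category) => {
    if (apps.size >= minApps) {
      patterns.set(category, Array.from(apps).sort());
    }
  });
  return patterns;
}

// "db-timeout" occurs in both apps and is reported; "null-reference" is not.
const patterns = findCrossAppPatterns([
  { appId: "etyde", category: "db-timeout" },
  { appId: "govejle", category: "db-timeout" },
  { appId: "etyde", category: "null-reference" },
]);
console.log(patterns);
```

Counting distinct apps rather than raw occurrences keeps one noisy app from looking like a structural problem.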

Architecture

Tech Stack

Layer	Technology
Monorepo	pnpm 9.15.0 workspaces
API	Fastify 5 + Node.js 22
Database	PostgreSQL 17 + Drizzle ORM
Queue	BullMQ 5.71 + Redis 7 (v2; v1 used pg-boss)
Proxy	Caddy 2.11
Observability	Grafana 12.4 + Loki 3.6 + Prometheus
LLM	Ollama + Qwen3 32B (on M4 Mac Mini)
Provisioning	Ansible + Docker Compose
CI/CD	GitHub Actions

Package Structure

packages/
  hub/       # Fastify API (10 DB tables, 9 route modules, BullMQ workers)
  client/    # npm package @lovettbarron/cnc (heartbeat loop, error reporting)
  cli/       # CLI tool (status, errors, logs, init)

infra/
  ansible/   # VPS provisioning playbook
  docker/    # Dockerfiles, init SQL
  grafana/   # Provisioned datasources, dashboards, alert rules

Key Patterns

Fire-and-forget HTTP — Client never blocks on monitoring; hub being down doesn't affect monitored apps
Three-way auth model — x-api-key for monitored apps, Bearer token with hashed lookup for Roughneck jobs, and an admin Bearer token validated against Grafana
Error dedup by stack signature — Normalize stack frames (remove :line:col), SHA-256 hash, unique index on (app_id, stack_signature)
HMAC-SHA256 callback verification — Per-job secrets stored at enqueue time
Grafana provisioning via file — Dashboards, datasources, alert rules all YAML/JSON in git; zero manual UI steps
Heartbeat metadata snapshots — Optional JSON metadata with heartbeats for historical trending
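
The stack-signature dedup pattern above can be sketched in a few lines. This is an illustration under assumptions, not the hub's normalizer: the function names and the exact regular expression are hypothetical, and a real normalizer may strip more than the trailing `:line:col`.

```typescript
import { createHash } from "node:crypto";

// Strip the trailing ":line:col" from each frame so the same logical error
// keeps one signature when line numbers shift between deploys.
function normalizeStack(stack: string): string {
  return stack
    .split("\n")
    .map((line) => line.trim().replace(/:\d+:\d+(\)?)$/, "$1"))
    .filter((line) => line.length > 0)
    .join("\n");
}

// SHA-256 over the normalized stack, hex-encoded (64 characters).
function stackSignature(stack: string): string {
  return createHash("sha256").update(normalizeStack(stack)).digest("hex");
}

const before = [
  "TypeError: x is undefined",
  "    at handler (/app/src/routes.ts:10:5)",
  "    at /app/src/server.ts:42:13",
].join("\n");

const after = [
  "TypeError: x is undefined",
  "    at handler (/app/src/routes.ts:27:9)",
  "    at /app/src/server.ts:58:3",
].join("\n");

// Both stacks normalize to the same text, so they share one signature.
console.log(stackSignature(before) === stackSignature(after));
```

With a unique index on (app_id, stack_signature), a second report of the same error would presumably increment an occurrence counter rather than insert a new row.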

Development History

v1.0 shipped March 16, 2026 (8 phases):

Hub infrastructure, Fastify API, Grafana dashboards, client library, error aggregation, improvement notes, LLM worker, CLI tool

v2.0 (~95% complete, Phases 9-13):

Redis + BullMQ migration (from pg-boss), job gateway API, worker cutover to Roughneck, historical trend dashboards, webhook event system

Strengths

Zero cloud cost — Local LLM, Hetzner VPS ~EUR4/month, no SaaS monitoring fees
Fire-and-forget client — Apps never block on monitoring; a hub outage is invisible to them
Fully git-manageable — Grafana dashboards, alert rules, Ansible playbooks, docker-compose all versioned
Comprehensive test coverage — 17 test files with DB migration/cleanup per test
Intelligent dedup — Stack normalization strips line and column numbers, so signatures stay stable across code changes and source-map differences
Webhook extensibility — HMAC-signed, exponential backoff, dead-letter table
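
The fire-and-forget client listed among the strengths might look roughly like the sketch below. Everything except the `x-api-key` header is an assumption made for illustration: the `Send` transport type, the `/errors` path, the payload shape, and the 2-second timeout are hypothetical and are not taken from the `@lovettbarron/cnc` package.

```typescript
// Hypothetical transport type, injected so the sketch can run without a network.
type Send = (url: string, body: string) => Promise<unknown>;

// Transport built on the global fetch in Node 18+; the timeout value is an assumption.
function fetchTransport(apiKey: string, timeoutMs = 2000): Send {
  return (url, body) =>
    fetch(url, {
      method: "POST",
      headers: { "content-type": "application/json", "x-api-key": apiKey },
      body,
      signal: AbortSignal.timeout(timeoutMs),
    });
}

// Returns immediately; delivery failures are swallowed on purpose.
function reportError(hubUrl: string, error: Error, send: Send): void {
  const body = JSON.stringify({
    message: error.message,
    stack: error.stack ?? "",
  });
  try {
    // Deliberately not awaited: the caller never waits on the hub.
    void send(`${hubUrl}/errors`, body).catch(() => {
      // Ignore rejections so a hub outage cannot surface in the monitored app.
    });
  } catch {
    // A transport that throws synchronously is ignored as well.
  }
}
```

A monitored app would call something like `reportError(hubUrl, err, fetchTransport(apiKey))` inside its error handler and carry on. The trade-off is that reports are silently lost while the hub is down.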

Weaknesses & Risks

Phase 10 (Job Gateway) incomplete — GATE-02/03/04 not fully integrated
BullMQ/ioredis version mismatch — Worked around with an `as any` type cast
Admin auth is coupled to Grafana — If Grafana is down, admin endpoints are unreachable
Staleness check uses JS filter — Fine for 3-5 apps, inefficient at 100+ apps
Secret rotation not yet implemented — Callback secret lifecycle management is on the operational backlog
Prometheus included but unused — Adds 200-500MB RAM overhead for zero value
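
To make the staleness item concrete, here is a sketch of an in-memory check using the 180s threshold from the feature list. The `AppHeartbeat` shape and the function names are hypothetical, and mapping "no heartbeat ever received" to `unknown` is an assumption about how the three statuses are assigned.

```typescript
type AppStatus = "up" | "down" | "unknown";

// Hypothetical row shape; the real table and column names are not shown in this document.
interface AppHeartbeat {
  appId: string;
  lastHeartbeatAt: Date | null; // null: no heartbeat has been received yet
}

const STALE_AFTER_MS = 180_000; // 180s staleness threshold

function statusOf(app: AppHeartbeat, now: Date): AppStatus {
  if (app.lastHeartbeatAt === null) return "unknown";
  const ageMs = now.getTime() - app.lastHeartbeatAt.getTime();
  return ageMs > STALE_AFTER_MS ? "down" : "up";
}

// The "JS filter" approach: load every app, then filter in memory.
function findStaleApps(apps: AppHeartbeat[], now: Date): AppHeartbeat[] {
  return apps.filter((app) => statusOf(app, now) === "down");
}

const now = new Date("2026-03-16T12:00:00Z");
const apps: AppHeartbeat[] = [
  { appId: "etyde", lastHeartbeatAt: new Date("2026-03-16T11:59:30Z") },
  { appId: "govejle", lastHeartbeatAt: new Date("2026-03-16T11:50:00Z") },
];
console.log(findStaleApps(apps, now).map((app) => app.appId));
```

At larger scale the same comparison could move into the query, for example a WHERE clause on the heartbeat timestamp, so that only stale rows leave the database.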

Connection to Other Projects

Etyde — Monitored app; sends heartbeats, errors, logs
GoVejle — Monitored app; sends heartbeats, errors, logs
Roughneck — CNC acts as job gateway; Roughneck workers consume jobs and call back with results
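
The Roughneck callback flow relies on HMAC-SHA256 verification with a per-job secret. The sketch below shows one way that could work. The function names, the hex encoding, and the 32-byte secret length are assumptions, and how the signature travels (header name, payload format) is not specified in this document.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Generated once per job at enqueue time and stored alongside the job (assumed format).
function generateJobSecret(): string {
  return randomBytes(32).toString("hex");
}

// HMAC-SHA256 over the raw callback body, hex-encoded.
function signBody(secret: string, body: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Constant-time comparison of the received signature against the expected one.
function verifyCallback(secret: string, body: string, signatureHex: string): boolean {
  const expected = Buffer.from(signBody(secret, body), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

const secret = generateJobSecret();
const body = JSON.stringify({ jobId: "job-1", status: "done" }); // hypothetical payload
const signature = signBody(secret, body);

console.log(verifyCallback(secret, body, signature)); // true
console.log(verifyCallback(secret, body + " ", signature)); // false
```

Verifying against the raw request body, before any JSON parsing, avoids mismatches caused by re-serialization.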

andrewlb notes
