Roughneck

March 23, 2026

Tools

Claude Code, TypeScript, Node.js, BullMQ, ioredis, Fastify, pino, Ollama, Turborepo, Vitest, Ansible, WireGuard

What worked

29/29 plans across 6 phases shipped in ~1-2 days. Claude Code built the plugin-based architecture cleanly: manifest-driven dependency injection with auto-discovery at boot. The split into three resource-class queues (ollama=1, io=10, cpu=4) gave cross-job priority without a per-job-type topology. BullMQ Flows composed GoVejle's translate → enrich → newsletter steps as parent-child pipelines. HMAC-signed webhook callbacks with exponential backoff retry handled delivery failures. The Ansible playbooks produced both the Mac Mini launchd service and the VPS Docker deployment from one role set.

What broke

The VPN link between the VPS and the Mac Mini is a single point of failure: one tunnel failure blocks async processing for all three apps. BullMQ jobs can get stuck in 'active' after a reconnection (mitigated with stalledInterval, but still a sharp edge). The Redis eviction policy is pinned to noeviction as a deliberate operational choice, so queued jobs are not evicted under memory pressure. The trade-off is a finite queue buffer: the length of outage the system can absorb is bounded by the buffer available, a lesson that applies to any hybrid local/remote setup. Callback delivery failures are handled via the DLQ.

Roles

I set the consolidation bet — one job platform is better than three because model loading dominates cost when Ollama runs locally, and a unified queue gives cross-app priority. I defined the plugin contract (manifest, resourceClass, dependencies, retry config). Claude Code wrote the core worker engine, all 9 plugins (echo, etyde-session, etyde-set, govejle-* x4, cnc-* x2), the Ansible roles, and the CLI. The global Ollama concurrency=1 constraint was my hard physical constraint (M4 GPU can't run two 32B models without thrashing) that shaped the entire queue topology.

Roughneck (Unified Job Execution Platform)

Overview

Roughneck is a unified, plugin-based job execution platform that consolidates async background processing across three applications (Etyde, GoVejle, CNC) into a single deployable system running on an M4 Mac Mini.

**Core purpose:** Single API and queue for all async work (AI inference, data enrichment, scheduled tasks), replacing three independent worker implementations with one extensible architecture.

Key Features

Plugin-based architecture with manifest-driven dependency injection
Three resource-class queues: ollama (concurrency=1), io (concurrency=10), cpu (concurrency=4)
BullMQ-based queueing with priority levels, retries, dead-letter queue
HMAC-signed webhook callbacks with exponential backoff retry
Health/metrics endpoints with Prometheus export for Grafana
Scheduled job support (cron) via BullMQ repeatables
Job flow composition (parent-child pipelines) via FlowProducer
9 plugins: echo, etyde-session, etyde-set, govejle-translation, govejle-enrichment, govejle-newsletter, govejle-scheduler, cnc-error-classify, cnc-pattern-detection
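
The HMAC-signed callbacks with exponential backoff listed above could look roughly like the sketch below. It uses only Node's built-in crypto module. The function names, the SHA-256 choice, the hex encoding, and the base and cap delays are all assumptions for illustration; the source states only that callbacks are HMAC-signed and retried with exponential backoff.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helpers: names and parameters are illustrative, not Roughneck's API.

/** Sign a callback body with HMAC-SHA256 and return the hex digest. */
function signPayload(secret: string, body: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

/** Verify a received signature in constant time. */
function verifySignature(secret: string, body: string, signature: string): boolean {
  const expected = Buffer.from(signPayload(secret, body), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws when lengths differ, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

/** Delay before retry `attempt` (1-based): baseMs * 2^(attempt - 1), capped at capMs. */
function backoffDelayMs(attempt: number, baseMs = 1_000, capMs = 60_000): number {
  return Math.min(capMs, baseMs * 2 ** (attempt - 1));
}

// Example: sign a result payload and compute the first few retry delays.
const body = JSON.stringify({ jobId: "1", status: "completed" });
const signature = signPayload("s3cret", body);
const delays = [1, 2, 3, 4].map((attempt) => backoffDelayMs(attempt));
```

The receiving app would recompute the signature over the raw request body with the shared secret and reject the callback if verifySignature returns false. Signing the raw body rather than a re-serialized object avoids mismatches caused by key ordering.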

Architecture

Tech Stack

Layer	Technology
Runtime	Node.js 22+ + TypeScript 5.7+
Queue	BullMQ v5.71 + ioredis v5.10
Server	Fastify v5.8 + pino v10.3
LLM	Ollama (Qwen3 32B)
Monorepo	Turborepo v2.8
Testing	Vitest v4.1
CLI	Commander v13
Deployment	Ansible + launchd (Mac), Docker (VPS)
Network	WireGuard VPN (10.0.0.0/24)

Structure

packages/
  shared/     # Types, Redis client, Ollama client, logger, constants
  core/       # Worker engine, plugin registry, callbacks, health server
  cli/        # ask, status, jobs, health commands
  plugins/    # 9 plugins (echo, etyde-*, govejle-*, cnc-*)
deploy/
  ansible/    # Playbooks, roles (wireguard, ollama, roughneck, vps)
docs/         # Architecture, plugin guide, ops runbook, cutover plans

Deployment Topology

VPS (Hetzner): Docker Compose with Redis, CNC Hub, Roughneck container, Grafana, Prometheus, Loki
Mac Mini: Ollama, Roughneck worker (launchd service), Grafana Alloy for log shipping
Network: WireGuard VPN connecting VPS to Mac Mini; Redis bound to WireGuard interface only

Plugin Architecture

Plugins declare a manifest (name, version, resourceClass, dependencies, retry config). Core uses this to:

Route jobs to correct queue (ollama/io/cpu)
Inject dependencies via PluginContext (Ollama, logger, HTTP client)
Apply retry and stall detection settings
Auto-discover at boot (no core changes needed for new plugins)
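
As a rough illustration of the manifest-driven flow described above, the sketch below routes a plugin to its queue by resourceClass and builds a PluginContext containing only the declared dependencies, failing fast if one is missing. The type shapes, function names, and the example manifest values (version, resource class, retry numbers) are assumptions; the source names only the manifest fields and the three kinds of injected dependency.

```typescript
// Illustrative types: the field names follow the manifest described above,
// but the exact shapes are assumptions, not Roughneck's actual contract.
type ResourceClass = "ollama" | "io" | "cpu";
type DependencyName = "ollama" | "logger" | "http";

interface PluginManifest {
  name: string;
  version: string;
  resourceClass: ResourceClass;
  dependencies: DependencyName[];
  retry: { attempts: number; backoffMs: number };
}

type PluginContext = Partial<Record<DependencyName, unknown>>;

/** Route a plugin's jobs to the queue named after its resource class. */
function queueFor(manifest: PluginManifest): ResourceClass {
  return manifest.resourceClass;
}

/** Inject only the declared dependencies; throw at startup if one is unavailable. */
function buildContext(manifest: PluginManifest, available: PluginContext): PluginContext {
  const context: PluginContext = {};
  for (const dep of manifest.dependencies) {
    if (available[dep] === undefined) {
      throw new Error(`Plugin "${manifest.name}" requires missing dependency "${dep}"`);
    }
    context[dep] = available[dep];
  }
  return context;
}

// Hypothetical manifest for one of the GoVejle plugins (values assumed).
const translation: PluginManifest = {
  name: "govejle-translation",
  version: "1.0.0",
  resourceClass: "ollama",
  dependencies: ["ollama", "logger"],
  retry: { attempts: 3, backoffMs: 2_000 },
};

const context = buildContext(translation, { ollama: {}, logger: console, http: {} });
```

Because the context is built from the manifest rather than handed over wholesale, a plugin that forgets to declare a dependency fails at boot instead of partway through a job.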

Development History

100% complete — 29/29 plans across 6 phases, built March 20-21, 2026:

Phase	Plans	Focus
1	6	Core platform (monorepo, worker engine, callback delivery, echo plugin)
2	3	CNC job gateway (Redis service, enqueue API, client library)
3	4	Etyde migration (session + set generation plugins, shadow testing)
4	8	GoVejle migration (translation, enrichment, newsletter, scheduler)
5	4	CNC migration (error classification, pattern detection, dashboards)
6	4	Operations (Ansible, CLI, model monitoring, docs)

Architectural Decisions

Decision	Rationale
Three resource-class queues (not per-job-type)	Cross-job priority, simpler topology
CNC Hub as job gateway	Single auth point, no Redis exposure to apps
Manifest-driven dependency injection	Explicit resource declarations, fail-fast at startup
BullMQ Flows for pipelines	GoVejle's translate->enrich->newsletter as composable steps
Global Ollama concurrency=1	M4 GPU handles one 32B model; two simultaneous = thrashing
Hard cutover per app (not parallel)	Simpler rollback, 1-week soak per app
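
To make the BullMQ Flows decision concrete, here is a minimal sketch of how the translate → enrich → newsletter pipeline could be expressed as a flow tree. In a BullMQ flow, children are processed before their parent, so the last step becomes the root and the first step the deepest child. The FlowNode interface is a local mirror of the fields of BullMQ's flow job shape used here, so the sketch runs without installing bullmq. The helper name, the queue assigned to each step, and the payload are assumptions.

```typescript
// Local mirror of the flow job fields used here (name, queueName, data, children).
interface FlowNode {
  name: string;
  queueName: string;
  data: Record<string, unknown>;
  children?: FlowNode[];
}

interface Step {
  name: string;
  queueName: string;
}

/**
 * Nest an ordered list of steps so that the first step is the deepest child
 * and the last step is the root, matching children-before-parent execution.
 */
function buildPipeline(steps: Step[], data: Record<string, unknown>): FlowNode {
  if (steps.length === 0) throw new Error("pipeline needs at least one step");
  let node: FlowNode | undefined;
  for (const step of steps) {
    const next: FlowNode = { name: step.name, queueName: step.queueName, data };
    if (node) next.children = [node]; // the previous step must finish first
    node = next;
  }
  return node as FlowNode;
}

// Queue assignments below are assumed for illustration.
const govejleFlow = buildPipeline(
  [
    { name: "govejle-translation", queueName: "ollama" },
    { name: "govejle-enrichment", queueName: "io" },
    { name: "govejle-newsletter", queueName: "cpu" },
  ],
  { eventId: "evt-1" },
);
```

A tree of this shape is what would be handed to a FlowProducer's add method. Each parent can then read its children's return values, which is how a later step would pick up the output of the one before it.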

Strengths

Clean separation — Core is pure infrastructure; plugins are pure domain logic
Scalable plugin system — New job type = create package + implement interface + auto-discovered
Comprehensive failure handling — Retry with exponential backoff, stalled detection, DLQ for both jobs and callbacks
Production-ready observability — Health endpoint, Prometheus metrics, structured logging, heartbeat
Infrastructure as code — Ansible playbooks for Mac Mini and VPS, auto-deployment

Weaknesses & Risks

VPN link is a single point of failure — One tunnel failure blocks all three apps' async processing
BullMQ jobs stuck after reconnection — Stale jobs can sit in "active"; mitigated with stalledInterval
Redis eviction policy deliberately pinned to noeviction — Trade-off: queue durability over memory elasticity
Callback delivery failures — Results could be lost if callback fails; mitigated with DLQ
Queue buffer is finite during extended outages — Jobs accumulate if the upstream link is down for long periods
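
The stalled-job mitigation could be expressed as per-queue worker settings along the lines of the sketch below. The option names (concurrency, lockDuration, stalledInterval, maxStalledCount) are BullMQ worker options, and the concurrency values come from the resource-class split described earlier. The timing values are assumptions, not Roughneck's actual configuration.

```typescript
type ResourceClass = "ollama" | "io" | "cpu";

// Subset of BullMQ worker options relevant to stalled-job handling.
interface WorkerTuning {
  concurrency: number;
  lockDuration: number;    // ms a job lock lasts before it must be renewed
  stalledInterval: number; // ms between checks for jobs whose lock expired
  maxStalledCount: number; // recoveries allowed before the job is failed
}

// Concurrency per resource class, as stated in the document.
const CONCURRENCY: Record<ResourceClass, number> = { ollama: 1, io: 10, cpu: 4 };

function workerTuning(resourceClass: ResourceClass): WorkerTuning {
  // Assumed values: a longer lock on the inference queue tolerates brief
  // tunnel drops, since a lock cannot be renewed while Redis is unreachable.
  const lockDuration = resourceClass === "ollama" ? 300_000 : 30_000;
  return {
    concurrency: CONCURRENCY[resourceClass],
    lockDuration,
    stalledInterval: 30_000,
    maxStalledCount: 1,
  };
}

const ollamaTuning = workerTuning("ollama");
```

The trade-off is recovery time: a longer lock means a job abandoned by a crashed worker sits in "active" for longer before the stalled check returns it to the queue.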

Connection to Other Projects

Etyde — etyde-session and etyde-set plugins generate AI practice sessions
GoVejle — govejle-translation, enrichment, newsletter, scheduler plugins handle event pipeline
CNC — cnc-error-classify and cnc-pattern-detection plugins; CNC acts as job gateway

andrewlb notes