
Roughneck
What worked
29/29 plans across 6 phases shipped in roughly 1-2 days. Claude Code built the plugin-based architecture cleanly: manifest-driven dependency injection with auto-discovery at boot. The three resource-class queue split (ollama=1, io=10, cpu=4) gave cross-job priority without per-job-type topology. BullMQ Flows composed GoVejle's translate → enrich → newsletter as parent-child pipelines. HMAC-signed webhook callbacks with exponential-backoff retry handled delivery failures. The Ansible playbooks produced both the Mac Mini launchd service and the VPS Docker deployment from a single role set.
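The three-queue split above can be sketched as separate BullMQ Workers sharing one connection config. The host, port, and processor are illustrative, not the project's actual values:

```typescript
import { Worker, type Processor } from "bullmq";

// Shared ioredis options; BullMQ workers require maxRetriesPerRequest: null.
const connection = { host: "127.0.0.1", port: 6379, maxRetriesPerRequest: null };

// Hypothetical processor; the real one dispatches to the plugin registry.
const processJob: Processor = async (job) => ({ ok: true, name: job.name });

// One worker per resource class, with the concurrency split described above.
new Worker("ollama", processJob, { connection, concurrency: 1 }); // GPU-bound LLM work
new Worker("io", processJob, { connection, concurrency: 10 });    // network/IO-bound
new Worker("cpu", processJob, { connection, concurrency: 4 });    // CPU-bound
```

The point of the split is that priority is arbitrated per resource class rather than per job type, so a new plugin never needs its own queue.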
What broke
The VPN link between the VPS and the Mac Mini is a single point of failure: one tunnel failure blocks async processing for all three apps. BullMQ jobs can get stuck in 'active' after a reconnection (mitigated with stalledInterval, but still a sharp edge). The Redis eviction policy is deliberately pinned to noeviction, trading memory elasticity for queue durability; as a result, extended outages are bounded by the available queue buffer, a lesson for any hybrid local/remote setup. Callback delivery failures are handled via the DLQ.
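The stalledInterval mitigation mentioned above is a worker-level setting; a minimal sketch, with illustrative values rather than the project's actual configuration:

```typescript
import { Worker, type Processor } from "bullmq";

const connection = { host: "127.0.0.1", port: 6379, maxRetriesPerRequest: null };
const processor: Processor = async (job) => job.data;

// Stalled-job detection reclaims jobs left in 'active' after a dropped
// connection; values here are illustrative.
const ollamaWorker = new Worker("ollama", processor, {
  connection,
  concurrency: 1,
  stalledInterval: 30_000, // scan for stalled jobs every 30 seconds
  maxStalledCount: 1,      // after one stall, fail the job so retry/DLQ handling kicks in
});
```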
Roles
I set the consolidation bet: one job platform beats three, because model loading dominates cost when Ollama runs locally and a unified queue gives cross-app priority. I defined the plugin contract (manifest, resourceClass, dependencies, retry config). Claude Code wrote the core worker engine, all 9 plugins (echo, etyde-session, etyde-set, govejle-* x4, cnc-* x2), the Ansible roles, and the CLI. The global Ollama concurrency=1 limit was a hard physical constraint (the M4 GPU can't run two 32B models without thrashing) that shaped the entire queue topology.
Roughneck (Unified Job Execution Platform)
Overview
Roughneck is a unified, plugin-based job execution platform that consolidates async background processing across three applications (Etyde, GoVejle, CNC) into a single deployable system running on an M4 Mac Mini.
**Core purpose:** Single API and queue for all async work (AI inference, data enrichment, scheduled tasks), replacing three independent worker implementations with one extensible architecture.
Key Features
- Plugin-based architecture with manifest-driven dependency injection
- Three resource-class queues: ollama (concurrency=1), io (concurrency=10), cpu (concurrency=4)
- BullMQ-based queueing with priority levels, retries, dead-letter queue
- HMAC-signed webhook callbacks with exponential backoff retry
- Health/metrics endpoints with Prometheus export for Grafana
- Scheduled job support (cron) via BullMQ repeatables
- Job flow composition (parent-child pipelines) via FlowProducer
- 9 plugins: echo, etyde-session, etyde-set, govejle-translation, govejle-enrichment, govejle-newsletter, govejle-scheduler, cnc-error-classify, cnc-pattern-detection
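Two of the features above, flow composition and cron scheduling, can be sketched with BullMQ's FlowProducer and repeatables. Queue names follow the resource classes listed above, and the payload shapes and cron pattern are hypothetical:

```typescript
import { FlowProducer, Queue } from "bullmq";

const connection = { host: "127.0.0.1", port: 6379, maxRetriesPerRequest: null };

// Parent-child flow: children are processed first, and the parent
// (newsletter) runs only after all of its children complete.
const flow = new FlowProducer({ connection });
await flow.add({
  name: "govejle-newsletter",
  queueName: "io",
  data: { edition: "weekly" }, // hypothetical payload
  children: [
    {
      name: "govejle-enrichment",
      queueName: "io",
      data: {},
      children: [{ name: "govejle-translation", queueName: "ollama", data: {} }],
    },
  ],
});

// Scheduled jobs via BullMQ repeatables (cron pattern is illustrative).
const io = new Queue("io", { connection });
await io.add("govejle-scheduler", {}, { repeat: { pattern: "0 6 * * *" } });
```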
Architecture
Tech Stack
| Layer | Technology |
|---|---|
| Runtime | Node.js 22+ + TypeScript 5.7+ |
| Queue | BullMQ v5.71 + ioredis v5.10 |
| Server | Fastify v5.8 + pino v10.3 |
| LLM | Ollama (Qwen3 32B) |
| Monorepo | Turborepo v2.8 |
| Testing | Vitest v4.1 |
| CLI | Commander v13 |
| Deployment | Ansible + launchd (Mac), Docker (VPS) |
| Network | WireGuard VPN (10.0.0.0/24) |
Structure
```
packages/
  shared/      # Types, Redis client, Ollama client, logger, constants
  core/        # Worker engine, plugin registry, callbacks, health server
  cli/         # ask, status, jobs, health commands
  plugins/     # 9 plugins (echo, etyde-*, govejle-*, cnc-*)
deploy/
  ansible/     # Playbooks, roles (wireguard, ollama, roughneck, vps)
docs/          # Architecture, plugin guide, ops runbook, cutover plans
```
Deployment Topology
- VPS (Hetzner): Docker Compose with Redis, CNC Hub, Roughneck container, Grafana, Prometheus, Loki
- Mac Mini: Ollama, Roughneck worker (launchd service), Grafana Alloy for log shipping
- Network: WireGuard VPN connecting VPS to Mac Mini; Redis bound to WireGuard interface only
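With Redis bound to the WireGuard interface, clients connect over the tunnel address. A minimal ioredis sketch, assuming a 10.0.0.x address for the VPS (the actual address is not given here):

```typescript
import Redis from "ioredis";

// Hypothetical tunnel address on the 10.0.0.0/24 WireGuard subnet. Redis on
// the VPS listens only on this interface, so queue traffic never crosses the
// public internet.
const redis = new Redis({
  host: "10.0.0.1",
  port: 6379,
  maxRetriesPerRequest: null, // required when BullMQ workers share this connection
});
```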
Plugin Architecture
Plugins declare a manifest (name, version, resourceClass, dependencies, retry config). Core uses this to:
- Route jobs to correct queue (ollama/io/cpu)
- Inject dependencies via PluginContext (Ollama, logger, HTTP client)
- Apply retry and stall detection settings
- Auto-discover at boot (no core changes needed for new plugins)
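The manifest contract described above might look like the following sketch; field names are inferred from this document, not the project's actual types:

```typescript
// Hypothetical manifest shape; the real contract may differ.
type ResourceClass = "ollama" | "io" | "cpu";

interface PluginManifest {
  name: string;
  version: string;
  resourceClass: ResourceClass;
  dependencies: string[]; // e.g. ["ollama", "http"], injected via PluginContext
  retry: { attempts: number; backoff: { type: "exponential"; delay: number } };
}

// Core routes each plugin's jobs to the queue named by its resource class.
function queueFor(manifest: PluginManifest): ResourceClass {
  return manifest.resourceClass;
}
```

Because routing and dependency wiring are driven entirely by the manifest, a new plugin package is picked up at boot without any change to core.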
Development History
100% complete — 29/29 plans across 6 phases, built March 20-21, 2026:
| Phase | Plans | Focus |
|---|---|---|
| 1 | 6 | Core platform (monorepo, worker engine, callback delivery, echo plugin) |
| 2 | 3 | CNC job gateway (Redis service, enqueue API, client library) |
| 3 | 4 | Etyde migration (session + set generation plugins, shadow testing) |
| 4 | 8 | GoVejle migration (translation, enrichment, newsletter, scheduler) |
| 5 | 4 | CNC migration (error classification, pattern detection, dashboards) |
| 6 | 4 | Operations (Ansible, CLI, model monitoring, docs) |
Architectural Decisions
| Decision | Rationale |
|---|---|
| Three resource-class queues (not per-job-type) | Cross-job priority, simpler topology |
| CNC Hub as job gateway | Single auth point, no Redis exposure to apps |
| Manifest-driven dependency injection | Explicit resource declarations, fail-fast at startup |
| BullMQ Flows for pipelines | GoVejle's translate → enrich → newsletter as composable steps |
| Global Ollama concurrency=1 | M4 GPU handles one 32B model; two simultaneous = thrashing |
| Hard cutover per app (not parallel) | Simpler rollback, 1-week soak per app |
Strengths
- Clean separation — Core is pure infrastructure; plugins are pure domain logic
- Scalable plugin system — New job type = create package + implement interface + auto-discovered
- Comprehensive failure handling — Retry with exponential backoff, stalled detection, DLQ for both jobs and callbacks
- Production-ready observability — Health endpoint, Prometheus metrics, structured logging, heartbeat
- Infrastructure as code — Ansible playbooks for Mac Mini and VPS, auto-deployment
Weaknesses & Risks
- VPN link is a single point of failure — One tunnel failure blocks all three apps' async processing
- BullMQ jobs stuck after reconnection — Stale jobs can sit in "active"; mitigated with stalledInterval
- Redis eviction policy deliberately pinned to noeviction — Trade-off: queue durability over memory elasticity
- Callback delivery failures — Results could be lost if callback fails; mitigated with DLQ
- Queue buffer is finite during extended outages — Jobs accumulate if the upstream link is down for long periods
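The HMAC-signed callback scheme above can be sketched with Node's crypto module. SHA-256 hex signatures with a constant-time comparison are standard practice, but the exact scheme used by the project (header names, encoding) is an assumption:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sign a callback body with a shared per-app secret (illustrative scheme).
export function signCallback(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Receivers recompute the signature and compare in constant time.
export function verifyCallback(body: string, secret: string, signature: string): boolean {
  const expected = Buffer.from(signCallback(body, secret), "hex");
  const given = Buffer.from(signature, "hex");
  return expected.length === given.length && timingSafeEqual(expected, given);
}
```

On delivery failure the sender retries with exponential backoff, and exhausted retries land in the callback DLQ rather than being dropped.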
Connection to Other Projects
- Etyde — etyde-session and etyde-set plugins generate AI practice sessions
- GoVejle — govejle-translation, enrichment, newsletter, scheduler plugins handle event pipeline
- CNC — cnc-error-classify and cnc-pattern-detection plugins; CNC acts as job gateway