andrewlb notes

GoVejle

Tools

Claude Code, Next.js, TypeScript, PostgreSQL, Drizzle ORM, BullMQ, Redis, Ollama, Qwen3 32B, Playwright, Listmonk, Valhalla, Turborepo, WireGuard

What worked

425 commits, 281 test files, 17+ phases — all running at EUR7-10/month total cost because everything that can be local IS local (Ollama on the Mac Mini, Valhalla routing, self-hosted Listmonk). The hybrid VPS + Mac Mini architecture over WireGuard worked as designed: nightly batch of 3:00 AM scraping → 3:30 AM translation → 4:00 AM geocoding/travel times → Thursday 10 PM newsletter. Claude Code produced the browser pool pattern (single Chromium, multiple BrowserContexts) that cut memory from N*200MB to 200MB + N*20MB. The dynamic scraper registry (sources defined in the DB) made adding a source a config change, not a code change.
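
The browser pool pattern can be sketched roughly as below. This is a minimal version with a `BrowserLike` stand-in instead of a real Playwright `Browser`, so the sharing logic is visible without the dependency; the interface names and the pool-exhaustion behavior are assumptions, not the actual implementation.

```typescript
// Sketch of the browser-pool idea: one shared browser process, cheap
// per-scraper contexts. `BrowserLike` is a stand-in for Playwright's
// Browser; with real Playwright you would wrap chromium.launch() once.
interface ContextLike { close(): Promise<void> }
interface BrowserLike { newContext(): Promise<ContextLike> }

class BrowserPool {
  private active = 0;
  constructor(private browser: BrowserLike, private maxContexts = 4) {}

  // Run one scraper inside its own isolated context (~20MB each),
  // instead of launching a full Chromium per scraper (~200MB each).
  async withContext<T>(fn: (ctx: ContextLike) => Promise<T>): Promise<T> {
    if (this.active >= this.maxContexts) {
      throw new Error("pool exhausted"); // real code would queue instead
    }
    this.active++;
    const ctx = await this.browser.newContext();
    try {
      return await fn(ctx);
    } finally {
      await ctx.close(); // context always released, even on scraper failure
      this.active--;
    }
  }
}
```

The design choice is isolation without duplication: each BrowserContext gets its own cookies and cache, but all contexts share one Chromium process.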

What broke

The 8 custom Playwright scrapers are brittle to site redesigns — maintenance scales superlinearly, and I'm the only one maintaining them. PII stripping at ingestion is conservative, but I haven't formally audited it against real-world false negatives. Scraper etiquette (User-Agent, robots.txt) beyond basic rate limiting is still on the to-do list. Public endpoint hardening — rate limiting, abuse protection — sits in the operational backlog. Newsletter segment names are hardcoded and will break if the Listmonk lists are renamed.

Roles

I set the legal structure (Danish frivillig forening), the target user (English speakers in Triangle Region ~1-3K people), and the editorial decisions about what counts as an event worth including. Claude Code wrote the scrapers, the Ollama translation pipeline, the BullMQ orchestration, and the Ansible-free provisioning. The decision to route everything Danish-language through Ollama locally (instead of DeepL as primary) was mine and cost-driven.

GoVejle (Event Discovery Platform)

Overview

GoVejle is an English-language event discovery platform for expats and English speakers in Denmark's Triangle Region (Vejle, Billund, Kolding, Fredericia, Give, Jelling). It scrapes 20+ Danish sources, translates and enriches events with AI, and presents them through a web interface and weekly newsletter.

Core question answered: "What can we do today?"

Target users: English-speaking residents, international families, organized expat groups (estimated 1,000-3,000 people). Registered as a Danish frivillig forening (voluntary association).

Key Features

  • Event aggregation — Scrapes 20+ sources (KultuNaut, municipality sites, cultural venues, libraries)
  • AI translation — Danish->English via local Ollama (Qwen3 32B)
  • Smart categorization — AI-driven tagging (indoor/outdoor, family-friendly, free/paid, age ranges)
  • Web discovery — Browse, filter by date/category, search, rich event details with images
  • Weekly newsletter — AI-generated summaries via Listmonk + Scaleway TEM email
  • Travel time estimates — Valhalla routing (car/bike/walk from Vejle to venues)
  • Admin dashboard — Event moderation, re-scraping, translation review, change tracking
  • GDPR compliance — PII stripping, privacy policy, data removal requests
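
A travel-time lookup against Valhalla roughly amounts to building a request body like the one below and POSTing it to the `/route` endpoint. The field names follow Valhalla's public API; the coordinates, host, and helper name are illustrative, not taken from the project.

```typescript
// Build a Valhalla /route request body for one origin→venue lookup.
// Valhalla's costing models map onto the three modes shown to users.
type Mode = "car" | "bike" | "walk";

const COSTING: Record<Mode, string> = {
  car: "auto",
  bike: "bicycle",
  walk: "pedestrian",
};

function buildRouteRequest(
  origin: { lat: number; lon: number },
  venue: { lat: number; lon: number },
  mode: Mode,
) {
  return {
    locations: [origin, venue],
    costing: COSTING[mode],
    units: "kilometers",
  };
}

// Example: Vejle center → a venue (coordinates illustrative).
const body = buildRouteRequest(
  { lat: 55.7113, lon: 9.5363 },
  { lat: 55.7308, lon: 9.1153 },
  "bike",
);
// POST this as JSON to the Valhalla instance over WireGuard, e.g.
//   fetch("http://mac-mini.wg:8002/route", { method: "POST", body: JSON.stringify(body) })
// and read trip.summary.time (seconds) from the response.
```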

Architecture

Tech Stack

Layer       Technology
Monorepo    Turborepo + pnpm workspaces
Frontend    Next.js 15 (App Router) + TypeScript + Tailwind CSS v4
Database    PostgreSQL 17 + Drizzle ORM
Queue       BullMQ + Redis
VPS         Hetzner CX22 (EUR3.79/mo)
AI          Ollama (Qwen3 32B) on M4 Mac Mini
Newsletter  Listmonk (self-hosted Go) + Scaleway TEM SMTP
Routing     Valhalla (memory-mapped tiles, Mac Mini)
Proxy       Caddy v2 (auto HTTPS)
CI/CD       GitHub Actions + Watchtower

Hybrid VPS + Mac Mini Architecture

M4 Mac Mini (local, 64GB RAM)     Hetzner CX22 VPS (EUR3.79/mo)
  Ollama (Qwen3 32B)               Next.js frontend + API routes
  Valhalla routing engine           PostgreSQL + Drizzle
  Nightly batch (~10-30 min)        BullMQ + Redis
  Connected via WireGuard           Listmonk + Scaleway TEM
                                    Caddy (reverse proxy)

Nightly Pipeline

  1. 3:00 AM: BullMQ triggers scrapers for 20+ sources -> Zod validation -> dedup -> PII stripping -> PostgreSQL
  2. 3:30 AM: Mac Mini pulls untranslated events -> Ollama translates, categorizes, summarizes, enriches -> pushes back
  3. 4:00 AM: Geocoding (Nominatim) + travel times (Valhalla car/bike/walk)
  4. Thursday 10 PM: Newsletter generation -> Listmonk campaign -> Scaleway TEM delivery
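
The cron wiring behind this schedule would look roughly like the following. This is a sketch with a minimal `QueueLike` interface standing in for a real BullMQ `Queue` (recent BullMQ takes a cron string via `repeat.pattern` on `queue.add`; older versions used `repeat.cron`); the job names are illustrative.

```typescript
// Sketch of the nightly schedule as repeatable jobs. `QueueLike`
// mirrors only the slice of BullMQ's Queue.add() used here.
interface QueueLike {
  add(
    name: string,
    data: unknown,
    opts: { repeat: { pattern: string } },
  ): Promise<unknown>;
}

const NIGHTLY_JOBS = [
  { name: "scrape-sources", pattern: "0 3 * * *" },    // 3:00 AM: scrape → validate → dedup → PII strip
  { name: "translate-events", pattern: "30 3 * * *" },  // 3:30 AM: Ollama translation/enrichment
  { name: "geocode-and-route", pattern: "0 4 * * *" },  // 4:00 AM: Nominatim + Valhalla
  { name: "weekly-newsletter", pattern: "0 22 * * 4" }, // Thursday 10 PM: Listmonk campaign
];

async function registerNightlyJobs(queue: QueueLike): Promise<string[]> {
  const registered: string[] = [];
  for (const job of NIGHTLY_JOBS) {
    await queue.add(job.name, {}, { repeat: { pattern: job.pattern } });
    registered.push(job.name);
  }
  return registered;
}
```

Staggering the steps by fixed offsets (rather than chaining them) keeps each stage independently retryable, at the cost of assuming the previous stage finishes within its window.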

Key Patterns

  • Browser pool — Single Chromium, multiple BrowserContexts (cuts memory from N*200MB to 200MB + N*20MB)
  • Scraper families — BaseScraper abstract class; KultuNaut and DPL CMS adapters handle multiple sites
  • Dynamic scraper registry — Sources from DB, not hardcoded; add sources without code changes
  • Graceful degradation — Mac offline -> VPS serves cached translations; DeepL fallback after 48h
  • PII stripping at ingestion — Removes personal emails and +45 phone numbers; preserves org contacts
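
In spirit, the PII-stripping step might look like the sketch below. The patterns and the org-mailbox whitelist are assumptions for illustration — the real rules, and how organizational contacts are actually preserved, are not documented here.

```typescript
// Conservative PII stripping at ingestion: drop personal emails and
// Danish (+45) phone numbers from scraped text, keep generic
// org-looking mailboxes (info@, kontakt@, billet@ …). Illustrative only.
const ORG_PREFIXES = ["info", "kontakt", "post", "billet", "mail"];

function stripPii(text: string): string {
  return text
    // Danish number: optional +45, then 8 digits, spaces allowed between pairs.
    .replace(/(?:\+45[ ]?)?\b\d{2}[ ]?\d{2}[ ]?\d{2}[ ]?\d{2}\b/g, "[phone removed]")
    // Emails: keep whitelisted org mailboxes, strip everything else.
    .replace(
      /\b([A-Za-z0-9._%+-]+)@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
      (match: string, local: string) =>
        ORG_PREFIXES.includes(local.toLowerCase()) ? match : "[email removed]",
    );
}
```

Running this at ingestion (before anything touches the database) is what makes the approach conservative: personal data never lands in storage, so there is nothing to purge later.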

Development History

425 commits, 281 test files, 17+ phases completed:

Phase  Focus
1      Foundation (monorepo, DB, Docker, CI/CD)
2      Data pipeline (scrapers + orchestration)
4      Frontend (event browsing, filtering, search)
6      Admin + reliability (moderation, graceful degradation)
8      Testing (Vitest, Playwright, CI)
11     Legal + transparency (GDPR, PII stripping, about page)
12     Scraper expansion (KultuNaut, DPL CMS adapter, dedup)
13     Venue scrapers (8 specialized scrapers)
14     Travel time + Valhalla routing
15     Event re-scrape + change tracking
16     Description enrichment
17     Timing corrections

Operational: Live at govejle.com with 2,000+ events, ~50 newsletter subscribers.

Strengths

  • Cost discipline — EUR7-10/month achieved (Ollama local, self-hosted everything)
  • Graceful degradation — VPS always serves something useful; multiple fallback layers
  • PII stripping at ingestion — Conservative GDPR approach
  • Browser pool pattern — Memory-efficient scraping
  • Comprehensive documentation — SCRAPER-PLAYBOOK.md, DEPLOYMENT.md, RUNBOOK.md
  • Health monitoring — Scraper staleness alerts, API response time, error tracking
  • Change tracking — Field-level before/after diffs for every event modification

Weaknesses & Risks

  • Scraper maintenance scales superlinearly — 8 custom Playwright scrapers are brittle to site redesigns
  • PostgreSQL connection pool not configured — Scraper and web pools may compete
  • Scraper etiquette still to formalize — Rate limiter exists but full User-Agent / robots.txt compliance is on the to-do list
  • Image storage growth — No per-source quality threshold
  • Newsletter segment names hardcoded — Will break if Listmonk lists renamed
  • Public endpoint hardening in progress — Abuse protection and per-IP limits are in the operational backlog
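
One way to soften the hardcoded-segment risk: resolve Listmonk list IDs by name at startup and fail loudly when a list has been renamed, instead of silently sending to nothing. A sketch — the list array shape follows Listmonk's REST API (`GET /api/lists` returns the lists under `data.results`), but that shape and all names here are assumptions:

```typescript
// Resolve newsletter segment names to Listmonk list IDs, so a renamed
// list surfaces as a startup error rather than a silently empty send.
interface ListmonkList { id: number; name: string }

function resolveSegments(
  lists: ListmonkList[], // e.g. data.results from GET /api/lists
  wanted: string[],      // segment names the newsletter job expects
): Map<string, number> {
  const byName = new Map(lists.map((l) => [l.name, l.id] as [string, number]));
  const resolved = new Map<string, number>();
  const missing: string[] = [];
  for (const name of wanted) {
    const id = byName.get(name);
    if (id === undefined) missing.push(name);
    else resolved.set(name, id);
  }
  if (missing.length > 0) {
    throw new Error(`Listmonk lists not found (renamed?): ${missing.join(", ")}`);
  }
  return resolved;
}
```

Resolving once at job start keeps the hot path unchanged while turning a silent failure mode into an alertable error.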

Connection to Other Projects

  • Roughneck — govejle-translation, enrichment, newsletter, scheduler plugins handle all async work
  • CNC — Monitored app; sends heartbeats, errors, logs
  • TuringPi — Candidate host for GoVejle infrastructure components