
GoVejle
What worked
425 commits, 281 test files, 17+ phases — all running at EUR7-10/month total cost because everything that can be local IS local (Ollama on Mac Mini, Valhalla routing, Listmonk self-hosted). The hybrid VPS + Mac Mini architecture via WireGuard worked as designed: the nightly batch runs the 3:00 AM scraper → 3:30 AM translation → 4:00 AM geocoding/travel times, and the newsletter goes out Thursdays at 10 PM. Claude Code produced the browser pool pattern (single Chromium, multiple BrowserContexts) that cut memory from N×200MB to 200MB + N×20MB. The dynamic scraper registry (sources loaded from the DB) made adding a source a config change, not a code change.
What broke
8 custom Playwright scrapers are brittle to site redesigns — scraper maintenance scales superlinearly and I'm the only one maintaining them. PII stripping at ingestion is conservative but I haven't formally audited it against real-world false negatives. Scraper etiquette (User-Agent, robots.txt) beyond simple rate limiting is still on the to-do list. Public endpoint hardening — rate limiting, abuse protection — is in the operational backlog. Newsletter segment names are hardcoded — will break if Listmonk lists are renamed.
Roles
I set the legal structure (Danish frivillig forening), the target user (English speakers in Triangle Region ~1-3K people), and the editorial decisions about what counts as an event worth including. Claude Code wrote the scrapers, the Ollama translation pipeline, the BullMQ orchestration, and the Ansible-free provisioning. The decision to route everything Danish-language through Ollama locally (instead of DeepL as primary) was mine and cost-driven.
GoVejle (Event Discovery Platform)
Overview
GoVejle is an English-language event discovery platform for expats and English speakers in Denmark's Triangle Region (Vejle, Billund, Kolding, Fredericia, Give, Jelling). It scrapes 20+ Danish sources, translates and enriches events with AI, and presents them through a web interface and weekly newsletter.
Core question answered: "What can we do today?"
Target users: English-speaking residents, international families, organized expat groups (estimated 1,000-3,000 people). Registered as a Danish frivillig forening (voluntary association).
Key Features
- Event aggregation — Scrapes 20+ sources (KultuNaut, municipality sites, cultural venues, libraries)
- AI translation — Danish->English via local Ollama (Qwen3 32B)
- Smart categorization — AI-driven tagging (indoor/outdoor, family-friendly, free/paid, age ranges)
- Web discovery — Browse, filter by date/category, search, rich event details with images
- Weekly newsletter — AI-generated summaries via Listmonk + Scaleway TEM email
- Travel time estimates — Valhalla routing (car/bike/walk from Vejle to venues)
- Admin dashboard — Event moderation, re-scraping, translation review, change tracking
- GDPR compliance — PII stripping, privacy policy, data removal requests
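The travel-time feature maps each display mode onto one of Valhalla's built-in costing models (`auto`, `bicycle`, `pedestrian`). A minimal sketch of the request body a client might POST to Valhalla's `/route` endpoint — the Vejle origin coordinates and the mode names are illustrative assumptions, not values from the project:

```typescript
// Sketch: build a Valhalla /route request for car/bike/walk travel times.
// The origin (approximate Vejle town centre) and the mode-to-costing
// mapping below are assumptions for illustration.

type Mode = "car" | "bike" | "walk";

// Valhalla's built-in costing model names.
const COSTING: Record<Mode, string> = {
  car: "auto",
  bike: "bicycle",
  walk: "pedestrian",
};

const VEJLE = { lat: 55.7113, lon: 9.5357 }; // approximate town centre

function routeRequest(venue: { lat: number; lon: number }, mode: Mode) {
  return {
    locations: [VEJLE, venue],
    costing: COSTING[mode],
    units: "kilometers",
  };
}

// Example: request body for cycling to a venue near Jelling.
const body = routeRequest({ lat: 55.7558, lon: 9.4196 }, "bike");
console.log(JSON.stringify(body));
```

The trip duration comes back in `trip.summary.time` of Valhalla's response; presumably the project computes and caches these per venue during the 4:00 AM batch rather than per page view.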
Architecture
Tech Stack
| Layer | Technology |
|---|---|
| Monorepo | Turborepo + pnpm workspaces |
| Frontend | Next.js 15 (App Router) + TypeScript + Tailwind CSS v4 |
| Database | PostgreSQL 17 + Drizzle ORM |
| Queue | BullMQ + Redis |
| VPS | Hetzner CX22 (EUR3.79/mo) |
| AI | Ollama (Qwen3 32B) on M4 Mac Mini |
| Newsletter | Listmonk (self-hosted Go) + Scaleway TEM SMTP |
| Routing | Valhalla (memory-mapped tiles, Mac Mini) |
| Proxy | Caddy v2 (auto HTTPS) |
| CI/CD | GitHub Actions + Watchtower |
Hybrid VPS + Mac Mini Architecture
| M4 Mac Mini (local, 64GB RAM) | Hetzner CX22 VPS (EUR3.79/mo) |
|---|---|
| Ollama (Qwen3 32B) | Next.js frontend + API routes |
| Valhalla routing engine | PostgreSQL + Drizzle |
| Nightly batch (~10-30 min) | BullMQ + Redis |
| Connected via WireGuard | Listmonk + Scaleway TEM |
| | Caddy (reverse proxy) |
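The WireGuard link between the two machines can be as small as one peer stanza per side. A hypothetical config for the Mac Mini end — the keys, tunnel IPs, endpoint hostname, and port are placeholders, not the project's actual values:

```ini
# /etc/wireguard/wg0.conf on the Mac Mini (all values are placeholders)
[Interface]
PrivateKey = <mac-mini-private-key>
Address = 10.0.0.2/24

[Peer]
# Hetzner VPS
PublicKey = <vps-public-key>
Endpoint = vps.example.com:51820
AllowedIPs = 10.0.0.1/32
PersistentKeepalive = 25  ; keep the NAT mapping alive so the VPS can reach the Mac
```

`PersistentKeepalive` matters here because the Mac Mini sits behind a home NAT: without it, the VPS could not initiate the nightly "pull untranslated events" traffic.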
Nightly Pipeline
- 3:00 AM: BullMQ triggers scrapers for 20+ sources -> Zod validation -> dedup -> PII stripping -> PostgreSQL
- 3:30 AM: Mac Mini pulls untranslated events -> Ollama translates, categorizes, summarizes, enriches -> pushes back
- 4:00 AM: Geocoding (Nominatim) + travel times (Valhalla car/bike/walk)
- Thursday 10 PM: Newsletter generation -> Listmonk campaign -> Scaleway TEM delivery
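The PII-stripping step in the 3:00 AM ingestion pass can be sketched as a pair of regex passes: drop personal email addresses and Danish `+45` phone numbers, keep contacts at known organisation domains. This is a guess at the approach, not the project's actual rules, and the domain allowlist is a placeholder:

```typescript
// Sketch of conservative PII stripping at ingestion (assumed rules):
// remove personal emails and +45 phone numbers, preserve org contacts.
// The org-domain allowlist below is hypothetical.

const ORG_DOMAINS = ["vejle.dk", "kultunaut.dk"]; // placeholder allowlist

const EMAIL = /[\w.+-]+@[\w.-]+\.[a-z]{2,}/gi;
// +45 followed by 8 digits, with optional spaces/dots/dashes between them.
const DK_PHONE = /\+45[\s.-]?(?:\d[\s.-]?){8}/g;

function stripPii(text: string): string {
  return text
    .replace(EMAIL, (m) => {
      const domain = m.split("@")[1].toLowerCase();
      return ORG_DOMAINS.includes(domain) ? m : "[removed]";
    })
    .replace(DK_PHONE, "[removed]");
}

console.log(stripPii("Contact anna.p@gmail.com or +45 12 34 56 78, or kultur@vejle.dk"));
```

A real implementation would need the false-negative audit the retrospective calls out: regexes like these miss obfuscated addresses ("anna dot p at gmail") and phone numbers written without the country code.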
Key Patterns
- Browser pool — Single Chromium, multiple BrowserContexts (cuts memory from N×200MB to 200MB + N×20MB)
- Scraper families — BaseScraper abstract class; KultuNaut and DPL CMS adapters handle multiple sites
- Dynamic scraper registry — Sources from DB, not hardcoded; add sources without code changes
- Graceful degradation — Mac offline -> VPS serves cached translations; DeepL fallback after 48h
- PII stripping at ingestion — Removes personal emails and +45 phone numbers; preserves org contacts
Development History
425 commits, 281 test files, 17+ phases completed:
| Phase | Focus |
|---|---|
| 1 | Foundation (monorepo, DB, Docker, CI/CD) |
| 2 | Data pipeline (scrapers + orchestration) |
| 4 | Frontend (event browsing, filtering, search) |
| 6 | Admin + reliability (moderation, graceful degradation) |
| 8 | Testing (Vitest, Playwright, CI) |
| 11 | Legal + transparency (GDPR, PII stripping, about page) |
| 12 | Scraper expansion (KultuNaut, DPL CMS adapter, dedup) |
| 13 | Venue scrapers (8 specialized scrapers) |
| 14 | Travel time + Valhalla routing |
| 15 | Event re-scrape + change tracking |
| 16 | Description enrichment |
| 17 | Timing corrections |
Operational: Live at govejle.com with 2,000+ events, ~50 newsletter subscribers.
Strengths
- Cost discipline — EUR7-10/month achieved (Ollama local, self-hosted everything)
- Graceful degradation — VPS always serves something useful; multiple fallback layers
- PII stripping at ingestion — Conservative GDPR approach
- Browser pool pattern — Memory-efficient scraping
- Comprehensive documentation — SCRAPER-PLAYBOOK.md, DEPLOYMENT.md, RUNBOOK.md
- Health monitoring — Scraper staleness alerts, API response time, error tracking
- Change tracking — Field-level before/after diffs for every event modification
Weaknesses & Risks
- Scraper maintenance scales superlinearly — 8 custom Playwright scrapers are brittle to site redesigns
- PostgreSQL connection pool not configured — Scraper and web pools may compete
- Scraper etiquette still to formalize — Rate limiter exists but full User-Agent / robots.txt compliance is on the to-do list
- Image storage growth — No per-source quality threshold
- Newsletter segment names hardcoded — Will break if Listmonk lists renamed
- Public endpoint hardening in progress — Abuse protection and per-IP limits are in the operational backlog
Connection to Other Projects
- Roughneck — govejle-translation, enrichment, newsletter, scheduler plugins handle all async work
- CNC — Monitored app; sends heartbeats, errors, logs
- TuringPi — Could potentially host GoVejle infrastructure components