
GoVejle
What worked
The whole stack runs at EUR 7-10/month because everything that can be local IS local — Ollama on Mac Mini, Valhalla routing, self-hosted Listmonk. The browser pool pattern (single Chromium, multiple BrowserContexts) cut scraper memory from N*200MB to 200MB + N*20MB. The dynamic scraper registry (sources configured in DB, not code) means adding a new source is a config change. The nightly batch pipeline — scrape at 3 AM, translate at 3:30, geocode at 4:00, newsletter Thursday at 10 PM — has been reliable enough to sustain ~50 subscribers.
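The memory claim for the browser pool can be checked with a few lines of arithmetic. This is a minimal sketch using the approximate figures quoted above (about 200 MB per Chromium process, about 20 MB per BrowserContext). The function names and the choice of eight concurrent scrapers are illustrative assumptions, not measurements from the running system.

```typescript
// Approximate figures from the text, not measured values.
const BROWSER_MB = 200; // one Chromium process
const CONTEXT_MB = 20;  // one BrowserContext inside a shared process

// Naive pattern: every scraper launches its own browser.
function separateBrowsersMb(scrapers: number): number {
  return scrapers * BROWSER_MB;
}

// Pool pattern: one shared browser plus one lightweight context per scraper.
function pooledContextsMb(scrapers: number): number {
  return BROWSER_MB + scrapers * CONTEXT_MB;
}

const n = 8; // hypothetical number of scrapers running at once
console.log(separateBrowsersMb(n)); // 1600
console.log(pooledContextsMb(n));   // 360
```

Each extra scraper costs a context rather than a whole browser, so the pooled total grows far more slowly as sources are added.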
What broke
Custom Playwright scrapers are brittle to site redesigns and maintenance scales superlinearly — I'm the only one maintaining them. PII stripping is conservative but hasn't been formally audited against false negatives. Scraper etiquette (User-Agent, robots.txt compliance) is incomplete. Public endpoints lack rate limiting and abuse protection. Newsletter segment names are hardcoded to Listmonk list names. The real tension: this serves a genuine community need (~1-3K English speakers in the Triangle Region) but the scraper maintenance burden may not be sustainable solo.
Roles
I set the legal structure (Danish frivillig forening), defined the target user and editorial criteria for event inclusion, and made the cost-driven decision to route all translation through local Ollama instead of DeepL. Claude Code wrote the scrapers, the translation pipeline, the BullMQ orchestration, and the deployment configs.
GoVejle (Event Discovery Platform)
Overview
GoVejle is an English-language event discovery platform for expats and English speakers in Denmark's Triangle Region (Vejle, Billund, Kolding, Fredericia, Give, Jelling). It scrapes 20+ Danish sources, translates and enriches events with AI, and presents them through a web interface and weekly newsletter.
Core question answered: "What can we do today?"
Target users: English-speaking residents and international families in the Triangle Region (estimated 1,000-3,000 people). Registered as a Danish frivillig forening (voluntary association).
Operational: Live at govejle.com with 2,000+ events and ~50 newsletter subscribers.
What It Does
- Scrapes 20+ sources (KultuNaut, municipality sites, cultural venues, libraries) with deduplication and change tracking
- Translates Danish to English via local Ollama (Qwen3 32B), with AI-driven categorization (indoor/outdoor, family-friendly, free/paid, age ranges)
- Web discovery interface with date/category filtering, search, and rich event details with images
- Weekly newsletter with AI-generated summaries, delivered via self-hosted Listmonk + Scaleway TEM
- Travel time estimates from Vejle to venues via Valhalla routing (car/bike/walk)
- Admin dashboard for event moderation, re-scraping, translation review, and field-level change tracking
- GDPR compliance with PII stripping at ingestion, privacy policy, and data removal request handling
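Deduplication and field-level change tracking both reduce to comparing a freshly scraped event with what is already stored. The sketch below shows one way this could work. The field names, the `dedupeKey` recipe (source, normalized title, start time), and the `FieldChange` shape are all assumptions for illustration, not the schema GoVejle actually uses.

```typescript
type FieldValue = string | number | boolean | null;
type EventFields = Record<string, FieldValue>;

interface FieldChange {
  field: string;
  before: FieldValue | undefined; // undefined means the field was absent
  after: FieldValue | undefined;
}

// Hypothetical dedupe key: same source, same normalized title, same start time.
function dedupeKey(source: string, title: string, startIso: string): string {
  const normalizedTitle = title.trim().toLowerCase().replace(/\s+/g, " ");
  return [source, normalizedTitle, startIso].join("|");
}

// Compare a stored event with a re-scraped one and list every changed field.
function diffFields(stored: EventFields, scraped: EventFields): FieldChange[] {
  const fields = Object.keys(stored).concat(
    Object.keys(scraped).filter((k) => !(k in stored))
  );
  const changes: FieldChange[] = [];
  for (const field of fields) {
    if (stored[field] !== scraped[field]) {
      changes.push({ field, before: stored[field], after: scraped[field] });
    }
  }
  return changes;
}

const changes = diffFields(
  { title: "Jazz aften", price: 50 },
  { title: "Jazz aften", price: 75 }
);
console.log(changes); // one change, for the "price" field
```

An empty result means the re-scrape found nothing new. A non-empty result is the kind of record a moderation dashboard could display.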
How It Fits Together
A hybrid architecture splits work between a Hetzner VPS (EUR 3.79/month) running Next.js, PostgreSQL, BullMQ/Redis, Listmonk, and Caddy, and an M4 Mac Mini (64GB RAM) running Ollama and Valhalla connected via WireGuard. A nightly batch pipeline runs scraping on the VPS, then ships untranslated events to the Mac Mini for AI translation, categorization, and geocoding. If the Mac Mini is offline, the VPS serves cached translations and falls back to DeepL after 48 hours.
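The offline behaviour described above amounts to a small decision rule. The following is a sketch under stated assumptions: the backend labels, the `chooseBackend` name, and the idea of tracking a "last seen online" timestamp are hypothetical. Only the 48-hour threshold comes from the text, and treating exactly 48 hours as past the threshold is a choice made here.

```typescript
type TranslationBackend = "ollama" | "cached-only" | "deepl";

// 48 hours, the fallback threshold stated in the text.
const FALLBACK_AFTER_MS = 48 * 60 * 60 * 1000;

// Decide where new translations should come from right now.
function chooseBackend(
  lastSeenOnline: Date,
  now: Date,
  macMiniReachable: boolean
): TranslationBackend {
  if (macMiniReachable) return "ollama";
  const offlineMs = now.getTime() - lastSeenOnline.getTime();
  // Short outage: keep serving cached translations and wait.
  // Long outage: pay for DeepL so new events still get translated.
  return offlineMs >= FALLBACK_AFTER_MS ? "deepl" : "cached-only";
}

const lastSeen = new Date("2025-01-01T00:00:00Z");
console.log(chooseBackend(lastSeen, new Date("2025-01-02T00:00:00Z"), false)); // "cached-only"
console.log(chooseBackend(lastSeen, new Date("2025-01-03T00:00:00Z"), false)); // "deepl"
```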
Architecture Decisions
- Local AI over cloud APIs — All routine translation runs through Ollama on the Mac Mini; DeepL is used only as a fallback once the Mac Mini has been offline for 48 hours. Cost-driven: this keeps the monthly bill under EUR 10 instead of scaling with event volume.
- Browser pool pattern — Single Chromium instance with multiple BrowserContexts instead of N browser instances. Memory drops from N*200MB to 200MB + N*20MB.
- Dynamic scraper registry — Sources are configured in the database, not hardcoded. Adding a new source is a config change, not a code change.
- Scraper family inheritance — BaseScraper abstract class with KultuNaut and DPL CMS adapters that each handle multiple sites. Reduces per-source maintenance.
- PII stripping at ingestion — Removes personal emails and phone numbers before storage. Conservative by design, but hasn't been formally audited.
- Graceful degradation — Mac Mini offline? VPS serves cached data. Translation fails? Event still appears in Danish. Newsletter generation fails? Events are still browsable on the web.
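PII stripping at ingestion can be sketched as a pair of regular-expression passes. This is not the project's actual implementation: the patterns, placeholder strings, and function name are assumptions. It also strips every email address rather than only personal ones, and it assumes Danish phone numbers are eight digits, optionally prefixed with +45 and optionally grouped in space-separated pairs.

```typescript
// Any email address (this sketch does not try to tell personal from organizational).
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Eight digits, optionally grouped in pairs by single spaces, optional +45 prefix.
const DK_PHONE = /(?:\+45[ ]?|\b)(?:\d{2}[ ]?){3}\d{2}\b/g;

function stripPii(text: string): string {
  return text
    .replace(EMAIL, "[email removed]")
    .replace(DK_PHONE, "[phone removed]");
}

console.log(
  stripPii("Kontakt Anna: anna@example.com, tlf. +45 12 34 56 78 eller 87654321.")
);
// "Kontakt Anna: [email removed], tlf. [phone removed] eller [phone removed]."
```

The name "Anna" survives in the output. Pattern-based stripping catches structured identifiers but not names embedded in free text, so a stripper like this can still leak personal data until it is audited against real samples.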
Iteration Story
The initial build established the scraping pipeline and web frontend. The interesting evolution was in scraper architecture: early scrapers were one-off scripts per source, but as the source count grew past 10, the maintenance burden became obvious. The scraper family pattern (BaseScraper with CMS-specific adapters) and dynamic registry emerged from that pain — they cut the per-source code from hundreds of lines to a database row plus a few overrides.
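The combination of a scraper family and a database-driven registry might look roughly like the sketch below. Everything here is hypothetical: the `SourceRow` shape, the `family` values, the default list paths, and the example hostnames are invented to show the pattern, and real adapters would also contain the Playwright parsing logic.

```typescript
// One row per source, as it might be stored in the database (assumed shape).
interface SourceRow {
  id: number;
  family: "kultunaut" | "dpl-cms";
  baseUrl: string;
  overrides?: { listPath?: string };
}

// Shared behaviour lives in the base class; families supply defaults.
abstract class BaseScraper {
  constructor(protected readonly source: SourceRow) {}

  protected abstract defaultListPath(): string;

  // A per-source override in the row wins over the family default.
  listUrl(): string {
    const path = this.source.overrides?.listPath ?? this.defaultListPath();
    return this.source.baseUrl.replace(/\/$/, "") + path;
  }
}

class KultuNautScraper extends BaseScraper {
  protected defaultListPath(): string { return "/events"; } // hypothetical path
}

class DplCmsScraper extends BaseScraper {
  protected defaultListPath(): string { return "/arrangementer"; } // hypothetical path
}

// Registry: maps the family column to the adapter class.
const families: Record<SourceRow["family"], new (s: SourceRow) => BaseScraper> = {
  "kultunaut": KultuNautScraper,
  "dpl-cms": DplCmsScraper,
};

function buildScrapers(rows: SourceRow[]): BaseScraper[] {
  return rows.map((row) => new families[row.family](row));
}

const scrapers = buildScrapers([
  { id: 1, family: "kultunaut", baseUrl: "https://kultunaut.example/" },
  { id: 2, family: "dpl-cms", baseUrl: "https://library.example", overrides: { listPath: "/en/whats-on" } },
]);
console.log(scrapers.map((s) => s.listUrl()));
```

Adding a third library site on the same CMS then means inserting another row, with an override only where that site deviates from the family default.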
The travel time integration (Valhalla routing) was a late addition that changed how the newsletter feels. Events aren't just listed — they include "15 min by bike from Vejle centrum," which turns an event listing into something actionable for someone deciding whether to go.
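Turning a routing result into that kind of phrase is a small formatting step. The sketch below assumes the route duration arrives in seconds (as far as I know, Valhalla reports `trip.summary.time` in seconds) and that the three modes map to Valhalla's `auto`, `bicycle`, and `pedestrian` costing models. The function name, the rounding rule, and the wording are illustrative assumptions.

```typescript
type Mode = "car" | "bike" | "walk";

// Assumed mapping from the newsletter's modes to Valhalla costing models.
const COSTING: Record<Mode, string> = {
  car: "auto",
  bike: "bicycle",
  walk: "pedestrian",
};

// Format a route duration (in seconds) as a short, human-readable label.
function travelLabel(seconds: number, mode: Mode, origin: string = "Vejle centrum"): string {
  // Round to whole minutes, but never show "0 min".
  const minutes = Math.max(1, Math.round(seconds / 60));
  const by = mode === "walk" ? "on foot" : `by ${mode}`;
  return `${minutes} min ${by} from ${origin}`;
}

console.log(COSTING.bike);              // "bicycle"
console.log(travelLabel(900, "bike"));  // "15 min by bike from Vejle centrum"
```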
The unresolved tension is sustainability: the platform serves a real community need, but 8 custom Playwright scrapers break whenever a source site redesigns, and there's only one maintainer.
Weaknesses & Open Questions
- Scraper brittleness — Custom Playwright scrapers break on site redesigns. Maintenance scales superlinearly with source count, and the maintainer pool is one person.
- Scraper etiquette incomplete — The scrapers rate-limit their outbound requests, but full User-Agent identification and robots.txt compliance are still on the backlog.
- Public endpoint hardening — No rate limiting or abuse protection on public-facing endpoints.
- PII audit gap — Stripping is conservative but hasn't been tested against real-world false negatives (e.g., names embedded in event descriptions).
- Newsletter coupling — Segment names are hardcoded to Listmonk list names. Renaming a list silently breaks delivery.
- PostgreSQL connection pool — Scraper and web workloads may compete for database connections under load; separate pool limits are not yet configured.
- Is this sustainable solo? — The community value is real, but the ongoing scraper maintenance may need either more contributors or a shift to source-provided APIs/feeds.
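The newsletter coupling issue above could be made to fail loudly rather than silently with a startup check along these lines. This is a sketch, not existing code: the segment keys, list names, and function name are invented, and it assumes the current list names have already been fetched from Listmonk by some other means.

```typescript
// Hypothetical mapping from internal segment keys to Listmonk list names.
const SEGMENTS: Record<string, string> = {
  weekly: "GoVejle Weekly",
  families: "GoVejle Families",
};

// Return the segment keys whose configured list no longer exists.
function missingSegments(
  segments: Record<string, string>,
  existingListNames: string[]
): string[] {
  return Object.keys(segments).filter(
    (key) => existingListNames.indexOf(segments[key]) === -1
  );
}

// Example: someone renamed the families list in the Listmonk UI.
const listsNow = ["GoVejle Weekly", "Family Events"];
const broken = missingSegments(SEGMENTS, listsNow);
if (broken.length > 0) {
  console.error(`Newsletter segments without a matching list: ${broken.join(", ")}`);
}
```

Running a check like this before each send would turn a renamed list into a visible error instead of a newsletter that quietly reaches nobody. Keying segments on list IDs instead of names would remove the coupling altogether.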
Ecosystem Role
GoVejle routes all async work through Roughneck's plugin system (translation, enrichment, newsletter, scheduling) and reports health to CNC for monitoring. The Mac Mini infrastructure is shared with Etyde's AI worker.