
Shotput
Tools
What worked
Three of five phases are complete, in ~33 minutes total (2.9 min/plan average). Claude Code built the MCP server around 6 coarse-grained, intent-based tools rather than thin Playwright wrappers, which reduces MCP round-trips in agent workflows. Lazy browser initialization (Chromium is not launched until the first screenshot) cut startup from 1-3s to ~100ms. DOM inspection returns a curated summary under 5KB instead of 100KB+ of raw HTML, so Claude can reason about selectors without exhausting its context window. The natural-language element targeting flow (describe what to capture → inspect → get selector → capture) works well because the inspector was designed for agent consumption from the start.
What broke
SSRF hardening is incomplete and is the single most important gap to close before Shotput is wired to any untrusted input path; URL allow/deny-list handling is on the hardening backlog. There is no dark/light mode emulation yet (deferred to Phase 4), no device presets, and no batch capture: each URL currently needs its own call. Process-lifecycle tracking around browser.close() is fragile. Test coverage is incomplete; QUAL-01 is deferred to Phase 5.
Roles
I set the MCP-first design: this is a tool built *for* agents, not a general Playwright wrapper, and the tool shape reflects that (coarse-grained intents, curated DOM summaries, graceful wait degradation). The fresh-BrowserContext-per-capture decision was also mine, made to prevent state leakage between captures. Claude Code wrote every line of Playwright and MCP SDK code. This is a particularly satisfying vibe because Shotput is the tool I use to capture screenshots for every other vibe in this portfolio, so I have been dogfooding it from the start.
Shotput (MCP Screenshot Capture Tool)
Overview
Shotput is a headless browser screenshot capture tool built as an MCP (Model Context Protocol) server that integrates with Claude Code. It captures publication-ready screenshots (full-page or element-specific) programmatically and entirely locally, with zero external service dependencies.
Target users: Claude Code users who need automated screenshot capture for documentation, developers building docs, and content creators capturing UI screenshots.
Key Features
- Full-page and viewport screenshots in PNG/JPEG with quality control
- Element-targeted screenshots via CSS selectors with configurable padding
- Natural language element targeting — describe what to capture; Claude identifies the CSS selector
- Page preparation — inject CSS/JavaScript, hide elements before capture
- Authentication — manual login via visible browser OR programmatic cookie/token injection
- Device emulation — custom viewport dimensions, scale factors (1x-3x retina)
- Lazy content triggering — auto-scroll to load lazy-loaded images
- Flexible wait strategies — networkidle, domcontentloaded, load, or custom delay
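The "configurable padding" feature for element-targeted screenshots can be illustrated with a small pure function. This is a minimal sketch, not Shotput's actual code: `paddedClip` and its parameters are hypothetical names, and it assumes the element box and the resulting clip are plain `{ x, y, width, height }` rectangles in CSS pixels, the shape Playwright uses for `locator.boundingBox()` and the `clip` option of `page.screenshot()`.

```typescript
// Rectangle in CSS pixels (same shape as Playwright's bounding boxes and clip option).
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical helper: grow an element's box by `padding` on every side,
// then clamp the result to the page so the clip never extends outside it.
function paddedClip(
  box: Rect,
  padding: number,
  page: { width: number; height: number },
): Rect {
  const x = Math.max(0, box.x - padding);
  const y = Math.max(0, box.y - padding);
  const right = Math.min(page.width, box.x + box.width + padding);
  const bottom = Math.min(page.height, box.y + box.height + padding);
  return { x, y, width: Math.max(0, right - x), height: Math.max(0, bottom - y) };
}

// An element near the left edge: the left padding is clamped at x = 0.
const clip = paddedClip(
  { x: 10, y: 100, width: 200, height: 50 },
  16,
  { width: 1280, height: 2000 },
);
// clip is { x: 0, y: 84, width: 226, height: 82 }
```

Clamping matters because a clip rectangle that extends past the page would either fail or capture empty space, depending on how the capture call treats out-of-bounds regions.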
Architecture
Tech Stack
| Layer | Technology |
|---|---|
| Browser Automation | Playwright 1.58.2 |
| MCP Server | @modelcontextprotocol/sdk 1.27.1 (stdio transport) |
| Language | TypeScript 5.x |
| Validation | Zod 3.25.0 |
| Runtime | Node.js 22 LTS |
| Build | tsup 8.x |
| Testing | Vitest + Playwright Test |
Structure
src/
  index.ts     # Entry point, creates MCP server
  server.ts    # MCP tool registration (6 tools)
  browser.ts   # Browser manager (singleton, lazy initialization)
  capture.ts   # Screenshot capture pipeline
  inspect.ts   # DOM inspection + accessibility tree extraction
  auth.ts      # Session manager for authenticated captures
  output.ts    # File naming and output path resolution
  scroll.ts    # Auto-scroll for lazy content
  types.ts     # Shared TypeScript interfaces
Key Design Decisions
- 6 coarse-grained intent-based tools (not thin API wrappers) — reduces MCP round-trips
- Lazy browser initialization — Chromium not launched until first screenshot (~100ms startup vs 1-3s)
- DOM summary not raw HTML — Curated data (<5KB vs 100KB+) for Claude to reason about selectors
- Fresh BrowserContext per capture — No state leakage between captures
- Graceful wait degradation — Try networkidle first, fall back to domcontentloaded + delay
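The graceful wait degradation decision above (try networkidle first, fall back to domcontentloaded plus a delay) could look roughly like the sketch below. It is an illustration under assumptions, not the code in capture.ts: `navigateWithFallback`, `NavigablePage`, `WaitResult`, and the default timeouts are hypothetical. The interface is a structural slice of a page object that Playwright's `Page` is expected to satisfy through its `goto` and `waitForTimeout` methods.

```typescript
type WaitUntil = "load" | "domcontentloaded" | "networkidle";

// Minimal slice of a page object; a Playwright Page is expected to fit this shape.
interface NavigablePage {
  goto(url: string, options: { waitUntil: WaitUntil; timeout: number }): Promise<unknown>;
  waitForTimeout(ms: number): Promise<void>;
}

interface WaitResult {
  strategy: WaitUntil;
  degraded: boolean; // true when the fallback path was taken
}

// Hypothetical helper: prefer networkidle, but never let a busy page block the capture.
async function navigateWithFallback(
  page: NavigablePage,
  url: string,
  idleTimeoutMs = 10_000, // assumed default, not taken from Shotput
  settleMs = 1_000, // assumed fixed delay after the fallback navigation
): Promise<WaitResult> {
  try {
    await page.goto(url, { waitUntil: "networkidle", timeout: idleTimeoutMs });
    return { strategy: "networkidle", degraded: false };
  } catch {
    // Assumption: any failure is treated as "the network never went idle".
    // A genuinely unreachable URL fails again here and the error propagates.
    await page.goto(url, { waitUntil: "domcontentloaded", timeout: idleTimeoutMs });
    await page.waitForTimeout(settleMs);
    return { strategy: "domcontentloaded", degraded: true };
  }
}
```

Returning the strategy that was actually used lets the caller report a degraded capture to the agent instead of failing silently or crashing on a timeout.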
MCP Tools
- shotput_capture: Full-page, viewport, or element screenshot
- shotput_inspect: DOM summary + accessibility tree for selector identification
- shotput_set_cookies: Programmatic cookie injection
- shotput_clear_sessions: Clear all stored sessions
- shotput_login: Interactive login via visible browser
- shotput_list_sessions: List stored sessions
Development History
3 of 5 phases complete (executed in ~33 minutes):
| Phase | Status | Focus |
|---|---|---|
| 1 | Complete | Core capture engine (browser manager, pipeline, output) |
| 2 | Complete | Element targeting (CSS selectors, padding, DOM inspection) |
| 3 | Complete | Authentication (session manager, cookie injection, interactive login) |
| 4 | Pending | Skill layer + display polish (dark/light mode, device presets, batch) |
| 5 | Pending | Cross-client compatibility + quality (opencode, tests, docs) |
Average velocity: 2.9 min/plan (7 plans in 33 minutes total).
Strengths
- Context isolation — Fresh BrowserContext per capture; no state leakage
- Graceful degradation — Timeout doesn't crash, missing elements don't hang
- Explicit lifecycle management — Signal handlers + browser cleanup
- DOM inspection design — Aria snapshot + curated summary vs raw HTML
- Session security — No credential logging, fresh contexts, periodic StorageState capture
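The context isolation and lifecycle points above, combined with lazy initialization, can be sketched as a small manager class. This is an assumption-laden illustration rather than the contents of browser.ts: `BrowserManager`, `withFreshContext`, and `shutdown` are hypothetical names, and the launcher is injected so the sketch does not depend on Playwright directly (in practice it might be something like `() => chromium.launch()`).

```typescript
// Structural slices of the browser objects involved; Playwright's Browser and
// BrowserContext are expected to fit these shapes.
interface ContextLike {
  close(): Promise<void>;
}
interface BrowserLike<C extends ContextLike> {
  newContext(): Promise<C>;
  close(): Promise<void>;
}

class BrowserManager<C extends ContextLike> {
  private readonly launch: () => Promise<BrowserLike<C>>;
  private pending: Promise<BrowserLike<C>> | null = null;

  constructor(launch: () => Promise<BrowserLike<C>>) {
    this.launch = launch; // nothing is launched yet
  }

  // Lazy singleton: the first caller triggers the launch, later and
  // concurrent callers share the same promise.
  private getBrowser(): Promise<BrowserLike<C>> {
    let pending = this.pending;
    if (pending === null) {
      pending = this.launch();
      this.pending = pending;
      const started = pending;
      // Forget a failed launch so the next capture can retry.
      started.catch(() => {
        if (this.pending === started) this.pending = null;
      });
    }
    return pending;
  }

  // Every capture gets its own context, closed even if the capture throws.
  async withFreshContext<T>(fn: (context: C) => Promise<T>): Promise<T> {
    const browser = await this.getBrowser();
    const context = await browser.newContext();
    try {
      return await fn(context);
    } finally {
      await context.close();
    }
  }

  // Safe to call from signal handlers: a no-op if the browser never launched.
  async shutdown(): Promise<void> {
    const pending = this.pending;
    this.pending = null;
    if (pending === null) return;
    const browser = await pending.catch(() => null);
    if (browser !== null) await browser.close();
  }
}
```

Storing the launch promise rather than the browser itself is what keeps two simultaneous first captures from starting two Chromium processes. A SIGINT or SIGTERM handler would call `shutdown()` before the process exits.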
Weaknesses & Risks
- SSRF hardening incomplete — URL allow/deny-list handling is on the hardening backlog and must be closed before Shotput is wired to any untrusted input path
- No dark/light mode emulation — Deferred to Phase 4
- No device presets — Users must manually set viewport/scale
- No batch capture — One call per URL currently
- Process lifecycle around browser.close() is fragile — Cleanup edge cases need tightening
- Test coverage incomplete — QUAL-01 (full test suite) deferred to Phase 5
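As a starting point for the SSRF item above, a URL pre-check might look like the following. This is a sketch of one possible policy, not Shotput's planned implementation: `checkCaptureUrl`, `isPrivateIPv4`, and the specific rules (http/https only, no localhost, no private or link-local IPv4 literals, no IPv6 literals, optional hostname allow-list) are assumptions. A string-level check like this does not cover hostnames that resolve to internal addresses or redirects to them, so it would be only one layer of the hardening.

```typescript
type UrlCheck = { ok: true; url: URL } | { ok: false; reason: string };

// Loopback, RFC 1918 private ranges, link-local (including cloud metadata
// addresses), and 0.0.0.0/8. Not an exhaustive list of special-purpose ranges.
function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Hypothetical guard to run before any navigation.
function checkCaptureUrl(raw: string, allowedHosts?: ReadonlySet<string>): UrlCheck {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return { ok: false, reason: "not a valid URL" };
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return { ok: false, reason: "only http and https are allowed" };
  }
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) {
    return { ok: false, reason: "localhost is not allowed" };
  }
  // URL.hostname keeps the brackets on IPv6 literals; reject them all (conservative).
  if (host.startsWith("[")) {
    return { ok: false, reason: "IPv6 literals are not allowed" };
  }
  if (isPrivateIPv4(host)) {
    return { ok: false, reason: "private or loopback address" };
  }
  if (allowedHosts !== undefined && !allowedHosts.has(host)) {
    return { ok: false, reason: "host is not on the allow-list" };
  }
  return { ok: true, url };
}
```

Parsing with `URL` first means alternative IPv4 spellings in http(s) URLs are normalized to dotted-decimal form before the range check runs.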
Connection to Other Projects
- PM Toolkit — Could use Shotput for automated export/screenshot generation
- 2024.garden — Screenshot documentation of the digital garden