andrewlb notes

Shotput

Tools

Claude Code, TypeScript, Playwright, MCP SDK, Zod, tsup, Vitest, Node.js

What worked

The MCP-first design shaped everything: 6 coarse-grained intent-based tools (not thin Playwright wrappers) reduced round-trips for agent workflows. DOM inspection returns a curated <5KB summary instead of 100KB+ raw HTML, so Claude can reason about selectors without blowing context. Lazy browser initialization cut startup from 1-3s to ~100ms. The natural-language element targeting flow (describe what to capture, inspect, get selector, capture) works well because the inspector was designed for agent consumption from the start. Most satisfying: Shotput captures the screenshots for every other project writeup in this portfolio — self-dogfooding from day one.

What broke

SSRF hardening is incomplete and is the single most important thing to close before wiring Shotput to any untrusted input — URL allow/deny-list handling is on the hardening backlog. Process-lifecycle tracking around browser.close() is fragile. Test coverage is incomplete. No dark/light mode emulation, no device presets, no batch capture.

Roles

I set the MCP-first design constraint — this is a tool built for agents, not a general Playwright wrapper, and the tool shape reflects that. Claude Code wrote every line of Playwright and MCP SDK code. The fresh-BrowserContext-per-capture decision was mine for state-leakage prevention.

Shotput (MCP Screenshot Capture Tool)

Overview

Shotput is a headless browser screenshot capture tool built as an MCP (Model Context Protocol) server for Claude Code. It enables programmatic capture of full-page or element-specific screenshots entirely locally with zero external service dependencies.

Target users: Claude Code users needing automated screenshot capture for documentation, UI testing, or content creation.

What It Does

  • Full-page and element-targeted screenshots in PNG/JPEG with quality control and configurable padding
  • Natural language element targeting — describe what to capture; Claude identifies the CSS selector via curated DOM inspection
  • Page preparation — inject CSS/JavaScript, hide elements before capture
  • Authentication — manual login via visible browser or programmatic cookie/token injection
  • Device emulation — custom viewport dimensions, scale factors (1x-3x retina)
  • Lazy content handling — auto-scroll to trigger lazy-loaded images, flexible wait strategies (networkidle, domcontentloaded, custom delay)
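
The wait strategies in the last item can be layered so that a page which never goes network-idle still gets captured. The sketch below is one way to do that, not Shotput's actual code: it navigates until domcontentloaded, gives networkidle a bounded window, and settles with a fixed delay if that window expires. `navigateAndSettle`, `WaitablePage`, and the default timeouts are hypothetical names and values; the interface only mirrors the Playwright `Page` methods it calls, so a real `Page` should fit structurally.

```typescript
// Minimal structural view of the Playwright Page methods used here, so the
// sketch compiles and can be exercised without launching a browser.
interface WaitablePage {
  goto(url: string, options?: { waitUntil?: "domcontentloaded"; timeout?: number }): Promise<unknown>;
  waitForLoadState(state?: "networkidle", options?: { timeout?: number }): Promise<void>;
  waitForTimeout(ms: number): Promise<void>;
}

type SettledBy = "networkidle" | "fallback-delay";

// Hypothetical helper: the name and default timeouts are illustrative only.
async function navigateAndSettle(
  page: WaitablePage,
  url: string,
  idleTimeoutMs = 10000,
  fallbackDelayMs = 1500,
): Promise<SettledBy> {
  // Always wait for the DOM to be parsed; this is the floor for any capture.
  await page.goto(url, { waitUntil: "domcontentloaded" });
  try {
    // Preferred path: the network goes quiet within the allowed window.
    await page.waitForLoadState("networkidle", { timeout: idleTimeoutMs });
    return "networkidle";
  } catch (err) {
    // Assumption: timeouts surface as an Error whose name is "TimeoutError".
    // Anything else (navigation failure, closed page) is rethrown.
    if (!(err instanceof Error) || err.name !== "TimeoutError") throw err;
    // Degraded path: give late content a fixed grace period, then proceed.
    await page.waitForTimeout(fallbackDelayMs);
    return "fallback-delay";
  }
}
```

Returning which path was taken lets the caller report a degraded wait to the agent instead of hiding it.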

How It Evolved

The core insight was designing for agents, not humans. A general Playwright wrapper would expose dozens of low-level methods; Shotput exposes 6 coarse-grained intent-based tools that match how an AI agent thinks about screenshots: capture a page, inspect its DOM, manage authentication sessions.

The DOM inspection tool was the most consequential design decision. Instead of returning raw HTML (100KB+, context-destroying), it returns a curated <5KB summary with an accessibility tree. This means Claude can reason about what to capture and pick selectors without a separate browsing step.
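
To make the size budget concrete, here is a minimal sketch of a bounded DOM summary. It assumes the page has already been reduced to a simple node tree (for example inside the browser); `NodeInfo`, `summarizeDom`, and the 5000-character default are hypothetical, and the budget is counted in characters rather than bytes. Shotput's real inspector is not shown in this writeup and may work differently.

```typescript
// Hypothetical, simplified view of a DOM node after extraction.
interface NodeInfo {
  tag: string;
  id?: string;
  classes?: string[];
  role?: string;
  text?: string;
  children?: NodeInfo[];
}

const TRUNCATION_MARKER = "... (truncated)";

// One compact, selector-friendly line per node, e.g. button.cta [role=button] "Start trial"
function describeNode(node: NodeInfo): string {
  const id = node.id ? `#${node.id}` : "";
  const cls = (node.classes ?? []).map((c) => `.${c}`).join("");
  const role = node.role ? ` [role=${node.role}]` : "";
  const text = node.text ? ` "${node.text.trim().slice(0, 40)}"` : "";
  return `${node.tag}${id}${cls}${role}${text}`;
}

// Depth-first walk that stops emitting lines once the character budget is spent.
function summarizeDom(root: NodeInfo, maxChars = 5000): string {
  // Reserve room for the marker so the result never exceeds maxChars.
  const budget = maxChars - TRUNCATION_MARKER.length - 1;
  const lines: string[] = [];
  let used = 0;
  let truncated = false;

  const visit = (node: NodeInfo, depth: number): void => {
    if (truncated) return;
    const line = "  ".repeat(depth) + describeNode(node);
    if (used + line.length + 1 > budget) {
      truncated = true;
      return;
    }
    lines.push(line);
    used += line.length + 1; // +1 for the newline that joins lines
    for (const child of node.children ?? []) visit(child, depth + 1);
  };

  visit(root, 0);
  if (truncated) lines.push(TRUNCATION_MARKER);
  return lines.join("\n");
}
```

Indentation carries the nesting, and each line already looks like a CSS selector fragment, which is the property that lets an agent go from description to selector without reading raw HTML.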

Lazy browser initialization was an iteration driven by usage patterns: most Claude Code sessions reference Shotput but only occasionally capture. Launching Chromium eagerly on every session start was wasteful, so the browser now launches on first use and startup takes ~100ms instead of 1-3s.
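
The launch-on-first-use pattern can be sketched as a small lazy holder. This is an illustration under assumptions, not the project's code: `createLazy` and `Closable` are hypothetical names, and the launcher is injected so the example runs without Playwright. In a real server the launcher would presumably be something like `() => chromium.launch({ headless: true })`.

```typescript
// Anything with an async close(), such as a Playwright Browser.
type Closable = { close(): Promise<void> };

// Hypothetical helper: defers `launch` until the first get(), shares one
// in-flight launch between concurrent callers, and allows a clean relaunch.
function createLazy<T extends Closable>(launch: () => Promise<T>) {
  let pending: Promise<T> | null = null;

  return {
    // Returns the shared instance, launching it only on the first call.
    get(): Promise<T> {
      if (!pending) {
        pending = launch().catch((err: unknown) => {
          pending = null; // a failed launch should not poison later attempts
          throw err;
        });
      }
      return pending;
    },

    // Closes the instance if one was ever started; safe to call repeatedly.
    async dispose(): Promise<void> {
      if (!pending) return;
      const current = pending;
      pending = null;
      const instance = await current.catch(() => null);
      if (instance) await instance.close();
    },

    isStarted(): boolean {
      return pending !== null;
    },
  };
}
```

Caching the promise rather than the resolved instance is the design choice that matters: two tool calls arriving at once share a single launch instead of racing to start two browsers.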

The tool became self-referential almost immediately: Shotput captures the screenshots for every other project writeup in this portfolio.

Architecture Decisions

  • 6 coarse-grained MCP tools — Intent-based (capture, inspect, set cookies, clear sessions, login, list sessions) rather than thin API wrappers. Fewer round-trips for agent workflows.
  • DOM summary over raw HTML — Curated <5KB accessibility tree + structure summary. Agents can reason about selectors without blowing context windows.
  • Lazy browser initialization — Chromium not launched until first capture. Avoids startup cost in sessions that never screenshot.
  • Fresh BrowserContext per capture — No state leakage between captures. Slightly slower, much safer.
  • Graceful wait degradation — Try networkidle first, fall back to domcontentloaded + delay. Pages that never reach networkidle don't hang.
  • Zod validation at the boundary — All MCP tool inputs validated before reaching Playwright.
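
The fresh-context decision above can be expressed as a wrapper that owns the context's lifetime. The sketch below assumes only the Playwright calls `browser.newContext()` and `context.close()`; `withFreshContext`, `BrowserLike`, and `ContextLike` are hypothetical names, declared structurally so the example runs without a browser installed.

```typescript
// Structural stand-ins for the Playwright Browser / BrowserContext methods used.
interface ContextLike {
  close(): Promise<void>;
}

interface ContextOptions {
  viewport?: { width: number; height: number };
  deviceScaleFactor?: number;
}

interface BrowserLike<C extends ContextLike> {
  newContext(options?: ContextOptions): Promise<C>;
}

// Hypothetical helper: every capture gets its own context (cookies, storage,
// cache) and that context is closed even when the capture throws.
async function withFreshContext<C extends ContextLike, T>(
  browser: BrowserLike<C>,
  options: ContextOptions,
  run: (context: C) => Promise<T>,
): Promise<T> {
  const context = await browser.newContext(options);
  try {
    return await run(context);
  } finally {
    await context.close();
  }
}
```

The `finally` block is where the safety comes from: no code path leaves a context open, so one capture's session state cannot reach the next.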

Weaknesses and Open Questions

  • SSRF hardening incomplete — URL allow/deny-list handling must be closed before any untrusted input path. This is the most important security gap.
  • Process lifecycle fragility — browser.close() cleanup has edge cases that need tightening.
  • Test coverage incomplete — Deferred; the tool works but isn't rigorously verified.
  • No dark/light mode emulation — Can't capture both variants of a page.
  • No batch capture — One call per URL currently; capturing many pages is slow.
  • No device presets — Users must manually set viewport and scale.
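
For the SSRF item at the top of this list, a first layer of URL allow/deny handling might look like the sketch below. It is a starting point under stated assumptions, not a complete defense and not Shotput's implementation: `checkCaptureUrl` and its rules are hypothetical, it inspects only the URL text, and it does nothing about hostnames that resolve to private addresses or about redirects followed after navigation.

```typescript
type UrlCheck = { ok: true; url: URL } | { ok: false; reason: string };

// True for loopback, RFC 1918 private, link-local, and "this network" IPv4 literals.
function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.\d{1,3}\.\d{1,3}$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) || // includes cloud metadata endpoints
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Hypothetical gate run before any navigation. An empty allow-list means
// "any public-looking host"; a non-empty one restricts to exact host matches.
function checkCaptureUrl(raw: string, allowedHosts: readonly string[] = []): UrlCheck {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return { ok: false, reason: "not a valid absolute URL" };
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return { ok: false, reason: `scheme ${url.protocol} is not allowed` };
  }
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) {
    return { ok: false, reason: "localhost is not allowed" };
  }
  if (host.startsWith("[")) {
    // Conservative choice for a sketch: refuse every IPv6 literal.
    return { ok: false, reason: "IPv6 literals are not allowed" };
  }
  if (isPrivateIPv4(host)) {
    return { ok: false, reason: "private or loopback address" };
  }
  if (allowedHosts.length > 0 && !allowedHosts.includes(host)) {
    return { ok: false, reason: `host ${host} is not on the allow-list` };
  }
  return { ok: true, url };
}
```

Returning a reason string rather than a bare boolean gives the agent something actionable when a capture is refused.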

Ecosystem Role

Shotput is the portfolio's documentation tool — it captures screenshots for every other project writeup. It also demonstrates a design pattern (MCP tools shaped for agent consumption, not human usage) that could apply to other developer tools.