andrewlb notes

Shotput

Tools

Claude Code, TypeScript, Playwright, MCP SDK, Zod, tsup, Vitest, Node.js

What worked

3 of 5 phases complete in ~33 minutes total (2.9 min/plan average). Claude Code built the MCP server with 6 coarse-grained intent-based tools (not thin Playwright wrappers) — reducing MCP round-trips for agent workflows. Lazy browser initialization (Chromium not launched until first screenshot) cut startup from 1-3s to ~100ms. DOM inspection returns a <5KB curated summary instead of 100KB+ raw HTML, which means Claude can reason about selectors without blowing context. The natural-language element targeting flow (describe what to capture → inspect → get selector → capture) works well because the inspector was designed for agent consumption from the start.

What broke

SSRF hardening is incomplete and is the single most important thing to close before Shotput is wired to any untrusted input path — URL allow/deny-list handling is on the hardening backlog. No dark/light mode emulation yet (deferred to Phase 4). No device presets. No batch capture — one call per URL currently. Process-lifecycle tracking around browser.close() is fragile. Test coverage is incomplete — QUAL-01 deferred to Phase 5.

Roles

I set the MCP-first design — this is a tool built *for* agents, not a general Playwright wrapper, and the tool shape reflects that (coarse-grained intents, curated DOM summaries, graceful wait degradation). Claude Code wrote every line of Playwright and MCP SDK code. This is a particularly satisfying vibe because Shotput is the tool I use to capture screenshots for every other vibe in this portfolio — self-dogfooding from the start. The fresh-BrowserContext-per-capture decision was mine for state-leakage reasons.

Shotput (MCP Screenshot Capture Tool)

Overview

Shotput is a headless browser screenshot capture tool built as an MCP (Model Context Protocol) server that integrates with Claude Code. It enables programmatic capture of publication-ready screenshots — full-page or element-specific — entirely locally with zero external service dependencies.

Target users: Claude Code users needing automated screenshot capture for documentation, developers building docs, content creators capturing UI screenshots.

Key Features

  • Full-page and viewport screenshots in PNG/JPEG with quality control
  • Element-targeted screenshots via CSS selectors with configurable padding
  • Natural language element targeting — describe what to capture; Claude identifies the CSS selector
  • Page preparation — inject CSS/JavaScript, hide elements before capture
  • Authentication — manual login via visible browser OR programmatic cookie/token injection
  • Device emulation — custom viewport dimensions, scale factors (1x-3x retina)
  • Lazy content triggering — auto-scroll to load lazy-loaded images
  • Flexible wait strategies — networkidle, domcontentloaded, load, or custom delay
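
To make the "configurable padding" feature concrete, here is a minimal sketch of how a padded capture region could be computed from an element's bounding box. The names `paddedClip`, `Box`, and `Bounds` are hypothetical and not taken from Shotput's source; the sketch assumes the box and the bounds share one coordinate space.

```typescript
// Hypothetical helper, not Shotput's actual implementation.
interface Box { x: number; y: number; width: number; height: number }
interface Bounds { width: number; height: number }

/**
 * Grow `box` by `padding` pixels on every side, then clamp the result so it
 * never extends outside `bounds` (for example, the viewport).
 */
function paddedClip(box: Box, padding: number, bounds: Bounds): Box {
  const pad = Math.max(0, padding); // treat negative padding as zero
  const left = Math.max(0, box.x - pad);
  const top = Math.max(0, box.y - pad);
  const right = Math.min(bounds.width, box.x + box.width + pad);
  const bottom = Math.min(bounds.height, box.y + box.height + pad);
  return {
    x: left,
    y: top,
    width: Math.max(0, right - left),
    height: Math.max(0, bottom - top),
  };
}
```

With Playwright, the input box could plausibly come from `locator.boundingBox()` and the result could be passed as the `clip` option of `page.screenshot()`. Whether Shotput does exactly this is an assumption.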

Architecture

Tech Stack

Layer               Technology
Browser Automation  Playwright 1.58.2
MCP Server          @modelcontextprotocol/sdk 1.27.1 (stdio transport)
Language            TypeScript 5.x
Validation          Zod 3.25.0
Runtime             Node.js 22 LTS
Build               tsup 8.x
Testing             Vitest + Playwright Test

Structure

src/
  index.ts    # Entry point, creates MCP server
  server.ts   # MCP tool registration (6 tools)
  browser.ts  # Browser manager (singleton, lazy initialization)
  capture.ts  # Screenshot capture pipeline
  inspect.ts  # DOM inspection + accessibility tree extraction
  auth.ts     # Session manager for authenticated captures
  output.ts   # File naming and output path resolution
  scroll.ts   # Auto-scroll for lazy content
  types.ts    # Shared TypeScript interfaces
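
The "singleton, lazy initialization" note on browser.ts can be illustrated with a small generic wrapper. This is a sketch under assumptions: `LazyResource` is a hypothetical name, and the real browser manager may be structured differently. The idea is that nothing is launched until the first `get()` call, and concurrent callers share one in-flight launch.

```typescript
// Hypothetical sketch of a lazily created, shared resource.
class LazyResource<T> {
  private readonly create: () => Promise<T>;
  private pending: Promise<T> | null = null;

  constructor(create: () => Promise<T>) {
    this.create = create;
  }

  /** True once creation has been started and not yet reset. */
  get started(): boolean {
    return this.pending !== null;
  }

  /** Starts creation on the first call; later calls reuse the same promise. */
  get(): Promise<T> {
    if (this.pending === null) {
      const attempt: Promise<T> = this.create().catch((err: unknown) => {
        // Forget a failed launch so that the next call can retry.
        if (this.pending === attempt) this.pending = null;
        throw err;
      });
      this.pending = attempt;
    }
    return this.pending;
  }

  /** Closes the resource if it was ever created; safe to call repeatedly. */
  async dispose(close: (resource: T) => Promise<void>): Promise<void> {
    const attempt = this.pending;
    this.pending = null;
    if (attempt === null) return;
    await close(await attempt);
  }
}
```

In a Playwright-based server the factory would presumably be something like `() => chromium.launch()` and the `dispose` callback `(browser) => browser.close()`, which keeps server startup cheap because Chromium only starts when a capture is requested.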

Key Design Decisions

  • 6 coarse-grained intent-based tools (not thin API wrappers) — reduces MCP round-trips
  • Lazy browser initialization — Chromium not launched until first screenshot (~100ms startup vs 1-3s)
  • DOM summary not raw HTML — Curated data (<5KB vs 100KB+) for Claude to reason about selectors
  • Fresh BrowserContext per capture — No state leakage between captures
  • Graceful wait degradation — Try networkidle first, fall back to domcontentloaded + delay

MCP Tools

  1. shotput_capture — Full-page/viewport/element screenshot
  2. shotput_inspect — DOM summary + accessibility tree for selector identification
  3. shotput_set_cookies — Programmatic cookie injection
  4. shotput_clear_sessions — Clear all stored sessions
  5. shotput_login — Interactive login via visible browser
  6. shotput_list_sessions — List stored sessions
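
The three session-related tools (set cookies, list, clear) suggest a small session registry behind them. The sketch below is an illustrative guess at that shape, not Shotput's auth.ts: `SessionStore`, `Cookie`, and `SessionSummary` are hypothetical names, and the store is in-memory only. Listing deliberately returns names, counts, and domains but never cookie values, in line with the "no credential logging" goal.

```typescript
// Hypothetical in-memory session registry; not Shotput's actual code.
interface Cookie { name: string; value: string; domain: string; path: string }
interface SessionSummary { name: string; cookieCount: number; domains: string[] }

class SessionStore {
  private readonly sessions = new Map<string, Cookie[]>();

  /** Add cookies to a named session; a cookie with the same domain, path, and name is replaced. */
  setCookies(session: string, cookies: Cookie[]): void {
    const merged = new Map<string, Cookie>();
    const existing = this.sessions.get(session) ?? [];
    for (const cookie of existing.concat(cookies)) {
      merged.set(`${cookie.domain}|${cookie.path}|${cookie.name}`, cookie);
    }
    this.sessions.set(session, Array.from(merged.values()));
  }

  /** Summaries that are safe to return to an agent: no cookie values included. */
  list(): SessionSummary[] {
    return Array.from(this.sessions.entries()).map(([name, cookies]) => ({
      name,
      cookieCount: cookies.length,
      domains: Array.from(new Set(cookies.map((c) => c.domain))).sort(),
    }));
  }

  /** Remove every stored session and report how many were removed. */
  clear(): number {
    const removed = this.sessions.size;
    this.sessions.clear();
    return removed;
  }
}
```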

Development History

3 of 5 phases complete (executed in ~33 minutes):

Phase  Status    Focus
1      Complete  Core capture engine (browser manager, pipeline, output)
2      Complete  Element targeting (CSS selectors, padding, DOM inspection)
3      Complete  Authentication (session manager, cookie injection, interactive login)
4      Pending   Skill layer + display polish (dark/light mode, device presets, batch)
5      Pending   Cross-client compatibility + quality (opencode, tests, docs)

Average velocity: 2.9 min/plan (7 plans in 33 minutes total).

Strengths

  • Context isolation — Fresh BrowserContext per capture; no state leakage
  • Graceful degradation — Timeout doesn't crash, missing elements don't hang
  • Explicit lifecycle management — Signal handlers + browser cleanup
  • DOM inspection design — Aria snapshot + curated summary vs raw HTML
  • Session security — No credential logging, fresh contexts, periodic StorageState capture

Weaknesses & Risks

  • SSRF hardening incomplete — URL allow/deny-list handling is on the hardening backlog and must be closed before Shotput is wired to any untrusted input path
  • No dark/light mode emulation — Deferred to Phase 4
  • No device presets — Users must manually set viewport/scale
  • No batch capture — One call per URL currently
  • Process lifecycle around browser.close() is fragile — Cleanup edge cases need tightening
  • Test coverage incomplete — QUAL-01 (full test suite) deferred to Phase 5
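
As a starting point for the SSRF item above, a URL pre-check with an allow/deny-list could be sketched as follows. Everything here is an assumption rather than Shotput's planned design: the function name `checkCaptureUrl`, the blocked hostnames, and the rule that an explicit allow-list entry overrides the deny rules (useful for capturing a local dev server). It also covers only the literal URL; DNS resolution to private addresses and redirects would need separate handling.

```typescript
// Hypothetical pre-navigation URL check; a sketch, not a complete SSRF defence.
type Verdict = { allowed: true } | { allowed: false; reason: string };

const BLOCKED_HOSTNAMES = new Set(["localhost", "metadata.google.internal"]);

function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (m === null) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 || a === 10 || a === 127 ||   // "this" network, private, loopback
    (a === 100 && b >= 64 && b <= 127) || // carrier-grade NAT
    (a === 169 && b === 254) ||           // link-local, incl. cloud metadata
    (a === 172 && b >= 16 && b <= 31) ||  // private
    (a === 192 && b === 168)              // private
  );
}

function isBlockedIPv6(host: string): boolean {
  if (!host.startsWith("[")) return false; // URL hostnames wrap IPv6 in brackets
  const addr = host.slice(1, -1).toLowerCase();
  return (
    addr === "::" || addr === "::1" ||   // unspecified, loopback
    addr.startsWith("::ffff:") ||        // IPv4-mapped: blocked conservatively
    /^f[cd][0-9a-f]{2}:/.test(addr) ||   // unique local fc00::/7
    /^fe[89ab][0-9a-f]:/.test(addr)      // link-local fe80::/10
  );
}

function checkCaptureUrl(raw: string, allowHosts: ReadonlySet<string> = new Set()): Verdict {
  let url: URL;
  try {
    url = new URL(raw); // also normalises numeric IPv4 forms such as 2130706433
  } catch {
    return { allowed: false, reason: "not a valid absolute URL" };
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return { allowed: false, reason: `scheme ${url.protocol} is not allowed` };
  }
  const host = url.hostname.toLowerCase().replace(/\.$/, "");
  if (allowHosts.has(host)) return { allowed: true }; // explicit opt-in wins
  if (BLOCKED_HOSTNAMES.has(host) || host.endsWith(".localhost")) {
    return { allowed: false, reason: `host ${host} is on the deny-list` };
  }
  if (isPrivateIPv4(host) || isBlockedIPv6(host)) {
    return { allowed: false, reason: `address ${host} is private or local` };
  }
  return { allowed: true };
}
```

A call such as `checkCaptureUrl("http://localhost:5173/", new Set(["localhost"]))` would pass because of the explicit opt-in, while the same URL without the allow-list entry would be rejected.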

Connection to Other Projects

  • PM Toolkit — Could use Shotput for automated export/screenshot generation
  • 2024.garden — Screenshot documentation of the digital garden