Snapshots
The snapshot command returns a compact accessibility tree with refs for element interaction.
Options
Filter output to reduce size:
agent-browser snapshot # Full accessibility tree
agent-browser snapshot -i # Interactive elements only (recommended)
agent-browser snapshot -i -C # Include cursor-interactive elements
agent-browser snapshot -c # Compact (remove empty elements)
agent-browser snapshot -d 3 # Limit depth to 3 levels
agent-browser snapshot -s "#main" # Scope to CSS selector
agent-browser snapshot -i -c -d 5 # Combine options

| Option | Description |
|---|---|
| -i, --interactive | Only interactive elements (buttons, links, inputs) |
| -C, --cursor | Include cursor-interactive elements (cursor:pointer, onclick, tabindex) |
| -c, --compact | Remove empty structural elements |
| -d, --depth | Limit tree depth |
| -s, --selector | Scope to CSS selector |
Cursor-interactive elements
Many modern web apps use custom clickable elements (divs, spans) instead of standard buttons or links.
The -C flag detects these by looking for:
- cursor: pointer CSS style
- onclick attribute or handler
- tabindex attribute (keyboard focusable)
agent-browser snapshot -i -C
# Output includes:
# @e1 [button] "Submit"
# @e2 [link] "Learn more"
# Cursor-interactive elements:
# @e3 [clickable] "Menu Item" [cursor:pointer, onclick]
# @e4 [clickable] "Card" [cursor:pointer]
Output format
The default text output is compact and AI-friendly:
agent-browser snapshot -i
# Output:
# @e1 [heading] "Example Domain" [level=1]
# @e2 [button] "Submit"
# @e3 [input type="email"] placeholder="Email"
# @e4 [link] "Learn more"
Using refs
Refs from the snapshot map directly to commands:
agent-browser click @e2 # Click the Submit button
agent-browser fill @e3 "a@b.com" # Fill the email input
agent-browser get text @e1 # Get heading text
Ref lifecycle
Refs are invalidated when the page changes. Always re-snapshot after navigation or DOM updates:
agent-browser click @e4 # Navigates to new page
agent-browser snapshot -i # Get fresh refs
agent-browser click @e1 # Use new refs
Annotated screenshots
For visual context alongside text snapshots, use screenshot --annotate to overlay numbered labels on interactive elements. In native mode, annotated screenshots currently work on the CDP-backed browser path (Chromium/Lightpanda). The Safari/WebDriver backend does not yet support --annotate.
Each label [N] maps to ref @eN:
agent-browser screenshot --annotate ./page.png
# -> Screenshot saved to ./page.png
# [1] @e1 button "Submit"
# [2] @e2 link "Home"
# [3] @e3 textbox "Email"
agent-browser click @e2
Annotated screenshots also cache refs, so you can interact with elements immediately. This is useful when the text snapshot is insufficient, for example with unlabeled icons, canvas content, or visual layout verification.
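The label-to-ref mapping can also be used from a script. The sketch below assumes the legend lines look like the example output above (`[2] @e2 link "Home"`); that shape is taken from the example, not from a documented contract, and the function names `label_to_ref` and `legend_ref_by_name` are hypothetical helpers.

```shell
# Sketch: resolve refs from the legend printed by `screenshot --annotate`.
# Assumed legend line shape (from the example output): [2] @e2 link "Home"

# label_to_ref N: print the ref for label [N], using the [N] -> @eN mapping.
label_to_ref() {
  printf '@e%s\n' "$1"
}

# legend_ref_by_name NAME: read legend text on stdin and print the first ref
# whose quoted name equals NAME exactly. Prints nothing when there is no match.
legend_ref_by_name() {
  awk -v want="$1" '
    match($0, /@e[0-9]+/) {
      ref = substr($0, RSTART, RLENGTH)
      q1 = index($0, "\"")              # opening quote of the name
      if (q1 == 0) next
      rest = substr($0, q1 + 1)
      q2 = index(rest, "\"")            # closing quote of the name
      if (q2 == 0) next
      if (substr(rest, 1, q2 - 1) == want) { print ref; exit }
    }'
}
```

A saved legend can then be piped in, for instance `legend_ref_by_name "Home" < legend.txt`, and the printed ref passed to `agent-browser click`. The file name `legend.txt` is only an illustration.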
Iframes
Snapshots automatically detect and inline iframe content. Each Iframe node in the main frame is resolved and its child accessibility tree is included directly beneath it. Refs assigned to elements inside iframes carry frame context, so interactions work without switching frames first.
agent-browser snapshot -i
# @e1 [heading] "Checkout"
# @e2 [Iframe] "payment-frame"
#   @e3 [input] "Card number"
#   @e4 [button] "Pay"
agent-browser fill @e3 "4111111111111111"
agent-browser click @e4
Only one level of iframe nesting is expanded. Cross-origin iframes that block accessibility tree access and empty iframes are silently omitted.
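Because omitted iframes leave no trace in the output, a script that depends on frame content may want to confirm that the frame it expects was actually expanded. The sketch below assumes Iframe nodes print as in the example above (`@e2 [Iframe] "payment-frame"`); `iframe_refs` and `has_iframe` are hypothetical helper names, not part of agent-browser.

```shell
# Sketch: check that an expected iframe appears in a text snapshot.
# Assumed node shape (from the example output): @e2 [Iframe] "payment-frame"

# iframe_refs: read a text snapshot on stdin and print "REF<TAB>NAME"
# for every [Iframe] node found.
iframe_refs() {
  awk '
    /\[Iframe\]/ && match($0, /@e[0-9]+/) {
      ref = substr($0, RSTART, RLENGTH)
      name = ""
      q1 = index($0, "\"")                  # opening quote of the name
      if (q1 > 0) {
        rest = substr($0, q1 + 1)
        q2 = index(rest, "\"")              # closing quote of the name
        if (q2 > 0) name = substr(rest, 1, q2 - 1)
      }
      print ref "\t" name
    }'
}

# has_iframe NAME: exit 0 if the snapshot on stdin contains an [Iframe]
# node named NAME, exit 1 otherwise.
has_iframe() {
  iframe_refs | awk -F '\t' -v want="$1" '
    $2 == want { found = 1 }
    END { exit (found ? 0 : 1) }'
}
```

A script could run `agent-browser snapshot -i | has_iframe "payment-frame"` and stop early with a clear message when the frame is missing, rather than failing later on a ref that was never assigned.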
To scope a snapshot to a single iframe, switch into it first:
agent-browser frame @e2
agent-browser snapshot -i # Only elements inside that iframe
agent-browser frame main # Return to main frame
Best practices
- Use -i to reduce output to actionable elements
- Re-snapshot after page changes to get updated refs
- Scope with -s for specific page sections
- Use -d to limit depth on complex pages
- Use screenshot --annotate when visual context is needed alongside refs
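One way to follow the re-snapshot practice in a script is to resolve the ref from a fresh snapshot immediately before every action, instead of storing refs. The sketch below is a minimal illustration under stated assumptions: it parses text output shaped like the examples in this document (`@e2 [button] "Submit"`), and `find_ref`, `click_fresh`, and the `AGENT_BROWSER` variable are hypothetical names introduced here.

```shell
# Sketch: resolve a ref from a fresh snapshot right before clicking.
# Assumption: AGENT_BROWSER names the CLI to run (defaults to agent-browser).
AGENT_BROWSER="${AGENT_BROWSER:-agent-browser}"

# find_ref ROLE NAME: read a text snapshot on stdin and print the first ref
# whose line has role ROLE and the exact quoted NAME.
find_ref() {
  awk -v role="$1" -v name="$2" '
    match($0, /@e[0-9]+/) {
      ref = substr($0, RSTART, RLENGTH)
      p = index($0, "[" role)
      if (p == 0) next
      # The role must end at "]" or a space, so "in" does not match "input".
      c = substr($0, p + length(role) + 1, 1)
      if (c != "]" && c != " ") next
      if (index($0, "\"" name "\"")) { print ref; exit }
    }'
}

# click_fresh ROLE NAME: take a new interactive snapshot, look up the ref,
# and click it. Returns 1 if no matching element is in the current page.
# Note: "ref" is a plain global variable to stay within POSIX sh.
click_fresh() {
  ref=$("$AGENT_BROWSER" snapshot -i | find_ref "$1" "$2")
  if [ -z "$ref" ]; then
    echo "no [$1] named \"$2\" in current snapshot" >&2
    return 1
  fi
  "$AGENT_BROWSER" click "$ref"
}
```

With these defined, `click_fresh button "Submit"` stands in for a hand-copied `agent-browser click @e2`. It costs one extra snapshot per action, which is the trade-off for never reusing a ref from before a page change.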
JSON output
For programmatic parsing in scripts:
agent-browser snapshot --json
# {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
Note: JSON uses more tokens than text output. The default text format is preferred for AI agents.
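For scripts that do consume the JSON form, the refs object can be filtered by role. The sketch below assumes jq is installed and that the output has the shape shown above, with refs under data.refs keyed by ref id. The helper name `refs_by_role` is hypothetical.

```shell
# Sketch: list refs of a given role from `snapshot --json` output.
# Assumptions: jq is available; refs live under .data.refs as
# {"e1": {"role": "...", "name": "..."}, ...}, as in the example output.

# refs_by_role ROLE: read the JSON on stdin and print one @ref per line
# for every entry whose role equals ROLE.
refs_by_role() {
  jq -r --arg role "$1" '
    .data.refs
    | to_entries[]
    | select(.value.role == $role)
    | "@" + .key
  '
}
```

For example, `agent-browser snapshot --json | refs_by_role button` would print the refs of all buttons in the current page, one per line.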