Snapshots

The snapshot command returns a compact accessibility tree with refs for element interaction.

Options

Filter output to reduce size:

agent-browser snapshot                    # Full accessibility tree
agent-browser snapshot -i                 # Interactive elements only (recommended)
agent-browser snapshot -i -C              # Include cursor-interactive elements
agent-browser snapshot -c                 # Compact (remove empty elements)
agent-browser snapshot -d 3               # Limit depth to 3 levels
agent-browser snapshot -s "#main"         # Scope to CSS selector
agent-browser snapshot -i -c -d 5         # Combine options
OptionDescription
-i, --interactiveOnly interactive elements (buttons, links, inputs)
-C, --cursorInclude cursor-interactive elements (cursor:pointer, onclick, tabindex)
-c, --compactRemove empty structural elements
-d, --depthLimit tree depth
-s, --selectorScope to CSS selector

Cursor-interactive elements

Many modern web apps use custom clickable elements (divs, spans) instead of standard buttons or links. The -C flag detects these by looking for:

  • cursor: pointer CSS style
  • onclick attribute or handler
  • tabindex attribute (keyboard focusable)
agent-browser snapshot -i -C
# Output includes:
# @e1 [button] "Submit"
# @e2 [link] "Learn more"
# Cursor-interactive elements:
# @e3 [clickable] "Menu Item" [cursor:pointer, onclick]
# @e4 [clickable] "Card" [cursor:pointer]

Output format

The default text output is compact and AI-friendly:

agent-browser snapshot -i
# Output:
# @e1 [heading] "Example Domain" [level=1]
# @e2 [button] "Submit"
# @e3 [input type="email"] placeholder="Email"
# @e4 [link] "Learn more"

Using refs

Refs from the snapshot map directly to commands:

agent-browser click @e2      # Click the Submit button
agent-browser fill @e3 "a@b.com"  # Fill the email input
agent-browser get text @e1        # Get heading text

Ref lifecycle

Refs are invalidated when the page changes. Always re-snapshot after navigation or DOM updates:

agent-browser click @e4      # Navigates to new page
agent-browser snapshot -i    # Get fresh refs
agent-browser click @e1      # Use new refs

Annotated screenshots

For visual context alongside text snapshots, use screenshot --annotate to overlay numbered labels on interactive elements. Each label [N] maps to ref @eN:

In native mode, annotated screenshots currently work on the CDP-backed browser path (Chromium/Lightpanda). The Safari/WebDriver backend does not yet support --annotate.

agent-browser screenshot --annotate ./page.png
# -> Screenshot saved to ./page.png
#    [1] @e1 button "Submit"
#    [2] @e2 link "Home"
#    [3] @e3 textbox "Email"
agent-browser click @e2

Annotated screenshots also cache refs, so you can interact with elements immediately. This is useful when the text snapshot is insufficient -- unlabeled icons, canvas content, or visual layout verification.

Iframes

Snapshots automatically detect and inline iframe content. Each Iframe node in the main frame is resolved and its child accessibility tree is included directly beneath it. Refs assigned to elements inside iframes carry frame context, so interactions work without switching frames first.

agent-browser snapshot -i
# @e1 [heading] "Checkout"
# @e2 [Iframe] "payment-frame"
#   @e3 [input] "Card number"
#   @e4 [button] "Pay"

agent-browser fill @e3 "4111111111111111"
agent-browser click @e4

Only one level of iframe nesting is expanded. Cross-origin iframes that block accessibility tree access and empty iframes are silently omitted.

To scope a snapshot to a single iframe, switch into it first:

agent-browser frame @e2
agent-browser snapshot -i       # Only elements inside that iframe
agent-browser frame main        # Return to main frame

Best practices

  1. Use -i to reduce output to actionable elements
  2. Re-snapshot after page changes to get updated refs
  3. Scope with -s for specific page sections
  4. Use -d to limit depth on complex pages
  5. Use screenshot --annotate when visual context is needed alongside refs

JSON output

For programmatic parsing in scripts:

agent-browser snapshot --json
# {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}

Note: JSON uses more tokens than text output. The default text format is preferred for AI agents.