Quick Start
Core workflow
Every browser automation follows this pattern:
# 1. Navigate
agent-browser open example.com
# 2. Snapshot to get element refs
agent-browser snapshot -i
# Output:
# @e1 [heading] "Example Domain"
# @e2 [link] "More information..."
# 3. Interact using refs
agent-browser click @e2
# 4. Re-snapshot after page changes
agent-browser snapshot -i
Common commands
agent-browser open example.com
agent-browser snapshot -i # Get interactive elements with refs
agent-browser click @e2 # Click by ref
agent-browser fill @e3 "test@example.com" # Fill input by ref
agent-browser get text @e1 # Get text content
agent-browser screenshot # Save to temp directory
agent-browser screenshot page.png # Save to specific path
agent-browser close
Traditional selectors
CSS selectors and semantic locators are also supported:
agent-browser click "#submit"
agent-browser fill "#email" "test@example.com"
agent-browser find role button click --name "Submit"
Headed mode
Show the browser window for debugging:
agent-browser open example.com --headed
Wait for content
agent-browser wait @e1 # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/dashboard" # Wait for URL pattern
agent-browser wait 2000 # Wait 2000 milliseconds
JSON output
For programmatic parsing in scripts:
agent-browser snapshot --json
agent-browser get text @e1 --json
Note: The default text output is more compact and preferred for AI agents.
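For shell scripts that work with the default text output, a ref can be looked up by role and name instead of being hard-coded. The following is a minimal sketch, assuming snapshot lines follow the `@e2 [link] "More information..."` shape shown in the example output earlier; the real output may differ between versions. `find_ref` and `click_by_label` are hypothetical helper names, not part of agent-browser.

```shell
#!/usr/bin/env bash
# Sketch only: resolve an element ref from `agent-browser snapshot -i` text.
# ASSUMPTION: each snapshot line looks like   @e2 [link] "More information..."

# find_ref ROLE NAME
# Reads snapshot text on stdin and prints the first ref whose role and
# quoted name match. Exits non-zero if nothing matches.
find_ref() {
  ROLE="$1" NAME="$2" awk '
    BEGIN { role = ENVIRON["ROLE"]; name = ENVIRON["NAME"] }
    {
      line = $0
      sub(/^[ \t]+/, "", line)            # tolerate indentation
      ref = line
      sub(/ .*/, "", ref)                 # first token, e.g. "@e2"
      if (ref !~ /^@e[0-9]+$/) next       # skip lines without a ref
      expected = ref " [" role "] \"" name "\""
      if (index(line, expected) == 1) { print ref; found = 1; exit }
    }
    END { exit !found }
  '
}

# click_by_label ROLE NAME
# Takes a fresh snapshot, resolves the ref, then clicks it.
click_by_label() {
  local ref
  ref="$(agent-browser snapshot -i | find_ref "$1" "$2")" || {
    echo "no element matching [$1] \"$2\"" >&2
    return 1
  }
  agent-browser click "$ref"
}
```

After `agent-browser open example.com`, calling `click_by_label link "More information..."` would take a snapshot and click the matching ref. Because refs are resolved from a fresh snapshot on every call, this also covers the "re-snapshot after page changes" step of the core workflow.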