30+ Playwright tools wired into the Opta agent loop. Policy-gated, artifact-capturing, peekaboo-visible, and sub-agent-delegatable.
browser_navigate browser_go_back browser_go_forward browser_snapshot browser_screenshot browser_network_requests browser_console_messages browser_wait_for browser_tabs browser_tab_new
browser_click browser_type browser_fill_form browser_select_option browser_press_key browser_drag browser_hover browser_scroll browser_handle_dialog browser_resize
browser_evaluate browser_run_code browser_file_upload

src/browser/live-host.ts manages a pool of Playwright Chromium sessions. Each slot has its own port, session ID, and current URL. When a tool call needs a browser, it claims a slot from the pool. The shared browser runtime daemon in runtime-daemon.ts keeps the Playwright process alive between tool invocations.
# Live host configuration
portRange: 46000–47000      # random port selection
maxSessionSlots: 5          # concurrent browser sessions
requiredPortCount: 6        # ports per slot (main + util)
bindHost: 127.0.0.1         # loopback only, never exposed

# Session slot status
{
  slotIndex: 0,
  port: 46031,
  sessionId: "sess_01J5…",
  currentUrl: "https://docs.python.org/asyncio",
  updatedAt: "2026-03-06T…"
}

# Enable browser tools in config
opta config set browser.mcp.enabled true

Sessions bind only to 127.0.0.1. The LOOPBACK_HOSTS set is used to validate that no session is ever exposed to external network interfaces, which matters when the user is on a shared network.
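The slot claiming and loopback validation described above might be sketched as follows. This is an illustration under assumptions, not the code in live-host.ts: the `SlotPool` class, `assertLoopback`, and the contents of the `LOOPBACK_HOSTS` set are hypothetical. Only the slot field names and the loopback-only rule come from the text.

```typescript
// Hypothetical sketch of a session slot pool. Only the field names
// (slotIndex, port, sessionId, currentUrl) and LOOPBACK_HOSTS appear in the docs.
interface SessionSlot {
  slotIndex: number;
  port: number;
  sessionId: string | null; // null while the slot is free
  currentUrl: string | null;
}

// Assumed contents; the real set is not shown in the source.
const LOOPBACK_HOSTS = new Set<string>(["127.0.0.1", "localhost", "::1"]);

function assertLoopback(host: string): void {
  if (!LOOPBACK_HOSTS.has(host)) {
    throw new Error(`Refusing to bind browser session to non-loopback host: ${host}`);
  }
}

class SlotPool {
  private readonly slots: SessionSlot[];

  constructor(bindHost: string, ports: number[]) {
    assertLoopback(bindHost); // reject any externally reachable bind address
    this.slots = ports.map((port, slotIndex): SessionSlot => ({
      slotIndex,
      port,
      sessionId: null,
      currentUrl: null,
    }));
  }

  // Claim the first free slot, or return null when every slot is busy.
  claim(sessionId: string): SessionSlot | null {
    const free = this.slots.find((s) => s.sessionId === null);
    if (!free) return null;
    free.sessionId = sessionId;
    return free;
  }

  // Return a slot to the pool so a later tool call can claim it.
  release(slotIndex: number): void {
    const slot = this.slots[slotIndex];
    if (slot) {
      slot.sessionId = null;
      slot.currentUrl = null;
    }
  }
}

// Usage: a two-slot pool is exhausted by the third claim.
const pool = new SlotPool("127.0.0.1", [46031, 46037]);
const first = pool.claim("sess_a");
const second = pool.claim("sess_b");
const third = pool.claim("sess_c"); // null: no free slot
```

Returning null on exhaustion leaves the queue-or-fail decision to the caller. The real pool may behave differently.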
src/browser/mcp-interceptor.ts wraps the @playwright/mcp server. It intercepts every tool call, evaluates it with the policy engine, records the action to the artifact log, and then either executes or denies. This design means policy can be changed without touching Playwright internals.
# Interceptor pipeline per tool call
tool_call = { name: "browser_click", params: { selector: "#submit" } }

1. evaluateBrowserPolicyAction(tool_call) → { decision: "allow" | "ask" | "deny", reason }
2. if "ask": pause for user confirmation
   if "deny": return error immediately
3. artifacts.record(tool_call, timestamp)
4. playwright.executeTool(tool_call) → { result, screenshot? }
5. artifacts.record(result)
6. return result to agent loop

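The six steps above can be sketched as a single function with its collaborators injected. Everything here is an assumption made for illustration: the `InterceptorDeps` shape, the `intercept` name, and the synchronous signatures are hypothetical, and the real interceptor presumably works asynchronously against @playwright/mcp.

```typescript
// Hypothetical sketch of the interceptor flow; not the mcp-interceptor.ts API.
type Decision = "allow" | "ask" | "deny";

interface ToolCall {
  name: string;
  params: Record<string, unknown>;
}

interface InterceptorDeps {
  evaluate: (call: ToolCall) => { decision: Decision; reason: string };
  confirm: (call: ToolCall) => boolean; // stands in for the user prompt
  execute: (call: ToolCall) => unknown; // stands in for Playwright
  record: (entry: { kind: "call" | "result"; payload: unknown; at: number }) => void;
}

type InterceptResult = { ok: true; result: unknown } | { ok: false; error: string };

function intercept(call: ToolCall, deps: InterceptorDeps): InterceptResult {
  // Steps 1-2: evaluate policy, then deny or ask before anything runs.
  const { decision, reason } = deps.evaluate(call);
  if (decision === "deny") return { ok: false, error: `denied: ${reason}` };
  if (decision === "ask" && !deps.confirm(call)) {
    return { ok: false, error: "declined by user" };
  }
  // Steps 3-5: record the call, execute it, record the result.
  deps.record({ kind: "call", payload: call, at: Date.now() });
  const result = deps.execute(call);
  deps.record({ kind: "result", payload: result, at: Date.now() });
  // Step 6: hand the result back to the agent loop.
  return { ok: true, result };
}

// Usage with stub dependencies.
const log: string[] = [];
const deps: InterceptorDeps = {
  evaluate: (c) =>
    c.name === "browser_evaluate"
      ? { decision: "deny", reason: "evaluate disabled" }
      : { decision: "allow", reason: "tiered allow" },
  confirm: () => true,
  execute: (c) => `executed ${c.name}`,
  record: (e) => {
    log.push(e.kind);
  },
};

const clicked = intercept({ name: "browser_click", params: { selector: "#submit" } }, deps);
const blocked = intercept({ name: "browser_evaluate", params: { js: "document.title" } }, deps);
```

Because policy, confirmation, and execution are passed in, the policy can change without touching the execution path. That is the property the interceptor design is meant to provide.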
The policy engine in src/browser/policy-engine.ts evaluates each browser tool call. Navigation tools check the target URL against a domain allowlist. Interaction tools (click, type) check the session's current domain. High-risk tools (evaluate, file_upload) require explicit config opt-in.
# Policy config
opta config set browser.policy.allowedDomains "github.com,docs.python.org"
opta config set browser.policy.allowEvaluate false   # default

# Policy evaluation logic
function evaluateBrowserPolicyAction(call) {
  if (call.name === "browser_evaluate") {
    return config.browser.policy.allowEvaluate ? "ask" : "deny"
  }
  if (isNavigationTool(call.name)) {
    return isAllowedDomain(call.params.url) ? "allow" : "ask"
  }
  return TIER_MAP[call.name]   // allow | ask | deny
}

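The logic above calls isAllowedDomain without showing it. One plausible shape for such a check is sketched below as an assumption, not the policy engine's actual code. The two-argument signature and the subdomain rule are hypothetical choices made so the example is self-contained.

```typescript
// Hypothetical helper: the docs name isAllowedDomain but do not show its body.
function isAllowedDomain(rawUrl: string, allowedDomains: string[]): boolean {
  let host: string;
  try {
    host = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return false; // unparseable URLs are never allowed
  }
  return allowedDomains.some((domain) => {
    const d = domain.trim().toLowerCase();
    // Exact match, or a match on a subdomain boundary
    // ("gist.github.com" matches "github.com", "github.com.evil.example" does not).
    return d.length > 0 && (host === d || host.endsWith(`.${d}`));
  });
}

// Usage with the allowlist from the config example.
const allowed = "github.com,docs.python.org".split(",");
const docsOk = isAllowedDomain("https://docs.python.org/asyncio", allowed);
const subdomainOk = isAllowedDomain("https://gist.github.com/some/path", allowed);
const lookalike = isAllowedDomain("https://github.com.evil.example/", allowed);
const garbage = isAllowedDomain("not a url", allowed);
```

Matching on the parsed hostname rather than a substring of the URL keeps lookalike hosts out of the allowlist. Whether the real engine also admits subdomains is not stated in the source.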
Autonomy levels are mapped to policy tiers with Math.floor (not round), so fractional levels always resolve to the stricter tier. This prevents accidental permission escalation from floating-point autonomy scores.
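The effect of flooring can be shown with a small sketch. The tier table below is hypothetical; the source does not list the actual levels. The only assumption that matters is that a lower integer level means a stricter tier.

```typescript
// Hypothetical tier table: index = integer autonomy level, lower is stricter.
type Decision = "allow" | "ask" | "deny";
const TIERS: Decision[] = ["deny", "ask", "allow"];

function tierForAutonomy(level: number): Decision {
  if (!Number.isFinite(level)) return "deny"; // NaN or Infinity falls back to strictest
  // Floor first, then clamp into the table's index range.
  const index = Math.min(TIERS.length - 1, Math.max(0, Math.floor(level)));
  return TIERS[index] ?? "deny";
}

const floored = tierForAutonomy(1.9); // "ask": floor(1.9) = 1
const rounded = TIERS[Math.round(1.9)]; // "allow": what rounding would have produced
const exact = tierForAutonomy(2); // "allow"
const negative = tierForAutonomy(-0.5); // "deny": clamped to level 0
```

With rounding, a score of 1.9 would be promoted to the next tier. With flooring, it stays at the tier it has fully reached.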
# Wait tools
browser_wait_for {
  selector?: string,   # CSS selector to appear
  text?: string,       # text content to appear
  event?: "load" | "networkidle" | "domcontentloaded",
  timeout?: number     # ms, default 30000
}

# Pattern: navigate then wait for content
→ browser_navigate https://app.example.com/dashboard
→ browser_wait_for { event: "networkidle" }
→ browser_wait_for { selector: ".data-table" }
→ browser_snapshot   ← now safe to read DOM

# Click
browser_click { selector: string }
→ browser_click { selector: "button[type=submit]" }
→ browser_click { selector: "text=Sign In" }

# Type
browser_type { selector: string, text: string }
→ browser_type { selector: "#email", text: "[email protected]" }

# Hover (for tooltips / menus)
browser_hover { selector: string }

# Scroll
browser_scroll { selector?: string, direction: "up"|"down", distance?: number }

When browser_snapshot shows accessible names, use them as selectors; they are more resilient than fragile CSS classes.
# Fill multiple fields atomically
browser_fill_form {
  fields: [
    { selector: "#name", value: "Matthew Byrden" },
    { selector: "#email", value: "[email protected]" },
    { selector: "#company", value: "Opta Operations" }
  ]
}

# Select dropdown option
browser_select_option {
  selector: "#country",
  value: "AU"   # or label: "Australia"
}

# Pattern: scrape form then fill
→ browser_snapshot   ← read form fields + selectors
→ browser_fill_form { fields: […] }
→ browser_click { selector: "[type=submit]" }

# Key press (single or chord)
browser_press_key { key: "Enter" }
browser_press_key { key: "Control+A" }
browser_press_key { key: "Escape" }
browser_press_key { key: "Tab" }

# Drag and drop
browser_drag {
  startSelector: ".draggable-item",
  endSelector: ".drop-zone"
}

# Use cases
→ Ctrl+A then Delete to clear a field
→ Tab navigation through a form
→ Escape to close modals
→ Drag list items to reorder

# Handle next dialog before triggering action
browser_handle_dialog {
  action: "accept" | "dismiss",
  promptText?: string   # for window.prompt()
}

# Pattern: handle delete confirmation
→ browser_handle_dialog { action: "accept" }
→ browser_click { selector: "#delete-btn" }
# Dialog intercepted and accepted automatically

# Pattern: dismiss unsaved changes dialog
→ browser_handle_dialog { action: "dismiss" }
→ browser_navigate { url: "https://app.example.com" }

# Basic screenshot
browser_screenshot {}
→ { data: "data:image/png;base64,…", mimeType: "image/png" }

# Full-page screenshot (scrolls entire page)
browser_screenshot { fullPage: true }

# Screenshot is passed to vision model
# Agent can read text, see layout, understand UI

# Peekaboo alternative (macOS only)
# capturePeekabooScreenPng(): captures from
# Chromium window buffer regardless of focus
# 500ms frame cache to avoid redundant captures

# DOM snapshot
browser_snapshot {}
→ accessibility tree (ARIA roles + names + selectors)

# Example output (excerpt)
button "Sign In" [selector: button[type=submit]]
textbox "Email" [selector: #email]
link "Forgot password?" [selector: a.forgot-link]
heading "Welcome back" [level: 1]

# When to use snapshot vs screenshot:
browser_snapshot     ← find selectors, read form fields
browser_screenshot   ← understand visual layout, read images

# Get all network requests since page load
browser_network_requests {}
→ [
    { method: "GET", url: "https://api.example.com/user", status: 200, responseBodySize: 1240 },
    { method: "POST", url: "https://api.example.com/session", status: 201, headers: { authorization: "Bearer …" } }
  ]

# Use cases:
→ Find the API endpoint a page calls for its data
→ Inspect auth headers to understand auth flow
→ Debug why a page isn't loading data
→ Discover undocumented internal APIs

# Enable in config first
opta config set browser.policy.allowEvaluate true

# Execute JS in page context
browser_evaluate { js: "document.title" }
→ "GitHub - opta/cli"

browser_evaluate { js: "window.__APP_STATE__" }
→ { userId: "user_01…", featureFlags: […] }

browser_evaluate { js: "Array.from(document.querySelectorAll('h2')).map(e=>e.textContent)" }
→ ["Setup", "Installation", "Configuration", …]

# Also requires confirmation (policy-gated)

browser_evaluate executes in the page's JS context and can access cookies, localStorage, and page state. Only enable on pages you trust. The policy engine denies by default — enable deliberately per-session.
Peekaboo uses macOS screen capture APIs to grab the Chromium window buffer by app name (PLAYWRIGHT_BROWSER_APP_NAME = "Chromium"). The frame is cached for 500ms to avoid redundant captures. Sensitive text is redacted from the log output before display.
# Peekaboo functions (src/browser/peekaboo.ts)
capturePeekabooScreenPng()       → PNG buffer (500ms cache)
isPeekabooAvailable()            → bool (macOS only)
peekabooClickLabel(label)        → click by accessible label
peekabooPressKey(key)            → synthetic key event
peekabooTypeText(text)           → type into active field
redactPeekabooSensitiveText(t)   → mask passwords/tokens

# TUI sidebar shows live browser preview
# No focus change needed, works behind other windows
# Enabled automatically when browser.mcp.enabled=true (macOS)

Peekaboo is available on macOS only. On other platforms, use browser_screenshot for visual feedback instead.
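The 500ms frame cache mentioned for Peekaboo captures can be illustrated with a small time-based cache. This is a sketch under assumptions: `createFrameCache` is a hypothetical name, and both the capture function and the clock are injected so the example runs without any screen capture API.

```typescript
// Hypothetical time-based frame cache; not the peekaboo.ts implementation.
function createFrameCache<T>(
  capture: () => T,
  ttlMs: number,
  now: () => number = Date.now,
): () => T {
  let cached: { frame: T; at: number } | null = null;
  return (): T => {
    const t = now();
    // Reuse the cached frame while it is younger than the TTL.
    if (cached !== null && t - cached.at < ttlMs) return cached.frame;
    cached = { frame: capture(), at: t };
    return cached.frame;
  };
}

// Usage with a fake clock and a counter standing in for real captures.
let captures = 0;
let fakeTime = 0;
const getFrame = createFrameCache(() => `frame-${++captures}`, 500, () => fakeTime);

const a = getFrame(); // first call captures
fakeTime = 499;
const b = getFrame(); // within 500ms: served from cache
fakeTime = 500;
const c = getFrame(); // cache expired: captures again
```

Injecting the clock makes the expiry boundary testable without real delays.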
# Visual diff pipeline (src/browser/visual-diff.ts)
# Automatic: MCP interceptor captures before/after
before_screenshot = capture()    ← before action
execute_tool(action)
after_screenshot = capture()     ← after action
diff = pixelDiff(before, after)
→ { changed: true, changedRegions: [{ x, y, w, h }] }

# Diff result available in tool result payload
tool_result.visualDiff = {
  changedPixelPercent: 0.12,
  significantChange: true
}

# Agent uses diff to verify actions had expected effect

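A minimal version of the pixel comparison might look like the sketch below. It is not the code in visual-diff.ts. It assumes raw, same-sized RGBA buffers, treats changedPixelPercent as a fraction between 0 and 1, uses an arbitrary 5% significance threshold, and leaves out changedRegions. All of those are assumptions.

```typescript
// Hypothetical pixel diff over two same-sized RGBA buffers.
interface VisualDiff {
  changedPixelPercent: number; // assumed to be a fraction in [0, 1]
  significantChange: boolean;
}

function pixelDiff(before: Uint8Array, after: Uint8Array, threshold = 0.05): VisualDiff {
  if (before.length !== after.length || before.length % 4 !== 0) {
    throw new Error("buffers must be same-sized RGBA data");
  }
  const totalPixels = before.length / 4;
  let changed = 0;
  for (let i = 0; i < before.length; i += 4) {
    // A pixel counts as changed if any of its four channels differs.
    if (
      before[i] !== after[i] ||
      before[i + 1] !== after[i + 1] ||
      before[i + 2] !== after[i + 2] ||
      before[i + 3] !== after[i + 3]
    ) {
      changed++;
    }
  }
  const changedPixelPercent = totalPixels === 0 ? 0 : changed / totalPixels;
  return { changedPixelPercent, significantChange: changedPixelPercent >= threshold };
}

// Usage: four pixels, one of which changes its red channel.
const beforeFrame = new Uint8Array(16);
const afterFrame = new Uint8Array(16);
afterFrame[4] = 255;

const diff = pixelDiff(beforeFrame, afterFrame);
const same = pixelDiff(beforeFrame, beforeFrame);
```

A real pipeline would first decode the PNG screenshots into pixel buffers, and would likely tolerate small per-channel differences from anti-aliasing.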
delegateToBrowserSubAgent() in src/browser/sub-agent-delegator.ts spawns a complete child agent with browser-focused system prompt and tool permissions. The parent agent continues once the sub-agent completes or hits a defined checkpoint. Ideal for goals like "go to this site and fill out the form".
# Sub-agent delegation
opta do "go to linear.app and create an issue for the bug in #9234"

Parent agent:
→ delegateToBrowserSubAgent({ goal: "create Linear issue for #9234" })

Browser sub-agent (full peer):
→ browser_navigate https://linear.app
→ browser_click { selector: "New Issue" }
→ browser_fill_form { fields: [{ title: "Bug: …" }] }
→ browser_click { selector: "Create Issue" }
→ return { issueId: "ENG-421", url: "https://…" }

Parent agent resumes with result.

src/browser/replay.ts reads the artifact action log from a previous session and re-executes each browser action in order. Actions are timed to match original delays. Screenshots are captured at each step for side-by-side comparison with the original.
# Replay a recorded session
opta browser replay --session sess_01J5X3…

◆ Loading action log (42 actions)…
→ browser_navigate https://app.example.com ✓
→ browser_fill_form { email, password } ✓
→ browser_click { selector: "#submit" } ✓
→ browser_wait_for { event: "networkidle" } ✓
…
✓ Replay complete. 42/42 actions succeeded.
Visual diff: 3 regions changed vs original.

# Action log stored in session artifact dir
# ~/.config/opta/sessions/sess_01J5…/artifacts/
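Replaying actions in order with their original delays can be sketched as a planning step over the action log. The log entry shape (`name`, `params`, `timestamp`) and the `planReplay` and `summarize` helpers are assumptions made for illustration. The actual artifact format is not shown in the source.

```typescript
// Hypothetical replay planner; the real log format in replay.ts may differ.
interface LoggedAction {
  name: string;
  params: Record<string, unknown>;
  timestamp: number; // ms since epoch when the action originally ran
}

interface ReplayStep {
  action: LoggedAction;
  delayMs: number; // wait before this step to match the original pacing
}

function planReplay(log: LoggedAction[]): ReplayStep[] {
  // Sort a copy so replay order follows the original timeline.
  const ordered = [...log].sort((x, y) => x.timestamp - y.timestamp);
  return ordered.map((action, i) => {
    const previous = ordered[i - 1];
    return { action, delayMs: previous ? action.timestamp - previous.timestamp : 0 };
  });
}

// Format a success summary like the one in the replay output above.
function summarize(results: boolean[]): string {
  const ok = results.filter(Boolean).length;
  return `${ok}/${results.length} actions succeeded`;
}

// Usage: entries arrive out of order and are replayed by timestamp.
const steps = planReplay([
  { name: "browser_click", params: { selector: "#submit" }, timestamp: 2500 },
  { name: "browser_navigate", params: { url: "https://app.example.com" }, timestamp: 1000 },
]);
const summary = summarize([true, true, false]);
```

A runner would walk `steps`, wait `delayMs` before each action, execute it, and capture a screenshot for comparison with the original.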