
Your Real Browser, Not a Sandbox

The Wolffish browser extension gives the agent direct control of your real browser. Unlike the Playwright-based browser capability (which runs an isolated headless session), the extension operates in your actual browser — your cookies, logins, extensions, and open tabs are all available.
The agent automatically prefers ext_* tools over Playwright browser_* tools when the extension is connected. No configuration needed.

Why This Exists

Computer use (an AI taking screenshots of your screen, moving your mouse, and typing keystrokes) works, but it’s slow, expensive, and fragile. Every action requires a full-screen screenshot sent to a vision model, the model guesses where to click based on pixel coordinates, and failures cascade because there’s no DOM awareness. A single browsing session can burn through hundreds of screenshots at vision-model pricing.
The extension replaces all of that with direct browser control. No screenshots are needed for navigation, and there is no pixel guessing. The agent sends ext_click('.submit-btn') and the extension clicks the actual DOM element. It sends ext_read_page({ format: 'text' }) and gets clean text back, not a 2MB screenshot for the model to squint at.
The real unlock is session reuse. Your browser is already logged into everything: Gmail, LinkedIn, Reddit, GitHub, Notion, your company’s internal tools. The extension gives the agent access to all of that without storing credentials, managing OAuth flows, or launching throwaway browser instances. The agent operates as you, in your browser, with your context.

What This Actually Looks Like

  • “Summarize my Reddit front page” — the agent opens your Reddit feed (you’re already logged in), reads the visible posts, scrolls to load more, and gives you a summary. No Reddit API key, no OAuth, no rate limits.
  • “Search Reddit for reviews of the M4 MacBook Air” — navigates to Reddit search, reads the top results, clicks into threads, extracts the useful comments. All through your logged-in session where you see upvoted content, not the stripped-down anonymous view.
  • “Clean up my LinkedIn — delete conversations I haven’t replied to in 6 months” — opens LinkedIn messaging, reads conversation previews, identifies stale ones, deletes them. Requires your authenticated session, which the extension already has.
  • “Accept all pending LinkedIn connection requests from people in my industry” — opens the invitations page, reads each request, checks the person’s headline, accepts the relevant ones. No LinkedIn API (which doesn’t even expose this).
  • “Check my GitHub notifications and close any resolved issues” — opens your GitHub notifications (logged in), reads each one, follows the link, checks if the issue was resolved, closes it. Faster than the GitHub API for this kind of triage because the UI state is already loaded.
  • “Go to Hacker News, find today’s top AI papers, and save the links to a markdown file” — no API exists for this. The extension just reads the page like you would.
  • “Fill out this job application form with my details” — reads the form fields, fills them from your profile, handles dropdowns and multi-step flows. In your actual browser where you’re already logged into the job portal.
These aren’t hypothetical — they’re the workflows the extension was built for. Anything you do in Chrome by clicking and reading, the agent can do through the extension.

Cost: Why Model Choice Matters

Browser automation generates large contexts fast. Every ext_read_page returns the full visible text. Every ext_screenshot adds an image to the context. A typical browsing session with 20–30 tool calls can push the context to 100K+ tokens. On premium models, this gets expensive quickly:
| Model | ~30 tool calls | ~50 tool calls |
|---|---|---|
| Claude Opus 4.6 | ~$25 | ~$40+ |
| GPT-4o | ~$8 | ~$15 |
| DeepSeek V3 | ~$1.30 | ~$2.50 |
| MiniMax-M1 | ~$1.50 | ~$3 |
A single browsing session that looks routine to you (“check 5 job listings and compare them to my profile”) can involve 30+ tool calls, 10+ page reads, and 2+ screenshots. On Opus this is a $25 turn. On DeepSeek it’s $1.30 for the same result.
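Why do contexts get expensive so quickly? Assuming each model call re-sends the whole conversation so far (and ignoring prompt caching), the billed input grows roughly quadratically with the number of tool calls, not linearly. The minimal sketch below does that arithmetic; the session shape and the per-million-token prices are hypothetical placeholders, not quotes for any model in the table.

```typescript
// Hypothetical cost model: assumes every model call re-sends the full
// conversation and ignores prompt caching. Prices are placeholders.
interface Pricing {
  inputPerMTok: number; // USD per 1M input tokens (assumed)
  outputPerMTok: number; // USD per 1M output tokens (assumed)
}

interface SessionShape {
  baseTokens: number; // system prompt, tool schemas, user request
  toolCalls: number; // agent turns, one tool call each
  tokensPerResult: number; // average size of one tool result (page text, screenshot)
  outputPerTurn: number; // average tokens the model writes per turn
}

function estimateSession(s: SessionShape, p: Pricing) {
  let context = s.baseTokens;
  let inputTokens = 0;
  let outputTokens = 0;
  for (let i = 0; i < s.toolCalls; i++) {
    inputTokens += context; // the whole context is billed again on every call
    outputTokens += s.outputPerTurn;
    context += s.outputPerTurn + s.tokensPerResult; // reply and tool result are appended
  }
  const usd =
    (inputTokens / 1e6) * p.inputPerMTok + (outputTokens / 1e6) * p.outputPerMTok;
  return { finalContext: context, inputTokens, outputTokens, usd };
}

const session = { baseTokens: 5000, toolCalls: 30, tokensPerResult: 3000, outputPerTurn: 200 };
console.log(estimateSession(session, { inputPerMTok: 10, outputPerMTok: 30 }));
```

With these made-up numbers the final context is about 101K tokens, in line with the 100K+ figure above, yet the session bills about 1.54M input tokens, roughly 15 times the final context. That multiplier is why the per-token price of the model dominates browsing costs.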
We recommend DeepSeek, MiniMax, or Kimi for browser-heavy workflows. They handle complex multi-step browsing reliably at a fraction of the cost. The quality difference for “navigate, read, extract, summarize” tasks is negligible — these models are excellent at structured data extraction and following multi-step instructions. Reserve premium models for tasks that genuinely need stronger reasoning, not for reading web pages.
You can set different models per conversation in Wolffish. Use DeepSeek for browsing sessions and Claude for complex reasoning tasks — you don’t have to choose one model for everything.

Architecture

The extension connects to the Wolffish app over a local WebSocket.
Agent → Plugin (ext_* tools) → WebSocket (localhost:23151) → Extension Service Worker → Chrome APIs / Content Script → Result
| Component | Location | Role |
|---|---|---|
| WebSocket Server | channels/extension/server.ts (core) | Connection management, heartbeat, command routing |
| Event Logger | channels/extension/log.ts (core) | Per-conversation event logging |
| Plugin | cerebellum/browser-extension/ (editable) | Tool definitions, execute logic, screenshot processing |
| Extension | ~/.wolffish/workspace/extension/ | Chrome extension loaded in the browser |
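The first hops of the pipeline above, from an ext_* tool call to a frame on the local WebSocket, can be sketched as follows. This is an illustration under assumptions, not the plugin’s code: it assumes the ext_ to browser_ rename is a plain prefix swap, wraps the command in an { id, type, params } envelope, and the id scheme and the selector parameter name are hypothetical.

```typescript
// Hypothetical sketch of the plugin side of: Agent → Plugin → WebSocket.
interface WireCommand {
  id: string; // lets the server match the extension's response to this request
  type: string; // e.g. "browser_click"
  params: Record<string, unknown>;
}

let nextId = 0;

function toWireCommand(toolName: string, params: Record<string, unknown>): WireCommand {
  if (!toolName.startsWith("ext_")) {
    throw new Error(`not an extension tool: ${toolName}`);
  }
  // Assumed rule: replace the agent-facing "ext_" prefix with the wire-level "browser_" prefix.
  const type = "browser_" + toolName.slice("ext_".length);
  nextId += 1;
  return { id: `cmd-${nextId}`, type, params };
}

// ext_click('.submit-btn') from the agent becomes one JSON frame for localhost:23151.
const frame = JSON.stringify(toWireCommand("ext_click", { selector: ".submit-btn" }));
console.log(frame); // → {"id":"cmd-1","type":"browser_click","params":{"selector":".submit-btn"}}
```

Keeping the id in the envelope is what lets several commands be in flight at once without the server needing to understand what any of them do.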

Setup

  1. Open Settings → Services → Browser Extension
  2. Click Reveal in Finder to find the extension folder
  3. Open Chrome or Brave → chrome://extensions
  4. Enable Developer Mode → Load Unpacked → select the extension folder
  5. The extension connects automatically — the status dot turns green
The extension auto-reconnects when the app restarts. You only need to load it once.

Auto-Updates

The extension updates itself automatically when Wolffish updates — no manual steps required.
  1. Each Wolffish release ships with the latest extension files bundled inside the app binary
  2. On every app launch, the bundled files are copied to ~/.wolffish/workspace/extension/, overwriting the previous version
  3. When the extension connects, the server compares its self-reported version against the version on disk
  4. If they differ, the server sends a reload command — the extension calls chrome.runtime.reload() on itself, picks up the new code, and reconnects automatically
The entire process is seamless. You’ll see a brief disconnect in the side panel (< 1 second) while the extension restarts, then it reconnects with the new version. You can also trigger a manual reload from Settings → Services → Browser Extension → Reload Extension.
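Steps 3 and 4 boil down to a version comparison on connect. The sketch below assumes the extension announces itself with a hello message carrying its version, and that the on-disk version is read from the "version" field of the copied manifest.json (every Manifest V3 extension has one). The message shapes and function names are hypothetical; the real logic lives in channels/extension/server.ts.

```typescript
// Hypothetical message shapes; the real protocol is defined by the Wolffish server.
interface HelloMessage {
  type: "hello";
  version: string; // version the running extension reports about itself
}

interface ReloadCommand {
  id: string;
  type: "reload";
  params: Record<string, never>;
}

// Assumed source of the on-disk version: manifest.json copied at app launch.
function versionFromManifest(manifestJson: string): string {
  const manifest = JSON.parse(manifestJson) as { version?: unknown };
  if (typeof manifest.version !== "string") throw new Error("manifest has no version");
  return manifest.version;
}

// Server side: any mismatch (newer or older on disk) triggers a reload.
function onHello(msg: HelloMessage, versionOnDisk: string): ReloadCommand | null {
  if (msg.version === versionOnDisk) return null;
  return { id: `reload-${Date.now()}`, type: "reload", params: {} };
}

// Extension side: in the service worker, `reload` would be chrome.runtime.reload.
function handleServerCommand(cmd: { type: string }, reload: () => void): void {
  if (cmd.type === "reload") reload();
}
```

Comparing for inequality rather than “newer than” means a downgraded app also brings the extension back in line with the files on disk.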

49 Tools

The agent sees these as ext_* tools. The plugin translates them to browser_* commands over the wire.

Navigation

ext_navigate · ext_back · ext_forward · ext_reload

Page Interaction

ext_click · ext_type · ext_select · ext_hover · ext_scroll · ext_focus · ext_keypress · ext_drag_drop · ext_file_upload

Page Reading

ext_read_page · ext_query_selector · ext_get_attribute · ext_get_value · ext_get_url · ext_get_page_info

Tab & Window Management

ext_tabs_list · ext_tab_open · ext_tab_close · ext_tab_switch · ext_tab_duplicate · ext_tab_move · ext_windows_list · ext_window_open · ext_window_close · ext_window_resize

Capture

ext_screenshot · ext_pdf · ext_download

Data & Storage

ext_cookies_get · ext_cookies_set · ext_cookies_remove · ext_storage_get · ext_storage_set · ext_clipboard_read · ext_clipboard_write

Advanced

ext_execute_js · ext_wait_for · ext_wait_for_navigation · ext_wait_for_network_idle · ext_notify

Debugger Mode

ext_debugger_attach · ext_debugger_detach · ext_debugger_status

Mouse & Humanize

ext_mouse_move · ext_humanize

Debugger Mode

Debugger mode attaches Chrome DevTools Protocol (CDP) to a tab and replaces content-script-based interactions with low-level input events. When the debugger is attached, clicks, typing, scrolling, hovering, and keypresses are dispatched through CDP’s Input.dispatch* methods instead of DOM APIs.

Why It Exists

Content script interactions (element.click(), element.value = '...') are detectable. Sites can distinguish programmatic DOM events from real user input by checking event properties like isTrusted (false for events created by scripts), monitoring event ordering, or using bot-detection libraries. CDP input events avoid these particular signals: they enter the browser’s input pipeline at the same level as physical keyboard and mouse events, so the DOM events they produce report isTrusted: true.

How It Works

  1. The agent calls ext_debugger_attach with a tab ID
  2. Chrome attaches the debugger protocol to that tab (you’ll see a “… is debugging this tab” banner)
  3. All subsequent ext_click, ext_type, ext_scroll, ext_hover, and ext_keypress calls on that tab are automatically routed through CDP instead of the content script
  4. Mouse movements follow Bezier curves with Gaussian-distributed timing — not straight lines with fixed delays
  5. Typing dispatches individual keyDown/char/keyUp sequences per character with variable inter-key delays
  6. When done, call ext_debugger_detach to release the debugger
Agent calls ext_click('.btn')

Extension checks: debugger attached?
  ├─ Yes → resolves element coordinates → Bezier mouse path → CDP mousePressed/mouseReleased
  └─ No  → content script → element.click()
The routing is automatic. The agent doesn’t need to change how it calls tools — it just attaches the debugger first and everything switches.
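The routing and the CDP payloads can be sketched as follows. In an extension, send would wrap chrome.debugger.sendCommand({ tabId }, method, params) after chrome.debugger.attach({ tabId }, "1.3"). Input.dispatchMouseEvent and Input.dispatchKeyEvent are real CDP methods, but the function names, the injected send and domClick parameters, and the exact event fields chosen here are assumptions for illustration; a fuller version would also set fields such as code and windowsVirtualKeyCode for non-character keys.

```typescript
// Hypothetical sketch of the routing shown above; not the extension's source.
type CdpSend = (method: string, params: Record<string, unknown>) => Promise<unknown>;

interface Point {
  x: number;
  y: number;
}

// Ordered Input.dispatchMouseEvent payloads: move along `path`, then press and release.
function clickEvents(path: Point[], target: Point): Record<string, unknown>[] {
  const moves = path.map((p) => ({ type: "mouseMoved", x: p.x, y: p.y, button: "none" }));
  const press = { type: "mousePressed", x: target.x, y: target.y, button: "left", clickCount: 1 };
  const release = { ...press, type: "mouseReleased" };
  return [...moves, press, release];
}

// Step 5: one keyDown/char/keyUp triple per character (inter-key delays omitted here).
function typeEvents(text: string): Record<string, unknown>[] {
  const events: Record<string, unknown>[] = [];
  for (const ch of Array.from(text)) {
    events.push({ type: "keyDown", key: ch });
    events.push({ type: "char", text: ch });
    events.push({ type: "keyUp", key: ch });
  }
  return events;
}

async function routedClick(
  debuggerAttached: boolean,
  send: CdpSend,
  domClick: () => void, // content-script fallback, i.e. element.click()
  path: Point[],
  target: Point,
): Promise<"cdp" | "dom"> {
  if (!debuggerAttached) {
    domClick();
    return "dom";
  }
  for (const ev of clickEvents(path, target)) {
    await send("Input.dispatchMouseEvent", ev);
  }
  return "cdp";
}
```

Typing would follow the same pattern, sending each typeEvents payload through Input.dispatchKeyEvent with a variable delay between characters.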

Limitations

  • Only one tab can have the debugger attached at a time — attaching to a new tab detaches from the previous one
  • Chrome shows a yellow “debugging” banner at the top of the page (cannot be hidden)
  • chrome:// and chrome-extension:// pages cannot be debugged
  • If DevTools is already open on the tab, the debugger cannot attach
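The first limitation above amounts to a small piece of state. A sketch, assuming a thin wrapper around chrome.debugger, whose attach and detach calls take a { tabId } target (attach also takes a protocol version string such as "1.3"). The class name and the injected api object are hypothetical; ext_debugger_attach, ext_debugger_detach, and ext_debugger_status could map onto attach, detach, and status.

```typescript
// Hypothetical wrapper enforcing "one attached tab at a time".
type Target = { tabId: number };

interface DebuggerApi {
  attach(target: Target, requiredVersion: string): Promise<void>;
  detach(target: Target): Promise<void>;
}

class SingleTabDebugger {
  private attachedTab: number | null = null;
  private api: DebuggerApi;

  constructor(api: DebuggerApi) {
    this.api = api; // e.g. an adapter over chrome.debugger
  }

  async attach(tabId: number): Promise<void> {
    if (this.attachedTab === tabId) return;
    if (this.attachedTab !== null) {
      await this.api.detach({ tabId: this.attachedTab }); // release the previous tab first
      this.attachedTab = null;
    }
    await this.api.attach({ tabId }, "1.3"); // may reject, see the limitations above
    this.attachedTab = tabId;
  }

  async detach(): Promise<void> {
    if (this.attachedTab === null) return;
    await this.api.detach({ tabId: this.attachedTab });
    this.attachedTab = null;
  }

  status(): { attached: boolean; tabId: number | null } {
    return { attached: this.attachedTab !== null, tabId: this.attachedTab };
  }
}
```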

Humanize

The humanize tool (ext_humanize) injects human-like micro-behaviors between agent actions. Instead of the agent clicking, typing, and navigating with machine precision and zero idle time, humanize adds the pauses, small scrolls, cursor drifts, and reading delays that real users naturally produce.
Use responsibly. Humanize is designed for testing, research, and personal automation. Some platforms actively detect automated behavior and may penalize, restrict, or ban accounts that violate their terms of service. Using humanize to circumvent bot detection on platforms that prohibit automation is done at your own risk. We do not encourage or endorse violating any platform’s terms of service.

Intensity Levels

| Level | Micro-Actions | Use Case |
|---|---|---|
| light | Random pauses, micro-scrolls | Minimal footprint: adds basic timing variation |
| moderate | Pauses, micro-scrolls, cursor moves, hover on inert elements, variable scrolls | Balanced: looks like a distracted human |
| heavy | All of the above + scroll bounces, idle drift, long pauses | Full simulation: reading pauses, scroll-then-scroll-back, cursor wander |

Micro-Actions

Each ext_humanize call picks one random action from the intensity pool and executes it:
  • Random pause — waits 0.8–2s (simulates thinking or reading)
  • Micro-scroll — scrolls a tiny amount up or down, sometimes scrolls back
  • Cursor move — moves the cursor to a random non-interactive element via Bezier path
  • Hover inert — moves to a non-interactive element and hovers briefly
  • Variable scroll — 2–4 small scroll steps in sequence with variable timing
  • Scroll bounce — scrolls down then back up (like overshooting while reading)
  • Idle drift — tiny random cursor movements around the current position
  • Long pause — waits 2–5s (simulates reading a paragraph)
All timing uses Gaussian distributions — no fixed delays. Mouse movements follow cubic Bezier curves with randomized control points.
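The two techniques named above fit in a few lines: a Box-Muller transform for Gaussian samples, and the standard cubic Bezier formula B(t) = (1−t)³P0 + 3(1−t)²t·P1 + 3(1−t)t²·P2 + t³P3. Clamping samples into each action’s documented range (for example 800 to 2000 ms for a random pause) and the ±100 px control-point jitter are assumptions, not documented values; the random source is injectable so results can be reproduced in tests.

```typescript
// Hypothetical helpers illustrating Gaussian timing and Bezier cursor paths.
interface Point {
  x: number;
  y: number;
}

// Normal sample via the Box-Muller transform.
function gaussian(mean: number, stdDev: number, rand: () => number = Math.random): number {
  const u = 1 - rand(); // in (0, 1], avoids log(0)
  const v = rand();
  const z = Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
  return mean + stdDev * z;
}

// Gaussian delay clamped into [min, max] ms (clamping is an assumption).
function pauseMs(min: number, max: number, rand: () => number = Math.random): number {
  const mean = (min + max) / 2;
  const sd = (max - min) / 6; // about 99.7% of raw samples already fall inside the range
  return Math.min(max, Math.max(min, gaussian(mean, sd, rand)));
}

// Cubic Bezier from `from` to `to` with two randomized control points.
function bezierPath(from: Point, to: Point, steps: number, rand: () => number = Math.random): Point[] {
  const jitter = () => (rand() - 0.5) * 200; // assumed spread: up to ±100 px
  const dx = to.x - from.x;
  const dy = to.y - from.y;
  const c1 = { x: from.x + dx / 3 + jitter(), y: from.y + dy / 3 + jitter() };
  const c2 = { x: from.x + (2 * dx) / 3 + jitter(), y: from.y + (2 * dy) / 3 + jitter() };
  const pts: Point[] = [];
  for (let i = 0; i <= steps; i++) {
    const t = i / steps;
    const a = (1 - t) ** 3;
    const b = 3 * (1 - t) ** 2 * t;
    const c = 3 * (1 - t) * t ** 2;
    const d = t ** 3;
    pts.push({
      x: a * from.x + b * c1.x + c * c2.x + d * to.x,
      y: a * from.y + b * c1.y + c * c2.y + d * to.y,
    });
  }
  return pts;
}
```

A cursor-move action would replay the returned points as successive mouse-move events, each separated by a short Gaussian delay, so the path starts and ends exactly on the requested points but curves unpredictably in between.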

With Debugger Mode

Humanize works with or without the debugger attached. When the debugger is active, scroll and cursor actions use CDP events. When it’s not, they fall back to content script execution. The agent doesn’t need to coordinate — humanize checks the debugger state automatically.
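Putting the intensity table and this section together, one ext_humanize call could be sketched as below: pick one action from the level’s pool, then send scroll and cursor actions through CDP when the debugger is attached and through the content script otherwise. Uniform selection, the action identifiers, and the injected backends are assumptions for illustration; the pools follow the intensity table, with each level extending the one below it.

```typescript
// Hypothetical sketch; action names mirror the list above, not the plugin's identifiers.
type Intensity = "light" | "moderate" | "heavy";
type MicroAction =
  | "random_pause"
  | "micro_scroll"
  | "cursor_move"
  | "hover_inert"
  | "variable_scroll"
  | "scroll_bounce"
  | "idle_drift"
  | "long_pause";

const LIGHT: MicroAction[] = ["random_pause", "micro_scroll"];
const MODERATE: MicroAction[] = [...LIGHT, "cursor_move", "hover_inert", "variable_scroll"];
const HEAVY: MicroAction[] = [...MODERATE, "scroll_bounce", "idle_drift", "long_pause"];
const POOLS: Record<Intensity, MicroAction[]> = { light: LIGHT, moderate: MODERATE, heavy: HEAVY };

// One uniformly random action from the level's pool (uniformity is assumed).
function pickAction(level: Intensity, rand: () => number = Math.random): MicroAction {
  const pool = POOLS[level];
  return pool[Math.floor(rand() * pool.length)];
}

interface Backends {
  cdp: (action: MicroAction) => Promise<void>; // Input.dispatch* events
  contentScript: (action: MicroAction) => Promise<void>; // DOM-level fallback
}

// One ext_humanize call. Pauses need no input events, so they are left to the caller's timer.
async function humanizeOnce(
  level: Intensity,
  debuggerAttached: boolean,
  backends: Backends,
  rand: () => number = Math.random,
): Promise<MicroAction> {
  const action = pickAction(level, rand);
  const isPause = action === "random_pause" || action === "long_pause";
  if (!isPause) {
    await (debuggerAttached ? backends.cdp(action) : backends.contentScript(action));
  }
  return action;
}
```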

Screenshots

Screenshots are processed through sharp before being sent to the LLM:
  1. Extension captures the visible tab via chrome.tabs.captureVisibleTab()
  2. Plugin strips the data URL prefix, decodes to buffer
  3. Sharp resizes to the configured max width (default 1280px)
  4. Converts to the configured format (default JPEG, quality 80)
  5. Returns clean base64 inline — displayed automatically in chat
Configure resolution and format in Settings → Services → Browser Extension.
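Steps 2 through 5 map closely onto sharp’s chainable API. chrome.tabs.captureVisibleTab() delivers its capture as a data URL, which is why step 2 exists. In the sketch below, sharp(), resize({ width, withoutEnlargement }), jpeg({ quality }), and toBuffer() are real sharp calls; the function names, the JPEG-only output, and treating the configured width as an upper bound (no upscaling) are assumptions.

```typescript
// Hypothetical sketch of steps 2 to 5; defaults mirror the documented 1280px / JPEG 80.

// Step 2: drop the "data:image/...;base64," prefix and decode to raw bytes.
function decodeDataUrl(dataUrl: string): Buffer {
  const match = /^data:[^;,]+;base64,(.*)$/.exec(dataUrl);
  if (!match) throw new Error("expected a base64 data URL");
  return Buffer.from(match[1], "base64");
}

// Steps 3 to 5: resize to at most maxWidth, re-encode as JPEG, return bare base64.
async function processScreenshot(dataUrl: string, maxWidth = 1280, quality = 80): Promise<string> {
  // sharp is a native dependency; importing it lazily keeps decodeDataUrl usable without it.
  const { default: sharp } = await import("sharp");
  const out = await sharp(decodeDataUrl(dataUrl))
    .resize({ width: maxWidth, withoutEnlargement: true }) // never upscale small captures
    .jpeg({ quality })
    .toBuffer();
  return out.toString("base64");
}
```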

Side Panel

The extension includes a side panel that shows:
  • Connection status — connected, waiting, or disconnected
  • Live event feed — every tool call appears in real-time as the agent browses
  • Conversation history — browse events from past conversations
Events are logged per-conversation to ~/.wolffish/workspace/logs/extension/{conversationId}.jsonl.
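A .jsonl file holds one JSON object per line, so new events are appended without rewriting earlier ones and a reader can process them one at a time. A minimal Node sketch of writing and reading such a log follows; only the directory layout comes from the path above, while the event fields and function names are hypothetical.

```typescript
import { appendFileSync, existsSync, mkdirSync, readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Directory from the docs: ~/.wolffish/workspace/logs/extension/
const DEFAULT_LOG_DIR = join(homedir(), ".wolffish", "workspace", "logs", "extension");

// Hypothetical event shape; the real fields are whatever the event logger writes.
interface ExtensionEvent {
  ts: string; // ISO timestamp
  tool: string; // e.g. "ext_click"
  ok?: boolean;
}

function logFile(conversationId: string, dir = DEFAULT_LOG_DIR): string {
  return join(dir, `${conversationId}.jsonl`);
}

function appendEvent(conversationId: string, ev: ExtensionEvent, dir = DEFAULT_LOG_DIR): void {
  mkdirSync(dir, { recursive: true });
  // One JSON object per line: appending never rewrites earlier events.
  appendFileSync(logFile(conversationId, dir), JSON.stringify(ev) + "\n");
}

function readEvents(conversationId: string, dir = DEFAULT_LOG_DIR): ExtensionEvent[] {
  const file = logFile(conversationId, dir);
  if (!existsSync(file)) return [];
  return readFileSync(file, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as ExtensionEvent);
}
```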

Customization

Editing Agent Instructions

Edit ~/.wolffish/workspace/brain/cerebellum/.browser-extension/SKILL.md:
  • Change tool descriptions to guide the agent differently
  • Add or modify trigger keywords
  • Add safety patterns (danger_patterns, confirm_patterns)
  • Edit the body text for custom browsing procedures

Editing the Plugin

Edit ~/.wolffish/workspace/brain/cerebellum/.browser-extension/plugin/index.mjs:
  • Add pre/post processing to tool calls
  • Compose multiple commands into higher-level tools
  • Customize screenshot processing
  • Add new tools that combine existing commands
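For example, a higher-level “open and read” tool could chain three existing commands and post-process the result. The plugin’s export format and command-sending helper are not documented here, so call below is a hypothetical stand-in for however index.mjs sends a command over the WebSocket. The browser_* names assume the ext_* to browser_* translation is a plain prefix swap, and the parameter names and the 20,000-character cap are illustrative assumptions.

```typescript
// `call` is a hypothetical stand-in for the plugin's command sender.
type Call = (type: string, params: Record<string, unknown>) => Promise<unknown>;

const MAX_CHARS = 20000; // illustrative cap, not a documented limit

// Hypothetical composed tool: navigate, wait for the page to settle, then read it.
async function openAndRead(call: Call, url: string): Promise<string> {
  await call("browser_navigate", { url });
  await call("browser_wait_for_network_idle", {});
  const page = await call("browser_read_page", { format: "text" });
  // Post-processing: trim very long pages before they reach the model.
  const text = String(page ?? "");
  return text.length > MAX_CHARS ? text.slice(0, MAX_CHARS) + "\n[truncated]" : text;
}
```

Folding the three round trips into one tool also means the agent spends one turn, rather than three, to get a page’s text.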

Building a Custom Extension

The WebSocket server is command-agnostic — it pipes any { id, type, params } to the extension and resolves when a matching response arrives. You can fork the extension, add new commands, and update the plugin to match. See the extension repository for the full source.
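Because the server only correlates ids, adding a command means teaching the extension a new type and calling it from the plugin; nothing in between has to change. A minimal sketch of that correlation pattern, assuming a reply shape of { id, ok, result?, error? } and a timeout for commands the extension never answers (both assumptions, as are the class and command names):

```typescript
// Hypothetical sketch of request/response correlation over the extension socket.
interface Command {
  id: string;
  type: string; // opaque to the router: any command name is forwarded as-is
  params: Record<string, unknown>;
}

interface Reply {
  id: string;
  ok: boolean;
  result?: unknown;
  error?: string;
}

interface Pending {
  resolve: (value: unknown) => void;
  reject: (err: Error) => void;
  timer: ReturnType<typeof setTimeout>;
}

class CommandRouter {
  private pending = new Map<string, Pending>();
  private counter = 0;
  private send: (raw: string) => void;
  private timeoutMs: number;

  constructor(send: (raw: string) => void, timeoutMs = 30000) {
    this.send = send; // e.g. (raw) => socket.send(raw)
    this.timeoutMs = timeoutMs;
  }

  request(type: string, params: Record<string, unknown> = {}): Promise<unknown> {
    this.counter += 1;
    const id = String(this.counter);
    const cmd: Command = { id, type, params };
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(id);
        reject(new Error(`no response to ${type} (id ${id})`));
      }, this.timeoutMs);
      this.pending.set(id, { resolve, reject, timer });
      this.send(JSON.stringify(cmd));
    });
  }

  // Feed every raw message from the extension through here.
  onMessage(raw: string): void {
    const reply = JSON.parse(raw) as Reply;
    const entry = this.pending.get(reply.id);
    if (!entry) return; // unknown id, or it already timed out
    clearTimeout(entry.timer);
    this.pending.delete(reply.id);
    if (reply.ok) entry.resolve(reply.result);
    else entry.reject(new Error(reply.error ?? "command failed"));
  }
}
```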

Supported Browsers

| Browser | Status |
|---|---|
| Chromium | Supported |
| Chrome | Supported |
| Brave | Supported |
| Edge | Supported |
| Safari | Not supported |
| Firefox | Not supported |
The extension uses Manifest V3 APIs. Safari and Firefox are not supported because the extension relies on Chrome-specific APIs (chrome.tabs.captureVisibleTab, chrome.debugger for CDP input and PDF, chrome.sidePanel, direct WebSocket in the service worker).

Safety

The extension inherits the standard capability safety system:
  • ext_execute_js with document.cookie or navigator.sendBeacon → blocked
  • ext_execute_js (any) → requires approval
  • ext_download → requires approval
  • ext_cookies_set → requires approval
  • Navigation to financial sites → requires approval
These patterns are defined in the SKILL.md frontmatter and enforced by the amygdala module. You can customize them.
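The rules above can be restated as a small classifier. This is a sketch, not the amygdala module: the real patterns are read from SKILL.md frontmatter, while here they are hard-coded, the code parameter of ext_execute_js and the url parameter of ext_navigate are assumed names, and the financial-site list is a placeholder. The danger check runs first so that a blocked script is never merely sent for approval.

```typescript
// Hypothetical restatement of the listed rules; real patterns come from SKILL.md frontmatter.
type Verdict = "blocked" | "approval" | "allowed";

interface ToolCall {
  tool: string;
  params: Record<string, unknown>;
}

const DANGER_JS = [/document\.cookie/, /navigator\.sendBeacon/];
const ALWAYS_CONFIRM = new Set(["ext_execute_js", "ext_download", "ext_cookies_set"]);
// Placeholder hosts standing in for the configured "financial sites" list.
const FINANCIAL_HOSTS = [/(^|\.)paypal\.com$/, /(^|\.)examplebank\.com$/];

function classify(call: ToolCall): Verdict {
  if (call.tool === "ext_execute_js") {
    const code = String(call.params.code ?? "");
    if (DANGER_JS.some((re) => re.test(code))) return "blocked"; // checked before approval
  }
  if (ALWAYS_CONFIRM.has(call.tool)) return "approval";
  if (call.tool === "ext_navigate") {
    const host = new URL(String(call.params.url)).hostname;
    if (FINANCIAL_HOSTS.some((re) => re.test(host))) return "approval";
  }
  return "allowed";
}
```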
Debugger mode and humanize carry additional risk. These features make automated browsing less distinguishable from human browsing. While useful for testing and personal automation, using them to bypass bot detection or CAPTCHA systems on platforms that prohibit automation may result in account suspension or permanent bans. The responsibility lies with the user. Use these tools ethically and in compliance with each platform’s terms of service.