Your Real Browser, Not a Sandbox

The Wolffish browser extension gives the agent direct control of your real browser. Unlike the Playwright-based browser capability (which runs an isolated headless session), the extension operates in your actual browser — your cookies, logins, extensions, and open tabs are all available.

The agent automatically prefers ext_* tools over Playwright browser_* tools when the extension is connected. No configuration needed.

Why This Exists

Computer use (where an AI takes screenshots of your screen, moves your mouse, and types keystrokes) works, but it’s slow, expensive, and fragile. Every action requires a full-screen screenshot sent to a vision model, the model guesses where to click based on pixel coordinates, and failures cascade because there’s no DOM awareness. A single browsing session can burn through hundreds of screenshots at vision-model pricing.

The extension replaces all of that with direct browser control. No screenshots needed for navigation. No pixel guessing. The agent sends ext_click('.submit-btn') and the extension clicks the actual DOM element. It sends ext_read_page({ format: 'text' }) and gets clean text back, not a 2 MB screenshot for the model to squint at.

The real unlock is session reuse. Your browser is already logged into everything: Gmail, LinkedIn, Reddit, GitHub, Notion, your company’s internal tools. The extension gives the agent access to all of that without storing credentials, managing OAuth flows, or launching throwaway browser instances. The agent operates as you, in your browser, with your context.

What This Actually Looks Like

“Summarize my Reddit front page” — the agent opens your Reddit feed (you’re already logged in), reads the visible posts, scrolls to load more, and gives you a summary. No Reddit API key, no OAuth, no rate limits.
“Search Reddit for reviews of the M4 MacBook Air” — navigates to Reddit search, reads the top results, clicks into threads, extracts the useful comments. All through your logged-in session where you see upvoted content, not the stripped-down anonymous view.
“Clean up my LinkedIn — delete conversations I haven’t replied to in 6 months” — opens LinkedIn messaging, reads conversation previews, identifies stale ones, deletes them. Requires your authenticated session, which the extension already has.
“Accept all pending LinkedIn connection requests from people in my industry” — opens the invitations page, reads each request, checks the person’s headline, accepts the relevant ones. No LinkedIn API (which doesn’t even expose this).
“Check my GitHub notifications and close any resolved issues” — opens your GitHub notifications (logged in), reads each one, follows the link, checks if the issue was resolved, closes it. Faster than the GitHub API for this kind of triage because the UI state is already loaded.
“Go to Hacker News, find today’s top AI papers, and save the links to a markdown file” — no API exists for this. The extension just reads the page like you would.
“Fill out this job application form with my details” — reads the form fields, fills them from your profile, handles dropdowns and multi-step flows. In your actual browser where you’re already logged into the job portal.

These aren’t hypothetical — they’re the workflows the extension was built for. Anything you do in Chrome by clicking and reading, the agent can do through the extension.

Cost: Why Model Choice Matters

Browser automation generates large contexts fast. Every ext_read_page returns the full visible text. Every ext_screenshot adds an image to the context. A typical browsing session with 20–30 tool calls can push the context to 100K+ tokens. On premium models, this gets expensive quickly:

Model	Cost at ~30 tool calls	Cost at ~50 tool calls
Claude Opus 4.6	~$25	~$40+
GPT-4o	~$8	~$15
DeepSeek V3	~$1.30	~$2.50
MiniMax-M1	~$1.50	~$3

A single browsing session that looks routine to you, such as “check 5 job listings and compare them to my profile”, can involve 30+ tool calls, 10+ page reads, and 2+ screenshots. On Opus this is a $25 turn. On DeepSeek it's $1.30 for the same result.
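
To see why tool-heavy sessions add up, here is a rough sketch of the arithmetic. It assumes the whole accumulated context is re-sent as input on every tool call and that each call adds a fixed number of tokens. It ignores output tokens and prompt caching. The token counts and the per-million-token price are placeholder values for illustration, not the actual rates behind the table above, and estimateInputCost is a hypothetical helper.

```typescript
// Rough cost model: every tool call re-sends the accumulated context as input.
// All numbers passed in are illustrative placeholders, not real prices.
interface SessionAssumptions {
  toolCalls: number;         // tool calls in the session
  tokensPerCall: number;     // tokens each result adds to the context (assumed constant)
  inputPricePerMTok: number; // hypothetical USD price per million input tokens
}

function estimateInputCost(a: SessionAssumptions): number {
  let context = 0;
  let billedTokens = 0;
  for (let i = 0; i < a.toolCalls; i++) {
    context += a.tokensPerCall; // the context grows with each tool result
    billedTokens += context;    // and the whole context is billed again on the next turn
  }
  return (billedTokens / 1e6) * a.inputPricePerMTok;
}

// 30 calls at 4,000 tokens each: 4000 * (1 + 2 + ... + 30) = 1,860,000 billed tokens.
const example = estimateInputCost({ toolCalls: 30, tokensPerCall: 4000, inputPricePerMTok: 10 });
console.log(example.toFixed(2)); // prints "18.60"
```

Under these assumptions the billed tokens grow roughly with the square of the number of calls, which is why the per-token price matters so much for long sessions.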

We recommend DeepSeek, MiniMax, or Kimi for browser-heavy workflows. They handle complex multi-step browsing reliably at a fraction of the cost. The quality difference for “navigate, read, extract, summarize” tasks is negligible — these models are excellent at structured data extraction and following multi-step instructions. Reserve premium models for tasks that genuinely need stronger reasoning, not for reading web pages.

You can set different models per conversation in Wolffish. Use DeepSeek for browsing sessions and Claude for complex reasoning tasks — you don’t have to choose one model for everything.

Architecture

The extension connects to the Wolffish app over a local WebSocket.

Agent → Plugin (ext_* tools) → WebSocket (localhost:23151) → Extension Service Worker → Chrome APIs / Content Script → Result

Component	Location	Role
WebSocket Server	`channels/extension/server.ts` (core)	Connection management, heartbeat, command routing
Event Logger	`channels/extension/log.ts` (core)	Per-conversation event logging
Plugin	`cerebellum/browser-extension/` (editable)	Tool definitions, execute logic, screenshot processing
Extension	`~/.wolffish/workspace/extension/`	Chrome extension loaded in the browser

Setup

Open Settings → Services → Browser Extension
Click Reveal in Finder to find the extension folder
Open Chrome or Brave → chrome://extensions
Enable Developer Mode → Load Unpacked → select the extension folder
The extension connects automatically — the status dot turns green

The extension auto-reconnects when the app restarts. You only need to load it once.

Auto-Updates

The extension updates itself automatically when Wolffish updates — no manual steps required.

Each Wolffish release ships with the latest extension files bundled inside the app binary
On every app launch, the bundled files are copied to ~/.wolffish/workspace/extension/, overwriting the previous version
When the extension connects, the server compares its self-reported version against the version on disk
If they differ, the server sends a reload command — the extension calls chrome.runtime.reload() on itself, picks up the new code, and reconnects automatically
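
The version check in the last two steps can be sketched as follows. This is a minimal illustration, not the server's actual code: the ReloadCommand shape and the checkExtensionVersion name are assumptions, and the comparison is a plain inequality because the text only says the versions "differ".

```typescript
// Hypothetical message shape; the real wire format is not documented here.
interface ReloadCommand {
  type: "reload";
  reason: string;
}

// Returns a reload command when the extension's self-reported version does not
// match the version on disk, or null when they already agree.
function checkExtensionVersion(reported: string, onDisk: string): ReloadCommand | null {
  // Plain string comparison: any difference triggers a reload, newer or older.
  if (reported.trim() === onDisk.trim()) return null;
  return { type: "reload", reason: `extension reports ${reported}, disk has ${onDisk}` };
}

console.log(checkExtensionVersion("1.4.0", "1.4.0")); // null
console.log(checkExtensionVersion("1.3.9", "1.4.0")); // a reload command
```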

The entire process is seamless. You’ll see a brief disconnect in the side panel (< 1 second) while the extension restarts, then it reconnects with the new version. You can also trigger a manual reload from Settings → Services → Browser Extension → Reload Extension.

49 Tools

The agent sees these as ext_* tools. The plugin translates them to browser_* commands over the wire.

Navigation

ext_navigate · ext_back · ext_forward · ext_reload

Page Interaction

ext_click · ext_type · ext_select · ext_hover · ext_scroll · ext_focus · ext_keypress · ext_drag_drop · ext_file_upload

Page Reading

ext_read_page · ext_query_selector · ext_get_attribute · ext_get_value · ext_get_url · ext_get_page_info

Tab & Window Management

ext_tabs_list · ext_tab_open · ext_tab_close · ext_tab_switch · ext_tab_duplicate · ext_tab_move · ext_windows_list · ext_window_open · ext_window_close · ext_window_resize

Capture

ext_screenshot · ext_pdf · ext_download

Data & Storage

ext_cookies_get · ext_cookies_set · ext_cookies_remove · ext_storage_get · ext_storage_set · ext_clipboard_read · ext_clipboard_write

Advanced

ext_execute_js · ext_wait_for · ext_wait_for_navigation · ext_wait_for_network_idle · ext_notify

Debugger Mode

ext_debugger_attach · ext_debugger_detach · ext_debugger_status

Mouse & Humanize

ext_mouse_move · ext_humanize
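
A minimal sketch of how the agent-facing ext_* names could map to browser_* wire commands. It assumes the translation is a plain prefix swap, which the text does not spell out. toWireCommand is a hypothetical helper, not part of the plugin.

```typescript
// Assumed rule: "ext_click" becomes "browser_click", and so on for every tool.
function toWireCommand(toolName: string): string {
  const prefix = "ext_";
  if (!toolName.startsWith(prefix)) {
    throw new Error(`not an extension tool: ${toolName}`);
  }
  return "browser_" + toolName.slice(prefix.length);
}

console.log(toWireCommand("ext_read_page")); // prints "browser_read_page"
```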

Debugger Mode

Debugger mode attaches Chrome DevTools Protocol (CDP) to a tab and replaces content-script-based interactions with low-level input events. When the debugger is attached, clicks, typing, scrolling, hovering, and keypresses are dispatched through CDP’s Input.dispatch* methods instead of DOM APIs.

Why It Exists

Content script interactions (element.click(), element.value = '...') are detectable. Sites can distinguish programmatic DOM events from real user input by checking event properties like isTrusted, monitoring event ordering, or using bot-detection libraries. CDP events bypass all of this — they enter the browser’s input pipeline at the same level as physical keyboard and mouse events.

How It Works

The agent calls ext_debugger_attach with a tab ID
Chrome attaches the debugger protocol to that tab (you’ll see a “… is debugging this tab” banner)
All subsequent ext_click, ext_type, ext_scroll, ext_hover, and ext_keypress calls on that tab are automatically routed through CDP instead of the content script
Mouse movements follow Bezier curves with Gaussian-distributed timing — not straight lines with fixed delays
Typing dispatches individual keyDown/char/keyUp sequences per character with variable inter-key delays
When done, call ext_debugger_detach to release the debugger

Agent calls ext_click('.btn')
  ↓
Extension checks: debugger attached?
  ├─ Yes → resolves element coordinates → Bezier mouse path → CDP mousePressed/mouseReleased
  └─ No  → content script → element.click()

The routing is automatic. The agent doesn’t need to change how it calls tools — it just attaches the debugger first and everything switches.

Limitations

Only one tab can have the debugger attached at a time — attaching to a new tab detaches from the previous one
Chrome shows a yellow “debugging” banner at the top of the page (cannot be hidden)
chrome:// and chrome-extension:// pages cannot be debugged
If DevTools is already open on the tab, the debugger cannot attach
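
The routing rule and the one-tab-at-a-time limitation can be modelled with a small piece of state. This is a sketch under assumptions: DebuggerRouter and the "cdp" / "default" labels are hypothetical names, and the real extension dispatches events rather than returning a label.

```typescript
type InputPath = "cdp" | "default";

// Tools the text lists as rerouted through CDP while the debugger is attached.
const CDP_ROUTED = new Set(["ext_click", "ext_type", "ext_scroll", "ext_hover", "ext_keypress"]);

class DebuggerRouter {
  private attachedTabId: number | null = null;

  // Attaching to a new tab implicitly replaces any earlier attachment.
  attach(tabId: number): void {
    this.attachedTabId = tabId;
  }

  detach(): void {
    this.attachedTabId = null;
  }

  // Decides which path a tool call on a given tab takes.
  route(tool: string, tabId: number): InputPath {
    return this.attachedTabId === tabId && CDP_ROUTED.has(tool) ? "cdp" : "default";
  }
}

const router = new DebuggerRouter();
router.attach(7);
console.log(router.route("ext_click", 7)); // prints "cdp"
console.log(router.route("ext_click", 8)); // prints "default"
```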

Humanize

The humanize tool (ext_humanize) injects human-like micro-behaviors between agent actions. Instead of the agent clicking, typing, and navigating with machine precision and zero idle time, humanize adds the pauses, small scrolls, cursor drifts, and reading delays that real users naturally produce.

Use responsibly. Humanize is designed for testing, research, and personal automation. Some platforms actively detect automated behavior and may penalize, restrict, or ban accounts that violate their terms of service. Using humanize to circumvent bot detection on platforms that prohibit automation is done at your own risk. We do not encourage or endorse violating any platform’s terms of service.

Intensity Levels

Level	Micro-Actions	Use Case
`light`	Random pauses, micro-scrolls	Minimal footprint — adds basic timing variation
`moderate`	Pauses, micro-scrolls, cursor moves, hover on inert elements, variable scrolls	Balanced — looks like a distracted human
`heavy`	All of the above + scroll bounces, idle drift, long pauses	Full simulation — reading pauses, scroll-then-scroll-back, cursor wander

Micro-Actions

Each ext_humanize call picks one random action from the intensity pool and executes it:

Random pause — waits 0.8–2s (simulates thinking or reading)
Micro-scroll — scrolls a tiny amount up or down, sometimes scrolls back
Cursor move — moves the cursor to a random non-interactive element via Bezier path
Hover inert — moves to a non-interactive element and hovers briefly
Variable scroll — 2–4 small scroll steps in sequence with variable timing
Scroll bounce — scrolls down then back up (like overshooting while reading)
Idle drift — tiny random cursor movements around the current position
Long pause — waits 2–5s (simulates reading a paragraph)
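
The selection step could look like the sketch below. The pools follow the intensity table above, but the identifier names are made up for illustration, and uniform random choice is an assumption (the text only says "one random action").

```typescript
type Intensity = "light" | "moderate" | "heavy";
type MicroAction =
  | "random_pause" | "micro_scroll" | "cursor_move" | "hover_inert"
  | "variable_scroll" | "scroll_bounce" | "idle_drift" | "long_pause";

// Each level extends the one below it, mirroring the intensity table.
const LIGHT: MicroAction[] = ["random_pause", "micro_scroll"];
const MODERATE: MicroAction[] = [...LIGHT, "cursor_move", "hover_inert", "variable_scroll"];
const HEAVY: MicroAction[] = [...MODERATE, "scroll_bounce", "idle_drift", "long_pause"];

const POOLS: Record<Intensity, MicroAction[]> = {
  light: LIGHT,
  moderate: MODERATE,
  heavy: HEAVY,
};

// Picks one action uniformly at random; `rand` is injectable for testing.
function pickMicroAction(level: Intensity, rand: () => number = Math.random): MicroAction {
  const pool = POOLS[level];
  return pool[Math.floor(rand() * pool.length)];
}

console.log(pickMicroAction("moderate")); // one of the five moderate actions
```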

All timing uses Gaussian distributions — no fixed delays. Mouse movements follow cubic Bezier curves with randomized control points.
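
As an illustration of the two ingredients named here, the sketch below samples Gaussian delays with the Box-Muller transform and evaluates a cubic Bezier curve with randomized control points. The step count, jitter range, and the 12 ms / 4 ms delay parameters are arbitrary assumptions, not values taken from the extension.

```typescript
interface Point { x: number; y: number }
interface PathStep extends Point { delayMs: number }

// Box-Muller transform: turns two uniform samples into one normal sample.
function gaussian(mean: number, stdDev: number, rand: () => number = Math.random): number {
  const u1 = 1 - rand(); // in (0, 1], avoids log(0)
  const u2 = rand();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return mean + stdDev * z;
}

// Point on a cubic Bezier curve at parameter t in [0, 1].
function cubicBezier(p0: Point, p1: Point, p2: Point, p3: Point, t: number): Point {
  const u = 1 - t;
  const a = u * u * u, b = 3 * u * u * t, c = 3 * u * t * t, d = t * t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y,
  };
}

// Builds a curved path from `from` to `to` with a Gaussian delay per step.
function mousePath(from: Point, to: Point, steps = 20, rand: () => number = Math.random): PathStep[] {
  const dx = to.x - from.x;
  const dy = to.y - from.y;
  const jitter = () => (rand() - 0.5) * 100; // control point offset in px (arbitrary)
  const c1: Point = { x: from.x + dx / 3 + jitter(), y: from.y + dy / 3 + jitter() };
  const c2: Point = { x: from.x + (2 * dx) / 3 + jitter(), y: from.y + (2 * dy) / 3 + jitter() };
  const path: PathStep[] = [];
  for (let i = 0; i <= steps; i++) {
    const p = cubicBezier(from, c1, c2, to, i / steps);
    path.push({ x: p.x, y: p.y, delayMs: Math.max(1, gaussian(12, 4, rand)) });
  }
  return path;
}

const path = mousePath({ x: 0, y: 0 }, { x: 300, y: 120 });
console.log(path.length); // prints 21 (steps + 1 points)
```

The path always starts and ends exactly on the given points; only the control points and delays vary between runs.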

With Debugger Mode

Humanize works with or without the debugger attached. When the debugger is active, scroll and cursor actions use CDP events. When it’s not, they fall back to content script execution. The agent doesn’t need to coordinate — humanize checks the debugger state automatically.

Screenshots

Screenshots are processed through sharp before being sent to the LLM:

Extension captures the visible tab via chrome.tabs.captureVisibleTab()
Plugin strips the data URL prefix, decodes to buffer
Sharp resizes to the configured max width (default 1280px)
Converts to the configured format (default JPEG, quality 80)
Returns clean base64 inline — displayed automatically in chat
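
Two of the steps above are simple enough to sketch without any image library: stripping the data URL prefix and working out the resized dimensions. The helper names are hypothetical, and leaving narrower images unscaled is an assumption. The actual resize and JPEG encoding are done by sharp and are not shown here.

```typescript
// Removes a "data:image/png;base64," style prefix and returns the raw base64 payload.
function stripDataUrlPrefix(dataUrl: string): string {
  const comma = dataUrl.indexOf(",");
  return dataUrl.startsWith("data:") && comma !== -1 ? dataUrl.slice(comma + 1) : dataUrl;
}

// Output size when fitting to a maximum width while preserving aspect ratio.
// Images already narrower than maxWidth are returned unchanged (assumed behavior).
function fitToMaxWidth(width: number, height: number, maxWidth = 1280): { width: number; height: number } {
  if (width <= maxWidth) return { width, height };
  return { width: maxWidth, height: Math.round((height * maxWidth) / width) };
}

console.log(stripDataUrlPrefix("data:image/png;base64,iVBORw0KGgo=")); // prints "iVBORw0KGgo="
console.log(fitToMaxWidth(2560, 1440)); // { width: 1280, height: 720 }
```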

Configure resolution and format in Settings → Services → Browser Extension.

Side Panel

The extension includes a side panel that shows:

Connection status — connected, waiting, or disconnected
Live event feed — every tool call appears in real-time as the agent browses
Conversation history — browse events from past conversations

Events are logged per-conversation to ~/.wolffish/workspace/logs/extension/{conversationId}.jsonl.
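
A .jsonl file holds one JSON object per line, so the log can be appended to and read back line by line. The sketch below shows that format only. The event fields (ts, tool, ok) are hypothetical, since the actual schema is not described here.

```typescript
// Hypothetical event fields; the real log schema may differ.
interface ExtensionEvent {
  ts: string;
  tool: string;
  ok: boolean;
}

// Log path for a conversation, following the pattern given in the text.
function logPathFor(conversationId: string): string {
  return `~/.wolffish/workspace/logs/extension/${conversationId}.jsonl`;
}

// Serializes one event as a single JSONL line.
function toJsonlLine(event: ExtensionEvent): string {
  return JSON.stringify(event) + "\n";
}

// Parses a JSONL document, skipping blank lines.
function parseJsonl(text: string): ExtensionEvent[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as ExtensionEvent);
}

const log =
  toJsonlLine({ ts: "2025-01-01T00:00:00Z", tool: "ext_navigate", ok: true }) +
  toJsonlLine({ ts: "2025-01-01T00:00:02Z", tool: "ext_read_page", ok: true });
console.log(parseJsonl(log).length); // prints 2
```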

Customization

Editing Agent Instructions

Edit ~/.wolffish/workspace/brain/cerebellum/.browser-extension/SKILL.md:

Change tool descriptions to guide the agent differently
Add or modify trigger keywords
Add safety patterns (danger_patterns, confirm_patterns)
Edit the body text for custom browsing procedures

Editing the Plugin

Edit ~/.wolffish/workspace/brain/cerebellum/.browser-extension/plugin/index.mjs:

Add pre/post processing to tool calls
Compose multiple commands into higher-level tools
Customize screenshot processing
Add new tools that combine existing commands

Building a Custom Extension

The WebSocket server is command-agnostic — it pipes any { id, type, params } to the extension and resolves when a matching response arrives. You can fork the extension, add new commands, and update the plugin to match. See the extension repository for the full source.
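
The "resolve when a matching response arrives" behavior is a standard request/response correlation by id. Here is a minimal sketch of that pattern with the transport abstracted into a send callback. The CommandRouter class and the response shape ({ id, result, error }) are assumptions, not the server's actual code.

```typescript
interface WireCommand {
  id: string;
  type: string;
  params: Record<string, unknown>;
}

// Assumed response shape: the same id plus either a result or an error message.
interface WireResponse {
  id: string;
  result?: unknown;
  error?: string;
}

interface PendingEntry {
  resolve: (value: unknown) => void;
  reject: (err: Error) => void;
}

class CommandRouter {
  private nextId = 1;
  private readonly pending = new Map<string, PendingEntry>();
  private readonly send: (cmd: WireCommand) => void;

  constructor(send: (cmd: WireCommand) => void) {
    this.send = send;
  }

  get pendingCount(): number {
    return this.pending.size;
  }

  // Sends any command type and returns a promise tied to its id.
  request(type: string, params: Record<string, unknown> = {}): Promise<unknown> {
    const id = String(this.nextId++);
    return new Promise<unknown>((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.send({ id, type, params });
    });
  }

  // Settles the promise whose id matches the incoming response.
  handleResponse(res: WireResponse): void {
    const entry = this.pending.get(res.id);
    if (!entry) return; // unknown or already-settled id
    this.pending.delete(res.id);
    if (res.error !== undefined) entry.reject(new Error(res.error));
    else entry.resolve(res.result);
  }
}
```

Because the router never inspects type or params, a forked extension can introduce new command types without any change on this side. A production version would also want a timeout so that a lost response does not leave a promise pending forever.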

Supported Browsers

Browser	Status
Chromium	Supported
Chrome	Supported
Brave	Supported
Edge	Supported
Safari	Not supported
Firefox	Not supported

The extension uses Manifest V3 APIs. Safari and Firefox are not supported because the extension relies on Chrome-specific APIs (chrome.tabs.captureVisibleTab, chrome.debugger for CDP input and PDF, chrome.sidePanel, direct WebSocket in the service worker).

Safety

The extension inherits the standard capability safety system:

ext_execute_js with document.cookie or navigator.sendBeacon → blocked
ext_execute_js (any) → requires approval
ext_download → requires approval
ext_cookies_set → requires approval
Navigation to financial sites → requires approval

These patterns are defined in the SKILL.md frontmatter and enforced by the amygdala module. You can customize them.
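
To make the rules above concrete, here is a sketch of how a tool call could be classified. It is not the amygdala module's implementation: the classify function, the Verdict labels, and the placeholder host pattern for "financial sites" are all assumptions, and the real patterns live in the SKILL.md frontmatter.

```typescript
type Verdict = "blocked" | "needs_approval" | "allowed";

// JavaScript snippets that are refused outright.
const BLOCKED_JS = [/document\.cookie/, /navigator\.sendBeacon/];

// Tools that always require user approval.
const APPROVAL_TOOLS = new Set(["ext_execute_js", "ext_download", "ext_cookies_set"]);

// Placeholder pattern; how "financial sites" are identified is not given in the text.
const FINANCIAL_HOSTS = [/(^|\.)bank\.example$/];

// Extracts the hostname from an http(s) URL, or "" if it does not look like one.
function hostOf(url: string): string {
  const match = /^https?:\/\/([^/:?#]+)/i.exec(url);
  return match ? match[1].toLowerCase() : "";
}

function classify(tool: string, args: { code?: string; url?: string }): Verdict {
  const code = args.code ?? "";
  // Blocked patterns are checked first so they win over the approval rule.
  if (tool === "ext_execute_js" && BLOCKED_JS.some((re) => re.test(code))) return "blocked";
  if (APPROVAL_TOOLS.has(tool)) return "needs_approval";
  if (tool === "ext_navigate" && args.url !== undefined) {
    const host = hostOf(args.url);
    if (FINANCIAL_HOSTS.some((re) => re.test(host))) return "needs_approval";
  }
  return "allowed";
}

console.log(classify("ext_execute_js", { code: "fetch('/x?c=' + document.cookie)" })); // prints "blocked"
console.log(classify("ext_click", {})); // prints "allowed"
```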

Debugger mode and humanize carry additional risk. These features make automated browsing less distinguishable from human browsing. While useful for testing and personal automation, using them to bypass bot detection or CAPTCHA systems on platforms that prohibit automation may result in account suspension or permanent bans. The responsibility lies with the user. Use these tools ethically and in compliance with each platform’s terms of service.
