Claude Computer Use is Anthropic's API for letting Claude interact with computer interfaces through screenshots. Instead of parsing HTML or using CSS selectors, Claude looks at a screenshot of a web page and decides what to click, type, or scroll, just as a human would.
BrowseFleet's Computer API was built specifically for this workflow. Every action you execute returns a screenshot, creating the continuous feedback loop that Claude needs to navigate and interact with websites.
What Is Claude Computer Use
Computer Use extends Claude's capabilities from text and images to interactive computer interfaces. You send Claude a screenshot of a browser, tell it what you want to accomplish, and it responds with specific actions: click at coordinates (x, y), type "search query", scroll down, press Enter.
The key insight is that Claude does not need to understand HTML structure or CSS selectors. It looks at the page visually and understands the interface the same way a human does. This means it can interact with any website, regardless of how the HTML is structured or whether it uses custom components that resist traditional selectors.
The API protocol is straightforward:

1. You provide a screenshot and a task description
2. Claude analyzes the screenshot and returns an action
3. You execute the action and take a new screenshot
4. Repeat until the task is complete
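The four-step loop above can be sketched as a small driver function. This is a minimal illustration, with the model call and browser actions injected as hypothetical hooks (`takeScreenshot`, `decide`, `execute` are stand-ins, not part of any SDK); the BrowseFleet and Anthropic calls shown later in this post would plug into them.

```typescript
// Shape of the actions Claude can return in this sketch
type Action =
  | { kind: 'click'; x: number; y: number }
  | { kind: 'type'; text: string }
  | { kind: 'scroll'; direction: 'up' | 'down'; amount: number }
  | { kind: 'done' };

// Run the screenshot -> decide -> execute loop until Claude says it is
// done or we hit the safety cap. Returns the number of actions executed.
async function runAgentLoop(
  takeScreenshot: () => Promise<string>,            // base64 PNG
  decide: (screenshot: string) => Promise<Action>,  // ask Claude
  execute: (action: Action) => Promise<void>,       // perform in the browser
  maxSteps = 25,                                    // always cap agent loops
): Promise<number> {
  let steps = 0;
  for (; steps < maxSteps; steps++) {
    const screenshot = await takeScreenshot();
    const action = await decide(screenshot);
    if (action.kind === 'done') break;
    await execute(action);
  }
  return steps;
}
```

The cap matters in practice: an agent that misreads a page can otherwise loop forever, burning API credits on every iteration.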
When to Use Vision-Based Scraping
Traditional scraping (CSS selectors, XPath, DOM traversal) is faster and cheaper. Use it when you can. Vision-based scraping with Claude Computer Use is the right choice when:
Selectors are fragile. Some sites use obfuscated class names (e.g., .css-1a2b3c4), dynamically generated IDs, or deeply nested structures that make selector-based extraction unreliable.
Content is behind interactions. Some data only appears after clicking buttons, expanding sections, hovering over elements, or completing multi-step flows. Vision-based agents handle these interactions naturally.
The site structure changes frequently. If a site redesigns its HTML every few weeks (common with SPA frameworks), selector-based scrapers break constantly. Vision-based scrapers are resilient to structural changes because they interact with the visual interface, not the DOM.
You need to scrape many different sites. Writing and maintaining selectors for dozens of different websites is expensive. A vision-based agent can navigate unfamiliar sites without site-specific code.
Setting Up BrowseFleet's Computer API
BrowseFleet's Computer API provides four actions that map directly to what Claude needs:
- navigate(sessionId, url) - Go to a URL, return screenshot
- click(sessionId, x, y) - Click at coordinates, return screenshot
- type(sessionId, text) - Type text, return screenshot
- scroll(sessionId, direction, amount) - Scroll, return screenshot
Every action returns a base64-encoded PNG screenshot that you can pass directly to Claude.
```typescript
import { BrowseFleet } from 'browsefleet';
import Anthropic from '@anthropic-ai/sdk';

const bf = new BrowseFleet({ apiKey: 'bf_...' });
const anthropic = new Anthropic();

// Create a session for Computer Use
const session = await bf.sessions.create({
  stealth: 'full',
  viewport: { width: 1280, height: 800 },
});

// Navigate and get the first screenshot
const screenshot = await bf.computer.navigate(
  session.id,
  'https://target-site.com'
);

// screenshot is a base64 PNG string ready for Claude
```

The viewport size matters. Claude's spatial reasoning works best with standard desktop resolutions. We recommend 1280x800 for most tasks. Larger viewports give Claude more context but increase API costs (larger images) and processing time.
Building the Scraper Step by Step
Let us build a complete vision-based scraper. The task: extract product information from an e-commerce site with complex JavaScript rendering.
```typescript
import { BrowseFleet } from 'browsefleet';
import Anthropic from '@anthropic-ai/sdk';

const bf = new BrowseFleet({ apiKey: 'bf_...' });
const anthropic = new Anthropic();

interface Product {
  name: string;
  price: string;
  rating: string;
  description: string;
}

async function scrapeProducts(url: string): Promise<Product[]> {
  const session = await bf.sessions.create({
    stealth: 'full',
    viewport: { width: 1280, height: 800 },
  });

  let screenshot = await bf.computer.navigate(session.id, url);
  const products: Product[] = [];

  // Ask Claude to extract product data from the visible page
  const extractResponse = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 4096,
    messages: [{
      role: 'user',
      content: [
        {
          type: 'image',
          source: {
            type: 'base64',
            media_type: 'image/png',
            data: screenshot,
          },
        },
        {
          type: 'text',
          text: 'Extract all product information visible on this page. For each product, provide: name, price, rating, and description. Return as JSON array.',
        },
      ],
    }],
  });

  // content[0] is a union of block types; narrow it before reading .text
  const extractBlock = extractResponse.content[0];
  if (extractBlock.type !== 'text') throw new Error('expected a text block');
  const pageProducts: Product[] = JSON.parse(extractBlock.text);
  products.push(...pageProducts);

  // Check if there is a "next page" button
  const navResponse = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 256,
    messages: [{
      role: 'user',
      content: [
        {
          type: 'image',
          source: {
            type: 'base64',
            media_type: 'image/png',
            data: screenshot,
          },
        },
        {
          type: 'text',
          text: 'Is there a "next page" or pagination button visible? If yes, return the click coordinates as JSON: {"x": number, "y": number}. If no, return {"done": true}.',
        },
      ],
    }],
  });

  const navBlock = navResponse.content[0];
  if (navBlock.type !== 'text') throw new Error('expected a text block');
  const navAction = JSON.parse(navBlock.text);

  if (!navAction.done) {
    screenshot = await bf.computer.click(
      session.id,
      navAction.x,
      navAction.y
    );
    // Continue extraction on next page...
  }

  await session.close();
  return products;
}
```

Handling Dynamic Content
Many sites load content dynamically: lazy-loaded images, infinite scroll, expandable sections, and AJAX-loaded data. Vision-based scraping handles these naturally:
Infinite scroll. Tell Claude to scroll down and check for new content. After each scroll, extract any new data that appeared.
```typescript
// Helper: resolve after ms milliseconds
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

let previousDataCount = 0;
let currentDataCount = 0;
let scrollAttempts = 0;

do {
  previousDataCount = currentDataCount;

  // Scroll down, then wait for lazy-loaded content to render
  screenshot = await bf.computer.scroll(session.id, 'down', 500);
  await sleep(2000);

  // Extract data from the current view (extractWithClaude wraps the
  // messages.create call shown earlier)
  const newData = await extractWithClaude(screenshot);
  allData.push(...newData);
  currentDataCount = allData.length;

  scrollAttempts++;
} while (currentDataCount > previousDataCount && scrollAttempts < 20);
```

Expandable sections. Ask Claude to identify and click "show more" or "expand" elements, then extract the revealed content.
Tabs and filters. Tell Claude to click on each tab or filter option, extract data, then move to the next one. The visual approach makes this trivial. Claude can see the tab bar and knows which tab is currently active.
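The tab-iteration pattern reduces to one Claude call that lists every tab's click target, followed by a click-and-extract pass over each. A sketch with the Claude calls and the click injected as stand-ins (`findTabs`, `clickAt`, and `extractVisible` are hypothetical hooks, not BrowseFleet or Anthropic APIs):

```typescript
// A click target Claude identified on the screenshot
interface TabTarget { label: string; x: number; y: number }

// Click each tab in turn, extracting data from the fresh screenshot
// returned after every click.
async function scrapeAllTabs<T>(
  screenshot: string,
  findTabs: (png: string) => Promise<TabTarget[]>,        // one Claude call
  clickAt: (x: number, y: number) => Promise<string>,     // returns new screenshot
  extractVisible: (png: string) => Promise<T[]>,          // one Claude call per tab
): Promise<T[]> {
  const tabs = await findTabs(screenshot);
  const results: T[] = [];
  for (const tab of tabs) {
    const current = await clickAt(tab.x, tab.y); // switch tab, get fresh screenshot
    results.push(...(await extractVisible(current)));
  }
  return results;
}
```

Asking for all tab coordinates up front costs one API call instead of one per tab, which matters at the prices discussed below.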
A Real-World Example: Extracting Pricing Data
Let us build a practical scraper that extracts pricing data from a SaaS website with a complex pricing page.
```typescript
async function scrapePricing(url: string) {
  const session = await bf.sessions.create({
    stealth: 'full',
    viewport: { width: 1920, height: 1080 },
  });

  const screenshot = await bf.computer.navigate(session.id, url);

  // Claude can read pricing tables, feature comparisons,
  // and even pricing calculators from screenshots
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 4096,
    messages: [{
      role: 'user',
      content: [
        {
          type: 'image',
          source: {
            type: 'base64',
            media_type: 'image/png',
            data: screenshot,
          },
        },
        {
          type: 'text',
          text: `Analyze this pricing page and extract all pricing information.
Return structured JSON with:
- plans: array of { name, price, billing_period, features: string[] }
- enterprise: whether there's a custom/enterprise plan
- free_trial: whether a free trial is offered
- notes: any important pricing notes or limitations`,
        },
      ],
    }],
  });

  await session.close();

  // Narrow the content block before reading .text
  const block = response.content[0];
  if (block.type !== 'text') throw new Error('expected a text block');
  return JSON.parse(block.text);
}
```

This works even on pricing pages with complex layouts, interactive sliders, toggle switches (monthly/annual), and feature comparison matrices. Claude reads the visual layout and extracts the information, regardless of the underlying HTML structure.
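One practical caveat: the examples above call `JSON.parse` directly on Claude's reply, but models sometimes wrap JSON in prose or markdown fences. A tolerant extractor avoids intermittent parse failures. This helper is our own suggestion, not part of the Anthropic SDK:

```typescript
// Pull the first JSON value out of a model reply that may include
// surrounding prose or a markdown code fence.
function extractJson<T>(reply: string): T {
  // Strip a markdown code fence if the model added one
  const fenced = reply.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/);
  const body = fenced ? fenced[1] : reply;

  // Take the span from the first bracket to the last matching close bracket
  const start = body.search(/[[{]/);
  if (start === -1) throw new Error('no JSON found in model reply');
  const close = body[start] === '[' ? ']' : '}';
  const end = body.lastIndexOf(close);
  if (end <= start) throw new Error('unterminated JSON in model reply');

  return JSON.parse(body.slice(start, end + 1)) as T;
}
```

Prompting Claude to "return only JSON, no other text" reduces how often this matters, but a defensive parser keeps long scraping runs from dying on one chatty reply.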
Cost and Performance Considerations
Vision-based scraping is more expensive than traditional scraping:
API costs. Each Claude API call with a screenshot costs roughly $0.01-0.03 depending on image size and response length. For a scraping job that requires 10 screenshots per page, that is $0.10-0.30 per page, significantly more than selector-based scraping.
Latency. Each screenshot-to-action cycle takes 2-5 seconds (1-2s for the API call, 1-2s for the action and page load). A multi-step scraping workflow might take 30-60 seconds per page.
When the economics work. Vision-based scraping makes economic sense when the alternative is spending hours writing and maintaining fragile selectors, or when the data is valuable enough to justify the per-page cost.
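The arithmetic is simple enough to sanity-check before committing to a job. A back-of-the-envelope estimator using the rough per-screenshot figure from above (the rates here are the article's approximations, not published pricing):

```typescript
// Estimate total cost for a vision-based scraping job.
// costPerScreenshot defaults to the midpoint of the $0.01-0.03 range.
function estimateJobCost(
  pages: number,
  screenshotsPerPage: number,
  costPerScreenshot = 0.02,
): number {
  return pages * screenshotsPerPage * costPerScreenshot;
}

// e.g. 100 pages at 10 screenshots each lands around $20
```

Running this before a job makes the trade-off concrete: a 1,000-page crawl at 10 screenshots per page is on the order of $100-300, which either is or is not cheaper than an engineer maintaining selectors for that site.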
Optimization strategies:
- Use vision-based scraping only for sites that resist traditional methods
- Combine approaches: use bf.scrape for simple content extraction and Claude for complex interactions
- Reduce screenshot resolution for tasks that do not require fine detail
- Batch extraction: ask Claude to extract all data from one screenshot rather than making multiple calls
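The "combine approaches" strategy is worth sketching: try cheap selector-based extraction first and fall back to the vision agent only when it fails or comes back empty. Both strategies are injected here as stand-ins; in practice they would be `bf.scrape` and the Claude loop from earlier in the post.

```typescript
// Try the cheap path first; fall back to vision only when it throws or
// returns nothing. Reports which path produced the data.
async function scrapeWithFallback<T>(
  url: string,
  cheapScrape: (url: string) => Promise<T[]>,   // e.g. selector-based
  visionScrape: (url: string) => Promise<T[]>,  // e.g. the Claude loop
): Promise<{ data: T[]; usedVision: boolean }> {
  try {
    const data = await cheapScrape(url);
    if (data.length > 0) return { data, usedVision: false };
  } catch {
    // Cheap path failed; fall through to the vision path
  }
  return { data: await visionScrape(url), usedVision: true };
}
```

Logging `usedVision` across a crawl also tells you which sites actually need the expensive path, so you can tune the split over time.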