Use Case
Data Extraction with BrowseFleet
Extract structured data from websites, documents, and web applications using cloud browsers with AI-powered content parsing.
The Problem
Valuable data is locked in websites, web applications, and online documents that resist simple HTTP scraping. JavaScript-rendered content, authentication walls, paginated results, and complex layouts make extraction difficult. Building reliable extraction pipelines requires significant engineering effort.
The Solution
BrowseFleet combines cloud browser sessions with quick-action endpoints for flexible data extraction. Use sessions for complex multi-step extractions that require navigation and interaction, or use the scrape endpoint for simple pages. The Markdown output from quick actions is ideal for feeding into LLMs for structured data extraction.
import { BrowseFleet } from 'browsefleet';
import Anthropic from '@anthropic-ai/sdk';
// Assumed: puppeteer-core provides the puppeteer.connect call used further down
import puppeteer from 'puppeteer-core';

const bf = new BrowseFleet({ apiKey: 'bf_...' });
// Reads ANTHROPIC_API_KEY from the environment by default
const anthropic = new Anthropic();

// Simple extraction via quick action
const { markdown } = await bf.scrape(
  'https://company.example.com/about'
);

// Use LLM to extract structured data from markdown
const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024, // required by the Messages API
  messages: [{
    role: 'user',
    content: `Extract structured data from this page:
${markdown}
Return only JSON with: company_name, founded_year,
employee_count, headquarters, description`,
  }],
});

// JSON.parse throws if the model adds any text around the JSON
const companyData = JSON.parse(
  response.content[0].text
);
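
// Complex extraction with session for paginated data
const session = await bf.sessions.create({ stealth: 'full' });
const browser = await puppeteer.connect({
  browserWSEndpoint: session.websocketUrl,
});
const allResults = [];
try {
  const page = await browser.newPage();
  await page.goto('https://data.example.com/results');
  while (true) {
    // Placeholder extraction: '.result' is a hypothetical selector, replace with your own
    const pageData = await page.evaluate(() =>
      Array.from(document.querySelectorAll('.result'), (el) => el.textContent.trim())
    );
    allResults.push(...pageData);
    const nextBtn = await page.$('.next-page:not([disabled])');
    if (!nextBtn) break;
    // Start waiting before clicking so the navigation is not missed
    await Promise.all([
      page.waitForNavigation(),
      nextBtn.click(),
    ]);
  }
} finally {
  // Release the cloud browser even if extraction fails
  await browser.disconnect();
  await session.close();
}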
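The `JSON.parse` call in the example above assumes the model replies with bare JSON. In practice a reply may wrap the object in a sentence or a code fence, so a slightly more defensive parse can help. The following is a minimal sketch, not part of the BrowseFleet or Anthropic SDKs; `parseJsonReply` and `missingFields` are hypothetical helper names introduced here for illustration.

```javascript
// Hypothetical helper: pull the outermost {...} span out of a model reply
// and parse it. Text before or after the object is ignored.
function parseJsonReply(text) {
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('No JSON object found in model reply');
  }
  return JSON.parse(text.slice(start, end + 1));
}

// Hypothetical helper: list the requested fields the parsed object lacks.
function missingFields(data, fields) {
  return fields.filter((field) => !(field in data));
}

// Usage with a reply that has prose around the JSON object.
const reply = 'Sure, here is the JSON:\n{"company_name": "Example Co"}';
const data = parseJsonReply(reply);
console.log(missingFields(data, ['company_name', 'founded_year'])); // [ 'founded_year' ]
```

Checking for missing fields before storing the result makes it easier to retry or flag pages where the model could not find a value, rather than silently saving partial records.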
Benefits
- Markdown output is ideal for LLM-powered extraction
- Handle JavaScript-rendered content that HTTP clients miss
- Navigate paginated results with full session control
- Stealth mode bypasses anti-scraping protections
- Quick actions for simple pages, sessions for complex ones
- Scale extraction with concurrent browser sessions
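To illustrate the last point, concurrent extraction usually needs a cap on how many requests or sessions run at once. The sketch below is a generic bounded-concurrency helper in plain JavaScript; `mapWithConcurrency` is a hypothetical name, not a BrowseFleet API, and the limit of 5 in the usage comment is an arbitrary assumption rather than a documented quota.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight.
// Results are returned in the same order as `items`.
// If any worker call rejects, the returned promise rejects with that error.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.min(Math.max(1, limit), items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Usage sketch, assuming `bf` is the client from the example above:
// const urls = ['https://company.example.com/about'];
// const pages = await mapWithConcurrency(urls, 5, (url) => bf.scrape(url));
```

A fixed pool of runners keeps memory and session usage predictable; the same helper can wrap a function that opens a session, extracts, and closes it.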