API Reference

Scraping

The scrape endpoint loads a URL in a headless browser and extracts structured content. A single request returns the raw HTML, cleaned HTML, Markdown, readability text, links, and page metadata. No session management is required.

POST /v1/scrape

Scrape a URL and extract structured content.

Parameters

Name	Type	Required	Default	Description
`url`	`string`	Yes	`—`	The URL to scrape.
`waitFor`	`string | number`	No	`—`	CSS selector to wait for, or number of milliseconds to wait after page load.
`headers`	`Record<string, string>`	No	`—`	Custom HTTP headers to set on the page request.
`cookies`	`Cookie[]`	No	`—`	Cookies to inject before navigation. Each cookie is an object of the form { name, value, domain }.
`proxyUrl`	`string`	No	`—`	Proxy URL for this request (HTTP or SOCKS5).
`stealth`	`"none" | "basic" | "full"`	No	`"full"`	Anti-detection level.
`timeout`	`number`	No	`30000`	Navigation timeout in milliseconds.
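
The parameter table above can be mirrored as TypeScript types with a small client-side check. This is only a sketch: the type names (`Cookie`, `StealthLevel`, `ScrapeRequest`) and the `validateScrapeRequest` helper are hypothetical and not part of the API, and the checks are assumptions about what a well-formed request looks like, not the server's actual validation rules.

```typescript
// Hypothetical client-side types mirroring the documented parameters.
interface Cookie {
  name: string;
  value: string;
  domain: string;
}

type StealthLevel = "none" | "basic" | "full";

interface ScrapeRequest {
  url: string;
  waitFor?: string | number; // CSS selector, or milliseconds after page load
  headers?: Record<string, string>;
  cookies?: Cookie[];
  proxyUrl?: string;
  stealth?: StealthLevel; // documented default: "full"
  timeout?: number; // documented default: 30000 ms
}

// Returns a list of problems; an empty list means the request looks well-formed.
// These rules are assumptions, not the server's validation logic.
function validateScrapeRequest(req: ScrapeRequest): string[] {
  const problems: string[] = [];
  if (!/^https?:\/\//i.test(req.url)) {
    problems.push("url must be an absolute http(s) URL");
  }
  if (typeof req.waitFor === "number" && (!isFinite(req.waitFor) || req.waitFor < 0)) {
    problems.push("waitFor in milliseconds must be a non-negative number");
  }
  if (typeof req.waitFor === "string" && req.waitFor.trim() === "") {
    problems.push("waitFor selector must not be empty");
  }
  if (req.timeout !== undefined && (!isFinite(req.timeout) || req.timeout <= 0)) {
    problems.push("timeout must be a positive number of milliseconds");
  }
  return problems;
}
```

Running the check before sending a request catches obvious mistakes locally; omitted optional fields are left for the server to fill with its documented defaults.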

Request Body

{
  "url": "https://example.com",
  "waitFor": 2000,
  "stealth": "full",
  "headers": {
    "Accept-Language": "en-US"
  }
}

Response

{
  "url": "https://example.com/",
  "statusCode": 200,
  "title": "Example Domain",
  "html": "<!doctype html>\n<html>\n<head>...",
  "cleanedHtml": "<h1>Example Domain</h1>\n<p>This domain is for use in...",
  "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "readability": "Example Domain\n\nThis domain is for use in illustrative examples in documents.",
  "links": [
    { "href": "https://www.iana.org/domains/example", "text": "More information..." }
  ],
  "metadata": {
    "description": "Example domain for documentation",
    "ogImage": null,
    "canonical": "https://example.com/"
  }
}

Example request (cURL)

curl -X POST https://api.browsefleet.com/v1/scrape \
  -H "Content-Type: application/json" \
  -H "x-api-key: bf_your_api_key" \
  -d '{
    "url": "https://example.com",
    "waitFor": 2000
  }'
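
For callers working in TypeScript, the same request could be assembled roughly as follows. This is a minimal sketch: `buildScrapeRequest`, `ScrapeBody`, and `PreparedRequest` are hypothetical names, not part of any official SDK. Only the endpoint URL, the `x-api-key` header, and the body fields come from the curl example above. The result is shaped so that it can be passed to `fetch(req.url, req.init)` in a runtime that provides `fetch`.

```typescript
// Hypothetical helper: builds the pieces of the HTTP call shown in the curl example.
interface ScrapeBody {
  url: string;
  waitFor?: string | number;
}

interface PreparedRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

function buildScrapeRequest(apiKey: string, body: ScrapeBody): PreparedRequest {
  return {
    url: "https://api.browsefleet.com/v1/scrape",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-api-key": apiKey,
      },
      // The body is sent as a JSON string, matching the -d payload in the curl call.
      body: JSON.stringify(body),
    },
  };
}

// Usage (assumes a runtime with a global fetch, such as Node 18 or later):
//   const req = buildScrapeRequest("bf_your_api_key", { url: "https://example.com", waitFor: 2000 });
//   const res = await fetch(req.url, req.init);
//   const page = await res.json();
```

Keeping request construction separate from the network call makes the helper easy to test without contacting the API.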

Response Fields

Field	Type	Description
`url`	string	Final URL after redirects
`statusCode`	number	HTTP status code of the page
`title`	string	Page title from the document
`html`	string	Full raw HTML of the page
`cleanedHtml`	string	HTML with scripts, styles, and non-content elements removed
`markdown`	string	Page content converted to Markdown
`readability`	string	Plain text of the main content, extracted with a readability algorithm
`links`	array	All links on the page, each an object with href and text
`metadata`	object	Page metadata: description, ogImage, canonical
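
When consuming the response from TypeScript, the fields above could be modelled and checked along these lines. This is a sketch under assumptions: the interface and function names are hypothetical, and the nullability of the metadata fields is inferred only from the sample response, where `ogImage` is `null`.

```typescript
interface ScrapeLink {
  href: string;
  text: string;
}

interface ScrapeResponse {
  url: string;
  statusCode: number;
  title: string;
  html: string;
  cleanedHtml: string;
  markdown: string;
  readability: string;
  links: ScrapeLink[];
  metadata: {
    // Assumed nullable, based on "ogImage": null in the sample response.
    description: string | null;
    ogImage: string | null;
    canonical: string | null;
  };
}

// Narrowing check for parsed JSON; it verifies only the top-level field types.
function isScrapeResponse(value: unknown): value is ScrapeResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const stringFields = ["url", "title", "html", "cleanedHtml", "markdown", "readability"];
  for (const field of stringFields) {
    if (typeof v[field] !== "string") return false;
  }
  return (
    typeof v["statusCode"] === "number" &&
    Array.isArray(v["links"]) &&
    typeof v["metadata"] === "object" &&
    v["metadata"] !== null
  );
}

// Returns the first non-empty representation, preferring the more compact formats.
function bestText(page: ScrapeResponse): string {
  return page.markdown || page.readability || page.cleanedHtml || page.html;
}
```

A guard like this lets calling code fail early on an unexpected payload, for example an error body, before reading fields such as `markdown`.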

Scrape vs Sessions

Use Case	Recommended
Extract content from a single URL	Scrape endpoint
Navigate through multiple pages	Session + Puppeteer
Fill forms and interact with UI	Session + Computer API
One-off screenshot or PDF	Quick action endpoints
Long-running automation	Session with extended timeout