The Scrape category fetches web pages and returns content in a format your agent can read and reason over. Rather than handing your agent raw HTML, most of these tools strip boilerplate, navigation, and ads, returning clean markdown or structured text. With four tools across three providers, the category covers everything from a quick lightweight fetch to unlocking pages protected by Cloudflare or other anti-bot challenges.

Providers

| Provider | Operation | Best for |
| --- | --- | --- |
| Firecrawl | scrape | Any URL → clean markdown, with built-in caching, proxy rotation, and PDF support |
| Jina Reader | readUrlGet / readUrlPost | Fast, lightweight URL → markdown conversion with minimal overhead |
| BrightData | unlock | Last-resort fallback for Cloudflare and challenge-gated pages; returns raw HTML |

Choosing a tool

Not every scraping job needs the same tool. Use this decision order:
  1. Start with Firecrawl for the vast majority of pages. It returns structured markdown, handles PDFs, supports caching to avoid redundant fetches, and respects onlyMainContent to trim nav and footer noise.
  2. Use Jina Reader when you need a fast, lightweight read and the page is publicly accessible without bot protection. The readUrlGet variant covers most cases; readUrlPost lets you pass a request body.
  3. Fall back to BrightData Web Unlocker only when a page is actively blocking headless browsers or presenting Cloudflare challenges. Note that it returns raw HTML rather than markdown, so your agent will need to parse or summarize it.
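
The decision order above can be sketched as a small selection function. This is only an illustration: the PageTraits shape and the chooseScrapeTool helper are hypothetical, and the "Jina Reader/readUrlGet" and "BrightData/unlock" identifiers are assumed by analogy with "Firecrawl/scrape". Use find_tools to confirm the real names.

```typescript
// Hypothetical tool identifiers. Only "Firecrawl/scrape" appears in this page;
// the other two are assumed to follow the same Provider/operation pattern.
type ScrapeTool =
  | "Firecrawl/scrape"
  | "Jina Reader/readUrlGet"
  | "BrightData/unlock";

// Hypothetical description of what the caller knows about the target page.
interface PageTraits {
  // True when the page blocks headless browsers or shows a Cloudflare challenge.
  botProtected: boolean;
  // True when a fast, minimal-overhead read matters more than Firecrawl's extras.
  preferLightweight: boolean;
}

// Applies the decision order: BrightData only for blocked pages,
// Jina Reader for lightweight reads of public pages, Firecrawl otherwise.
function chooseScrapeTool(traits: PageTraits): ScrapeTool {
  if (traits.botProtected) {
    return "BrightData/unlock";
  }
  if (traits.preferLightweight) {
    return "Jina Reader/readUrlGet";
  }
  return "Firecrawl/scrape";
}
```

Because BrightData returns raw HTML, a caller that ends up on that branch would still need a parsing or summarizing step before passing the result to the model.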

Example

The three-step flow for scraping with Firecrawl:
find_tools("scrape https://example.com and return the main content")
describe_tool("Firecrawl/scrape")
execute_tool("Firecrawl/scrape", { url: "https://example.com", formats: ["markdown"], onlyMainContent: true })
Firecrawl’s maxAge parameter lets you serve a cached version of a page instead of making a live fetch — useful when you’re scraping the same URL repeatedly and want to reduce cost and latency. Pair it with onlyMainContent: true to strip headers, footers, and navigation before the text reaches your model. Run describe_tool("Firecrawl/scrape") to see the full parameter list and per-call credit cost.
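
As a rough sketch of combining the two parameters, the helper below builds an argument object for execute_tool. The buildScrapeArgs function and the FirecrawlScrapeArgs interface are hypothetical names for illustration, and the unit of maxAge is assumed here to be milliseconds; check the describe_tool output before relying on that.

```typescript
// Hypothetical shape of the arguments passed to "Firecrawl/scrape".
// Field names come from this page; the full schema may include more.
interface FirecrawlScrapeArgs {
  url: string;
  formats: string[];
  onlyMainContent: boolean;
  maxAge?: number; // Assumed to be milliseconds; verify with describe_tool.
}

// Builds scrape arguments that request markdown with main content only.
// When maxAgeMs is given, a cached copy up to that age is acceptable;
// when omitted, the field is left out and the tool's default applies.
function buildScrapeArgs(url: string, maxAgeMs?: number): FirecrawlScrapeArgs {
  const args: FirecrawlScrapeArgs = {
    url,
    formats: ["markdown"],
    onlyMainContent: true,
  };
  if (maxAgeMs !== undefined) {
    args.maxAge = maxAgeMs;
  }
  return args;
}

// Example: accept a cached copy that is at most one hour old.
const ONE_HOUR_MS = 60 * 60 * 1000;
const cachedArgs = buildScrapeArgs("https://example.com", ONE_HOUR_MS);
```

The resulting object could then be passed as the second argument of execute_tool("Firecrawl/scrape", ...) in the flow shown earlier.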