How to Extract and Analyze PDFs with OpenClaw Agents
OpenClaw's built-in PDF tool lets agents extract text and analyze documents through native provider APIs or an automatic text-plus-image fallback. It supports up to 10 PDFs per call, page filtering for targeted extraction, and a 4 million pixel budget for image rendering. This guide covers both execution modes, batch processing workflows, structured extraction patterns, and how to persist extracted data in shared workspaces.
How OpenClaw's PDF Tool Works
OpenClaw includes a PDF tool that registers automatically when a vision-capable model is available. There is no separate installation step. The tool resolves a model through a priority chain: it checks for an explicit pdfModel configuration first, falls back to an image model, then tries the session default or any auth-backed provider with native PDF support.
The tool operates in two distinct modes depending on which provider backs the resolved model.
Native Provider Mode works with Anthropic and Google models. It sends raw PDF bytes directly to the provider API as document blocks. This is the fast path because there is no intermediate text extraction step. The provider handles parsing, layout understanding, and content extraction internally. One limitation: page filtering is not supported in native mode. If you pass a pages parameter, the tool returns an error.
Extraction Fallback Mode activates for all other providers. It extracts text from each page (up to 20 pages by default) and checks whether the extraction produced meaningful content. If a page yields fewer than 200 characters of text, the tool renders that page as a PNG image and sends it to the vision model instead. This fallback uses a pixel budget of 4 million pixels across all rendered pages, which prevents memory issues on long documents with many image-heavy pages.
Both modes accept the same inputs: a file path or URL for the PDF, an optional prompt describing what to extract, and an optional model override. The default prompt is a generic "Analyze this PDF document," but you get much better results by being specific. Asking "Extract all invoice line items with amounts and dates" produces more structured output than leaving the prompt at its default.
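OpenClaw's exact invocation envelope is not specified in this guide, so the shape below is an illustrative assumption; only the parameter names (pdf, prompt, model) come from the tool's documented inputs, and the model string follows the provider/model format used elsewhere in this article.

```python
# Hypothetical tool-call payload for OpenClaw's PDF tool.
# The envelope keys are assumptions; "pdf", "prompt", and "model"
# are the documented inputs.
request = {
    "tool": "pdf",
    "pdf": "file:///data/invoices/acme-2024-03.pdf",   # local path, file:// URI, or URL
    "prompt": "Extract all invoice line items with amounts and dates",
    "model": "anthropic/claude-opus-4-6",               # optional override
}
```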
Each PDF can be up to 10MB by default, configurable through the pdfMaxBytesMb setting.
How to Process Single and Batch PDFs
The simplest use case is analyzing a single document. Pass a local file path, a file:// URI, or an HTTP URL to the pdf parameter along with a prompt that describes what you want extracted. The tool handles format detection, text extraction, and model routing automatically.
For batch processing, use the pdfs parameter with an array of up to 10 document references. The tool deduplicates inputs, merges results, and returns a combined response. This is useful for comparative analysis across documents, like reviewing multiple vendor proposals against the same evaluation criteria or extracting key dates from a stack of contracts.
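The tool performs deduplication and the 10-document cap internally; the helper below is just a sketch of that input handling for callers who want to pre-validate a batch before submitting it.

```python
MAX_PDFS = 10  # per-call limit described above

def prepare_batch(paths):
    """Deduplicate PDF references (order-preserving) and enforce the batch cap."""
    unique = list(dict.fromkeys(paths))  # dict preserves insertion order
    if len(unique) > MAX_PDFS:
        raise ValueError("too_many_pdfs")  # mirrors the tool's error code
    return unique
```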
Page filtering lets you target specific sections of a document without processing the entire file. The syntax supports individual pages and ranges: "1-5" extracts the first five pages, while "1,3,7-9" picks specific pages. Pages are 1-based. This works only in extraction fallback mode. Native providers process the entire document.
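The tool parses the pages spec itself; this small helper only illustrates the semantics of the range syntax, expanding a spec like "1,3,7-9" into the 1-based page numbers it selects.

```python
def parse_pages(spec):
    """Expand a 1-based page spec like '1,3,7-9' into a sorted list of pages."""
    pages = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            pages.update(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            pages.add(int(part))
    return sorted(pages)
```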
Configuring Defaults
Three settings control the PDF tool's behavior:
- agents.defaults.pdfModel: Set a primary model and optional fallbacks for PDF processing. Format is provider/model, like "anthropic/claude-opus-4-6"
- agents.defaults.pdfMaxBytesMb: Maximum file size per PDF, defaults to 10MB
- agents.defaults.pdfMaxPages: Maximum pages to process in fallback mode, defaults to 20
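Assuming a JSON configuration file, the three settings might look like the fragment below. The file layout, the primary/fallbacks shape under pdfModel, and the Google fallback model name are all illustrative assumptions; only the setting names and defaults come from the list above.

```json
{
  "agents": {
    "defaults": {
      "pdfModel": {
        "primary": "anthropic/claude-opus-4-6",
        "fallbacks": ["google/gemini-2.5-pro"]
      },
      "pdfMaxBytesMb": 10,
      "pdfMaxPages": 20
    }
  }
}
```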
When processing fails, the tool returns specific error codes. "too_many_pdfs" fires when you exceed the 10-document limit. "unsupported_pdf_reference" indicates a path format the tool cannot resolve. In sandbox mode, remote HTTP and HTTPS URLs are blocked entirely, so you will need to download files locally first.
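A caller can map these documented error codes to remediation hints before retrying. The mapping below is a sketch; the hint text is ours, only the codes come from the tool.

```python
# Remediation hints for the error codes described above (hint text is illustrative).
ERROR_HINTS = {
    "too_many_pdfs": "Split the batch; the tool accepts at most 10 PDFs per call.",
    "unsupported_pdf_reference": "Use a local path, file:// URI, or HTTP(S) URL "
                                 "(remote URLs are blocked in sandbox mode).",
}

def remediation(code):
    return ERROR_HINTS.get(code, "Unrecognized error code; inspect the raw response.")
```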
Structured Extraction Workflows
Raw text extraction is the starting point, not the destination. The real value comes from turning extracted content into structured data that other systems can consume.
OpenClaw's skill architecture supports multi-step extraction workflows. A typical pattern has three phases: parse the PDF into an intermediate format, extract specific fields into a schema, then validate the results before passing them downstream.
Parse First, Extract Second
For documents with complex layouts (tables, multi-column text, nested headers), convert the PDF to Markdown or JSON before running extraction prompts. Libraries like PyMuPDF and pdfplumber preserve structural elements that plain text extraction flattens. This intermediate step means your extraction prompts can reference headings, table cells, and list items by their structural position rather than hoping the raw text maintains enough formatting.
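As one concrete instance of this intermediate step: pdfplumber's page.extract_table() returns a table as a list of row lists (with None for empty cells), which can be rendered as Markdown so extraction prompts can reference cells by column. The helper below operates on that row structure, so it is shown with sample data rather than a real PDF.

```python
def table_to_markdown(rows):
    """Render a pdfplumber-style table (list of row lists, first row = header)
    as a Markdown table. Empty cells (None) become blank columns."""
    header, *body = rows

    def fmt(cells):
        return "| " + " | ".join("" if c is None else str(c).strip() for c in cells) + " |"

    separator = "| " + " | ".join("---" for _ in header) + " |"
    return "\n".join([fmt(header), separator] + [fmt(r) for r in body])
```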
Define Your Schema Upfront
Before extracting, decide exactly what fields you need. For invoices, that might be vendor name, invoice number, line items with quantities and unit prices, tax amount, and total. For contracts, it could be effective date, counterparty names, renewal terms, and governing law. Defining the schema before running extraction prevents the common failure of getting inconsistent output shapes across documents in a batch.
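One lightweight way to pin down the shape upfront is a pair of dataclasses mirroring the invoice fields listed above; this is a sketch, not a schema format OpenClaw itself prescribes.

```python
from dataclasses import dataclass

@dataclass
class LineItem:
    description: str
    quantity: float
    unit_price: float

@dataclass
class Invoice:
    vendor_name: str
    invoice_number: str
    line_items: list   # list[LineItem]
    tax_amount: float
    total: float
```

Parsing every document in a batch into the same dataclass guarantees consistent output shapes downstream.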
Validate Everything
Treat extracted data the way you would treat user input: assume it is wrong until proven otherwise. Check that numeric totals add up, dates fall within plausible ranges, required fields are present, and currency values use consistent formatting. The LumaDock tutorial on OpenClaw PDF workflows recommends exactly these checks as standard practice.
For repeating workflows (monthly invoice processing, weekly report extraction), wrap these three phases into an OpenClaw skill. Skills are folders containing a SKILL.md file plus helper scripts, so the entire parse-extract-validate pipeline can run as a single command.
Persist extracted PDF data across agent sessions
50GB free workspace with Intelligence Mode indexing, Metadata Views for structured extraction, and MCP access for your OpenClaw agents. No credit card required.
Persisting Extracted Data in Shared Workspaces
PDF extraction produces valuable structured output, but that output needs to live somewhere accessible to both agents and humans. Local files work for single-agent setups. Shared workspaces work better when multiple agents process documents or when humans need to review and approve extracted data.
Local storage is the simplest option. Write extracted JSON or CSV to a project directory and version it with Git. This works well for personal automation but breaks down when a second agent needs the same data or when you want a non-technical colleague to review the results.
Cloud object storage (S3, Google Cloud Storage) handles multi-agent access but requires infrastructure setup, IAM configuration, and custom tooling for browsing and searching stored extractions.
Fast.io workspaces provide a middle path. Create a workspace, upload extracted data, and any agent or human with access can browse, search, and query the files. With Intelligence Mode enabled, uploaded extractions are automatically indexed for semantic search and RAG-style question answering. Ask "What was the total across all Q1 invoices?" and get an answer with citations pointing to specific files.
For structured extraction at scale, Fast.io's Metadata Views go further. Describe the fields you want extracted in natural language, and the system designs a typed schema, matches files in the workspace, and populates a sortable, filterable spreadsheet. This turns a workspace full of PDFs into a queryable database without writing extraction code.
The free agent plan includes 50GB storage, 5,000 credits per month, and 5 workspaces with no credit card required. That is enough capacity for most document processing workflows. When the extraction pipeline is complete, use ownership transfer to hand the workspace to a client or team lead while keeping admin access for maintenance.
Multi-Document Analysis Patterns
Batch processing up to 10 PDFs at once opens up analysis patterns that single-document extraction cannot support.
Cross-Document Comparison
Pass multiple vendor proposals, competing contracts, or sequential versions of the same document and ask the agent to compare them. The prompt "Compare pricing terms across these three proposals and identify where they differ" produces a structured comparison that would take a human analyst an hour to compile manually.
Aggregation Across a Document Set
Extract the same fields from every document in a batch, then aggregate. Process 10 monthly financial statements and ask for a trend analysis of revenue, expenses, and margins. The agent sees all documents in a single context window, so it can identify patterns across the full set rather than summarizing each document in isolation.
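Once every statement in the batch yields the same fields, aggregation is straightforward. The sketch below assumes per-month extraction records shaped as dicts with a "month" key plus numeric fields; that record shape is our assumption.

```python
def monthly_trend(statements, field):
    """Given per-month records like {"month": "2024-01", "revenue": ...},
    return (month, value) pairs sorted by month plus the net change."""
    series = sorted((s["month"], s[field]) for s in statements)
    change = series[-1][1] - series[0][1] if len(series) > 1 else 0.0
    return series, change
```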
Sequential Processing for Large Collections
When you have more than 10 documents, break the work into batches. Process the first 10, store the results, process the next 10, then run a synthesis pass over all stored results. This is where persistent storage becomes important. Each batch writes its output to a shared workspace, and the synthesis agent reads from that same workspace to produce the final analysis.
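Chunking a larger collection into per-call batches is a one-liner generator; each yielded batch would be processed and its results written to the shared workspace before the synthesis pass.

```python
def batches(paths, size=10):
    """Yield successive batches of at most `size` documents (the per-call limit)."""
    for i in range(0, len(paths), size):
        yield paths[i:i + size]
```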
Security Considerations
PDFs should be treated as untrusted input. They can contain hidden text layers, embedded scripts, and content designed to influence model behavior. The LumaDock OpenClaw tutorial recommends restricting file access to dedicated folders, never overwriting original files, and sandboxing tools when possible. Keep original PDFs separate from extracted output, and validate that extracted content matches what a human would see when viewing the document.
Steps for Building a Complete PDF Pipeline
A production-ready PDF extraction pipeline connects OpenClaw's PDF tool with storage, validation, and human review. Here is a practical workflow for processing incoming documents on an ongoing basis.
Step 1: Receive documents. Set up a Fast.io Receive share where clients or team members upload PDFs. Incoming files land in a designated workspace folder. Webhooks notify your agent when new files arrive, so there is no polling.
Step 2: Extract and structure. The agent picks up new PDFs, runs extraction with a defined schema, and writes structured output (JSON, CSV, or YAML) back to the workspace. For documents under 10MB with fewer than 20 pages, the built-in PDF tool handles everything. For larger documents, use a parsing library as a preprocessing step before sending chunks to the model.
Step 3: Validate results. Run automated checks on extracted data: required fields present, numeric totals consistent, dates in expected ranges. Flag documents that fail validation for human review rather than silently passing bad data downstream.
Step 4: Store and index. Upload validated extractions to an Intelligence-enabled workspace. The files are automatically indexed for semantic search. Team members can ask questions about the extracted data without opening individual files.
Step 5: Hand off. When the extraction project is complete, transfer workspace ownership to the stakeholder who needs the data. They get a clean workspace with all source documents, extracted data, and full search capability.
For teams running this workflow regularly, Fast.io's file locks prevent conflicts when multiple agents process documents from the same workspace concurrently. Each agent acquires a lock before processing a file and releases it when done, so two agents never extract from the same document simultaneously.
Frequently Asked Questions
How do I extract text from PDFs with OpenClaw?
Use OpenClaw's built-in PDF tool by passing a file path or URL to the pdf parameter along with a prompt describing what to extract. The tool automatically selects native mode (for Anthropic and Google models) or extraction fallback mode (for other providers). Native mode sends raw PDF bytes to the provider API. Fallback mode extracts text page by page and renders image-heavy pages as PNGs when text content is minimal. No separate installation is required, as the tool registers automatically when a vision-capable model is configured.
Can OpenClaw analyze multiple PDFs at once?
Yes. Use the pdfs parameter with an array of up to 10 document references. The tool deduplicates inputs, processes all documents, and returns a merged response. This supports cross-document comparison, batch field extraction, and aggregation analysis across document sets. For collections larger than 10 documents, process in sequential batches and store intermediate results in a shared workspace.
What PDF formats does OpenClaw support?
OpenClaw's PDF tool accepts standard PDF files up to 10MB by default (configurable via pdfMaxBytesMb). It handles both digital PDFs with selectable text and scanned/image-based PDFs through the extraction fallback's image rendering pipeline. The fallback mode uses a 4 million pixel budget to render pages as PNG images when text extraction yields fewer than 200 characters per page. Inputs can be local file paths, file:// URIs, or HTTP/HTTPS URLs (remote URLs are blocked in sandbox mode).
How do I target specific pages in a PDF?
Use the pages parameter with range syntax like '1-5' for consecutive pages or '1,3,7-9' for specific pages. Pages are 1-based. This filtering only works in extraction fallback mode (non-Anthropic, non-Google providers). Native provider mode processes the entire document and returns an error if a pages parameter is set. The maximum pages processed in fallback mode defaults to 20, configurable through agents.defaults.pdfMaxPages.
Where should I store extracted PDF data for team access?
For single-agent personal use, local files or Git-versioned directories work fine. For multi-agent or team workflows, shared cloud storage is better. Fast.io workspaces let agents and humans access the same extracted data, with Intelligence Mode providing automatic indexing for semantic search. Metadata Views can turn uploaded PDFs into a sortable, filterable spreadsheet without writing extraction code. The free plan includes 50GB storage and 5 workspaces.