How does Claude computer use work?

Claude computer use operates through a screenshot-action loop. Claude receives a screenshot of the computer display, analyzes the image to understand what's on screen, and requests an action (clicking, typing, scrolling, or pressing a keyboard shortcut). Your application performs that action in the computing environment and reports the result, usually a fresh screenshot, back to Claude. This cycle repeats until the task is complete or the iteration limit is reached. Unlike traditional automation tools that rely on CSS selectors or DOM access, Claude reads the screen visually and works with any application that has a graphical interface.

Which Claude models support computer use?

Seven current models support computer use. Opus 4.8, Opus 4.7, Opus 4.6, Sonnet 4.6, and Opus 4.5 use the computer-use-2025-11-24 beta header and get the latest capabilities including zoom. Sonnet 4.5 and Haiku 4.5 use the computer-use-2025-01-24 header with core actions but without zoom. Opus 4.8 offers the strongest overall performance for complex tasks, while Sonnet 4.6 provides the best accuracy-to-cost ratio for well-defined workflows.

Is Claude computer use free?

Computer use through the API is billed at standard per-token rates for the model you choose, plus additional tokens for screenshots sent as images in the conversation. There is no separate surcharge for the computer use feature itself. In the Claude Desktop app, computer use is included with Pro and Max subscriptions. The API requires an Anthropic API key with billing configured.

Can Claude control my computer autonomously?

Yes, with appropriate safeguards. Through the API, Claude operates inside a sandboxed environment (typically a Docker container) where it controls the virtual desktop. In Claude Desktop, computer use runs on your actual machine but requests permission before accessing new applications. Anthropic recommends keeping a human in the loop for tasks with real-world consequences. Automatic prompt injection classifiers provide an additional safety layer, pausing Claude when suspicious on-screen content is detected.

What is the zoom feature in Claude computer use?

The zoom feature lets Claude inspect a specific region of the screen at full resolution. When small text, UI labels, or dense interface elements are hard to read from a full-screen screenshot, Claude can request a cropped view of a rectangular area defined by pixel coordinates. This is available with the computer_20251124 tool type on models using the computer-use-2025-11-24 beta header. Enable it by adding enable_zoom to your tool definition.

How should I handle files created during computer use sessions?

Computer use often generates output files like downloaded documents, screenshots, and exported reports. In the API approach, these files live inside your Docker container and persist only until the container is removed. Extract them through mounted volumes or upload scripts. In Claude Desktop, files save to your local filesystem. For team workflows where outputs need review, upload results to a shared workspace. MCP-connected platforms like Fast.io let agents push files directly to workspaces where team members can access, comment on, and approve the results.

Claude Computer Use: Complete Setup and Developer Guide

How the Screenshot-Action Loop Works

Opus 4.8 scores 84% on Online-Mind2Web for autonomous web navigation, a meaningful jump over both Opus 4.7 and GPT-5.5. That benchmark tests whether a model can complete real multi-step browser tasks end to end. Claude now leads among single-agent systems, which means computer use has crossed from experimental demo into a practical automation layer.

Computer use works through a repeating cycle. Claude requests a screenshot of your display, analyzes the image to understand what's on screen, decides on an action, and asks your application to carry it out. Actions include clicking at specific pixel coordinates, typing text into fields, pressing keyboard shortcuts, scrolling in any direction, and dragging elements between positions. After each action, Claude looks at a new screenshot to verify the result before planning the next move.

Your application sits between Claude and the computing environment. When Claude requests "left_click at [450, 320]," your code translates that into an actual mouse event on a virtual display, captures the resulting screenshot, and sends it back as a tool result. Claude never connects to your machine directly. It sees only the screenshots and action results you provide.

Consider a concrete example. You ask Claude to find a specific cell in a spreadsheet and update its value. Claude screenshots the desktop, sees the spreadsheet application, clicks on the target cell (navigating through tabs or scrolling if needed), types the new value, and presses Enter. Each step generates a screenshot that Claude uses to verify progress. If it clicks the wrong cell, it sees the error in the next screenshot and corrects course. This self-correcting behavior is what makes the approach work for real tasks.

This cycle, which Anthropic calls the "agent loop," continues until Claude completes the task or hits your configured iteration limit. Most tasks finish in 5 to 15 iterations. Complex multi-step workflows might need 30 or more. The cap prevents runaway API costs from tasks that get stuck.

The key difference from tools like Selenium or Puppeteer: Claude needs no selectors, no DOM access, and no knowledge of the UI framework running on screen. It reads the display like a person would. If a button is visible, Claude can click it. This makes computer use work across any application, from web browsers and spreadsheets to design tools and terminal emulators, without writing integration code for each one.

The tradeoff is speed. Screen-based interaction is inherently slower than direct API calls. Anthropic's documentation acknowledges this: when a native integration or MCP tool exists for a task, use that instead. Computer use is the fallback for applications that don't expose programmable interfaces at all.

Figure: Claude analyzing a desktop screenshot to determine the next action.

Supported Models and Beta Headers

Computer use requires a specific beta header in your API requests. Anthropic currently maintains two header versions, each tied to different model families.

The computer-use-2025-11-24 header works with Claude Opus 4.8, Opus 4.7, Opus 4.6, Sonnet 4.6, and Opus 4.5. These models run the computer_20251124 tool type, which includes every available action plus the zoom feature for inspecting specific screen regions at full resolution.

The older computer-use-2025-01-24 header supports Claude Sonnet 4.5 and Haiku 4.5. These models use the computer_20250124 tool type with core actions (screenshot, click, type, key, scroll, drag) but no zoom.
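The pairing of models, headers, and tool types described above can be kept in a single lookup table so the three never drift apart. This is a minimal sketch: the function name `computer_use_config` is hypothetical, and the keys are the display names used in this article, not API model IDs, so a real client would need to map from whatever model strings it actually sends.

```python
# Header and tool-type pairs as described in this section.
NEWER = ("computer-use-2025-11-24", "computer_20251124")
OLDER = ("computer-use-2025-01-24", "computer_20250124")

# Keys are display names from this article (an assumption), not API model IDs.
COMPUTER_USE_SUPPORT = {
    "Opus 4.8": NEWER,
    "Opus 4.7": NEWER,
    "Opus 4.6": NEWER,
    "Sonnet 4.6": NEWER,
    "Opus 4.5": NEWER,
    "Sonnet 4.5": OLDER,
    "Haiku 4.5": OLDER,
}


def computer_use_config(model_name: str) -> dict:
    """Return the beta header, tool type, and zoom support for a model."""
    try:
        beta_header, tool_type = COMPUTER_USE_SUPPORT[model_name]
    except KeyError:
        raise ValueError(
            f"{model_name!r} is not listed as supporting computer use"
        ) from None
    return {
        "beta_header": beta_header,
        "tool_type": tool_type,
        # Zoom is only described for the newer tool type.
        "supports_zoom": tool_type == "computer_20251124",
    }
```

Looking the header up from the model, rather than hard-coding it next to each request, makes it harder to send a newer tool type with the older header by accident.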

Picking the right model depends on your task complexity and budget.

Opus 4.8 is the strongest option for tasks that require reasoning through unfamiliar interfaces, recovering from errors, and navigating multi-step workflows. Its 84% Online-Mind2Web score reflects this. Use it for open-ended automation where Claude needs to figure out the path itself.

Sonnet 4.6 delivers strong performance at lower cost. It scored 94% on the Pace insurance benchmark for submission intake and claims processing, the highest of any model tested on that evaluation. It also matches Opus 4.6 on prompt injection resistance. Choose Sonnet 4.6 for well-defined tasks on consistent interfaces where you know what the screen will look like.

Haiku 4.5 suits simple, repeatable operations where cost efficiency matters most. Form filling on known templates, screenshot-based monitoring, and data extraction from predictable layouts are good fits.

All models share core capabilities: screenshot capture, mouse control (click, move, drag), keyboard input (text and shortcuts), and scroll. Both tool types also cover right-click, middle-click, double-click, triple-click, hold_key for timed key presses, and a wait action for pausing between steps; these arrived with computer_20250124 rather than with the newer version. What computer_20251124 adds is the zoom action, which lets Claude request a cropped, full-resolution view of any rectangular region on screen, useful when small text or dense UI elements are hard to read from a full screenshot.
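On the handler side, a zoom request comes down to validating a rectangle and cropping the current screenshot to it. The sketch below covers only the validation step. It assumes the rectangle arrives as `[x1, y1, x2, y2]` (top-left and bottom-right corners in screen pixels), and `normalize_zoom_region` is a hypothetical helper name, not part of any SDK.

```python
def normalize_zoom_region(region, display_width, display_height):
    """Clamp an [x1, y1, x2, y2] rectangle to the display and validate it."""
    if len(region) != 4:
        raise ValueError("region must contain four values: x1, y1, x2, y2")
    x1, y1, x2, y2 = (int(v) for v in region)

    # Tolerate corners given in either order.
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))

    # Keep the rectangle inside the display bounds.
    left = max(0, min(left, display_width))
    right = max(0, min(right, display_width))
    top = max(0, min(top, display_height))
    bottom = max(0, min(bottom, display_height))

    if right <= left or bottom <= top:
        raise ValueError("region has no visible area on this display")
    return (left, top, right, bottom)
```

The returned tuple is in (left, upper, right, lower) order, which is the box format Pillow's `Image.crop` accepts if you use Pillow to produce the cropped image.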

How to Set Up Computer Use via the API

Three pieces go into an API-based computer use setup: the tool definition, the beta header, and the agent loop.

The tool definition tells Claude the screen resolution and which capabilities are available. Here is a minimal Python example:

import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    tools=[
        {
            "type": "computer_20251124",
            "name": "computer",
            "display_width_px": 1024,
            "display_height_px": 768,
            "display_number": 1,
        },
    ],
    messages=[
        {"role": "user", "content": "Open the browser and search for Claude API docs"}
    ],
    betas=["computer-use-2025-11-24"],
)

Set display_width_px and display_height_px to match your virtual display's resolution. The display_number identifies which monitor to use when multiple are available. To enable the zoom feature, add "enable_zoom": True to the tool definition.
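If you build tool definitions for more than one display or model, a small factory keeps the fields consistent. This is a sketch under the assumptions stated in this section: `make_computer_tool` is a hypothetical helper, and it treats zoom as available only on the computer_20251124 tool type.

```python
def make_computer_tool(width_px, height_px, display_number=1,
                       enable_zoom=False, tool_type="computer_20251124"):
    """Build a computer use tool definition for the `tools` list."""
    if width_px <= 0 or height_px <= 0:
        raise ValueError("display dimensions must be positive")
    if enable_zoom and tool_type != "computer_20251124":
        raise ValueError("zoom is only described for computer_20251124")

    tool = {
        "type": tool_type,
        "name": "computer",
        # These must match the real resolution of the virtual display.
        "display_width_px": width_px,
        "display_height_px": height_px,
        "display_number": display_number,
    }
    if enable_zoom:
        tool["enable_zoom"] = True
    return tool


TOOLS = [make_computer_tool(1024, 768, enable_zoom=True)]
```

Failing fast on a mismatched zoom setting is a design choice for this sketch; you could instead drop the flag silently for older tool types.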

When Claude responds, it returns a tool_use block describing the action it wants to take. Your code extracts this action, executes it in the computing environment, captures a screenshot, and sends the result back as a tool_result. This back-and-forth continues until Claude finishes the task.

The agent loop automates this cycle:

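# TOOLS is the list of tool definitions from the previous example;
# process_tool_calls is your own handler, described below.
def run_computer_use(messages, max_iterations=10):
    for _ in range(max_iterations):
        response = client.beta.messages.create(
            model="claude-opus-4-8",
            max_tokens=4096,
            messages=messages,
            tools=TOOLS,
            betas=["computer-use-2025-11-24"],
        )
        # Keep Claude's turn in the history so later tool results line up with it.
        messages.append({"role": "assistant", "content": response.content})
        tool_results = process_tool_calls(response)
        # No tool calls means Claude considers the task finished.
        if not tool_results:
            return messages
        messages.append({"role": "user", "content": tool_results})
    # Iteration cap reached; the task may be incomplete.
    return messages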

The process_tool_calls function is where you map Claude's abstract requests (like "screenshot" or "left_click at [450, 320]") into real operations on your virtual display. Claude sends tool requests as JSON objects with fields like action, coordinate, and text. A screenshot action returns a base64-encoded image. A click action takes [x, y] coordinates. A type action takes a string. Your handler converts these into actual X11 events and captures the display state after each one.
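One way to structure such a handler is sketched below. It is illustrative rather than Anthropic's implementation: `xdotool_args` and `run_xdotool` are hypothetical helpers, it assumes the `xdotool` CLI is installed in the container, it assumes key presses arrive in the `text` field, it covers only four actions, and screenshot capture is passed in as a `take_screenshot` callable returning PNG bytes because the capture tool depends on your environment.

```python
import base64
import os
import subprocess


def xdotool_args(tool_input):
    """Translate one action into xdotool arguments (None for a bare screenshot)."""
    action = tool_input["action"]
    if action == "screenshot":
        return None
    if action == "left_click":
        x, y = tool_input["coordinate"]
        return ["mousemove", str(x), str(y), "click", "1"]
    if action == "type":
        return ["type", "--", tool_input["text"]]
    if action == "key":  # assumption: the key combo arrives in "text"
        return ["key", "--", tool_input["text"]]
    raise ValueError(f"unsupported action: {action}")


def run_xdotool(args, display=":1"):
    """Run the action against the virtual display (requires xdotool)."""
    env = {**os.environ, "DISPLAY": display}
    subprocess.run(["xdotool", *args], check=True, env=env)


def process_tool_calls(response, take_screenshot, execute=run_xdotool):
    """Execute each tool_use block and return the matching tool_result blocks."""
    results = []
    for block in response.content:
        if block.type != "tool_use":
            continue
        try:
            args = xdotool_args(block.input)
            if args is not None:
                execute(args)
            # Every result carries the display state after the action.
            image = base64.b64encode(take_screenshot()).decode()
            content = [{
                "type": "image",
                "source": {"type": "base64", "media_type": "image/png", "data": image},
            }]
            is_error = False
        except Exception as exc:
            # Report failures so Claude can see them and adjust.
            content = [{"type": "text", "text": f"Error: {exc}"}]
            is_error = True
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": content,
            "is_error": is_error,
        })
    return results
```

Because the loop above calls `process_tool_calls(response)` with a single argument, you would bind the screenshot function first, for example with `functools.partial(process_tool_calls, take_screenshot=my_capture)`. Returning errors as tool results, rather than raising, lets Claude notice a failed action and try a different approach.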

The computing environment needs a virtual display server (Xvfb on Linux), a window manager, and whatever applications your task requires. Anthropic provides a complete reference implementation at github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo. It includes a Dockerized Linux desktop with Firefox and LibreOffice, working tool handler implementations, and a web interface for watching Claude work in real time.

For production deployments, build your own container image with only the applications your workflow needs. The reference implementation is designed for evaluation, not production. Lock down network access to specific domains, set up logging for every action Claude takes, and cap the iteration limit based on your expected task length.

Computer Use in Claude Desktop and Cowork

Not everyone needs to build against the API. Claude Desktop offers computer use as a settings toggle for Pro and Max subscribers, available on macOS and Windows as a research preview.

Turn it on, and Claude can open files, browse the web, and run desktop applications on your machine. No Docker containers, no agent loop code. Claude asks permission before accessing each new application, and you can interrupt it at any point.

The desktop implementation is smarter about tool selection than the raw API path. Claude checks for native integrations first. If it has a direct tool for something (like its built-in file editor for text files or development tools for coding), it uses that instead. Computer use activates only when no faster integration exists. This means desktop computer use is generally more efficient than forcing every task through the screenshot loop.

Dispatch adds asynchronous task execution. You assign Claude a task from your phone, put the phone down, and find the finished work on your computer later. Anthropic describes real scenarios in their blog post: morning briefings compiled from email and calendar, code changes committed and submitted as pull requests, and scheduling tasks that span multiple applications. Your computer needs to stay awake while Claude works through the task at its own pace.

For web-heavy workflows, Claude Cowork's browser integration lets Claude interact with pages directly through Chrome rather than screenshotting a standalone window. This tighter coupling improves accuracy for form completion, data extraction, and multi-tab workflows compared to the generic screenshot approach.

One practical challenge with computer use: it generates outputs that need to go somewhere. Screenshots accumulate. Downloaded files land in temporary directories. Processed documents need to reach reviewers. For individual use, local folders work fine. For teams or agent-to-human handoffs, you need shared storage.

Services like Google Drive and Dropbox handle basic file sync. For agent workflows that need MCP connectivity, Fast.io workspaces let Claude push files directly to shared locations through the MCP server. Team members review, comment, and approve from the same workspace. The free plan covers 50GB of storage and 5,000 monthly credits with no credit card or expiration.

The combination of computer use and workspace storage creates a clean handoff pattern. Claude automates the screen-level work (filling forms, processing documents, navigating legacy systems) then uploads results to a shared workspace. A human opens the workspace, reviews the output, and either approves it or sends feedback. This keeps automation output organized, versioned, and searchable instead of scattered across local download folders.

Safety Measures for Production

Computer use introduces risks that standard API features don't carry. Claude interacts with live applications and can execute actions with real consequences. Anthropic is explicit about this in the documentation: treat the feature as beta and build safety margins into every deployment.

Isolation is the first line of defense. Run computer use inside a dedicated virtual machine or Docker container with minimal privileges. The reference implementation uses a containerized Linux desktop for exactly this reason. Don't give Claude access to your primary development environment, credentials, or production systems. If Claude only needs to fill out web forms, the container doesn't need SSH access, email clients, or a terminal.

Restrict network access to an allowlist of domains your workflow requires. This reduces the attack surface for prompt injection. A webpage with hidden text instructing Claude to visit a malicious URL can't succeed if outbound connections are limited to your known-good domains.
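The matching logic behind such an allowlist is simple, though easy to get subtly wrong with plain substring checks. The sketch below shows one way to test a URL against a set of permitted domains. The domain names and the `is_allowed` helper are hypothetical, and real enforcement belongs at the network layer (a firewall or egress proxy), not only in application code.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the domains your workflow needs.
ALLOWED_DOMAINS = {"example.com", "docs.example.org"}


def is_allowed(url, allowed=ALLOWED_DOMAINS):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower().rstrip(".")
    # Match whole labels only, so "example.com.evil.test" does not pass.
    return any(host == d or host.endswith("." + d) for d in allowed)
```

Comparing on the parsed hostname rather than the raw URL string avoids being fooled by lookalike hosts, credentials embedded in the URL, or a permitted domain appearing in the path.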

Anthropic adds automatic protection through prompt injection classifiers. These run on every computer use request and analyze screenshots for signs of injected instructions. When a classifier flags something suspicious, Claude pauses and asks for human confirmation before proceeding. This runs by default on all computer use deployments. If your use case is fully automated with no human in the loop, contact Anthropic support to discuss adjusting this behavior.

Even with classifiers active, require human approval for consequential actions. Financial transactions, terms-of-service agreements, account creation, and anything involving personal data should not run autonomously. Claude can follow instructions found in on-screen content, even when those instructions conflict with your system prompt. This is inherent to vision-based interaction, not a defect.

For audit and compliance, capture the full sequence of screenshots and actions from every session. This creates a reviewable record of everything Claude did and why. Upload session logs to a shared workspace where reviewers can replay the sequence. Fast.io's audit trail logs file operations automatically, and you can enable Intelligence Mode to index session recordings for later search and retrieval.
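A session record of this kind can be as simple as one JSON line per action plus the screenshot that followed it. The sketch below is one possible layout, not a prescribed format: the `SessionLog` class, the file names, and the entry fields are all assumptions made for illustration.

```python
import hashlib
import json
import time
from pathlib import Path


class SessionLog:
    """Append-only record of actions and screenshots for one session."""

    def __init__(self, directory):
        self.dir = Path(directory)
        self.dir.mkdir(parents=True, exist_ok=True)
        self.step = 0

    def record(self, tool_input, screenshot_png):
        """Save the screenshot and append one JSON line describing the step."""
        self.step += 1
        image_name = f"step-{self.step:04d}.png"
        (self.dir / image_name).write_bytes(screenshot_png)
        entry = {
            "step": self.step,
            "timestamp": time.time(),
            "action": tool_input,
            "screenshot": image_name,
            # The hash lets a reviewer confirm the image was not altered later.
            "sha256": hashlib.sha256(screenshot_png).hexdigest(),
        }
        with open(self.dir / "actions.jsonl", "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
        return entry
```

Calling `record` once per executed action from your tool handler yields a directory that can be replayed step by step or uploaded as a unit for review.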

Start with low-risk, repeatable tasks: data entry on known forms, screenshot-based testing, and structured document processing. Build confidence in Claude's behavior within your specific environment before expanding to more complex or sensitive workflows.
