AI & Agents

Anthropic API Pricing in 2026: Costs Per Model and How to Optimize

Anthropic charges per-token with no monthly minimum across three active model tiers: Opus at $5/$25 per million tokens, Sonnet at $3/$15, and Haiku at $1/$5. Prompt caching cuts cached input reads to 10% of the standard rate, and the batch API halves all token costs, allowing combined savings of up to 95% on input tokens. The gap between list price and optimized pricing explains why two teams running identical workloads can see ten-fold differences in their monthly bills.

Fast.io Editorial Team 10 min read
Abstract neural network visualization representing AI model token processing

How Token-Based API Pricing Works

Cache reads on the Anthropic API cost one-tenth of the standard input price, according to Anthropic's official pricing documentation. Stack that discount with the batch API's flat 50% reduction and input tokens on Haiku 4.5 drop from $1.00 to $0.05 per million. That 95% gap between sticker and optimized pricing is the single biggest variable in your monthly bill, and most pricing overviews skip it entirely.

The Claude API charges per-token with no monthly minimum and no commitment. You create an API key, send requests, and pay for the tokens consumed. Every request has two billable components: input tokens (your prompt, system instructions, conversation history, and any cached content) and output tokens (the model's response, including any extended thinking).

Across every current Claude model, output tokens cost exactly five times more than input tokens. This 5:1 ratio shapes cost optimization in ways that are easy to overlook. A customer support bot that sends a 500-token query but generates 2,000-token responses spends 80% of its budget on output. A classification pipeline that sends 5,000 tokens and receives a 50-token label spends 99% on input. Knowing which side of the ratio dominates your workload tells you which optimization lever matters most: prompt caching for input-heavy jobs, or model downsizing for output-heavy ones.

Whether you're running Claude Cowork sessions for shared AI development, building a production chatbot, or processing documents at scale, the billing model stays the same. There are no per-seat fees, no per-request charges, and no volume tiers that change your per-token cost. Everyone pays the same rate for a given model regardless of usage volume, though Anthropic does negotiate custom enterprise pricing for very high-volume customers.

New accounts get a small amount of free credits for testing. After those run out, billing is monthly based on actual usage.

Per-Model Pricing for Every Current Claude Model

Anthropic offers four distinct model families at different price and capability points. All prices below are per million tokens (MTok) as published on Anthropic's official pricing page, current as of June 2026.

Flagship Tier

Claude Fable 5: $10 input / $50 output per MTok. Anthropic's most capable model, released June 9, 2026. 1M token context window with 128K max output. Built for the most demanding reasoning and long-horizon agentic tasks.

Opus Tier

Claude Opus 4.8: $5 input / $25 output per MTok. Top of the Opus tier for complex reasoning and agentic coding. 1M context, 128K max output.

  • Claude Opus 4.7: $5 input / $25 output per MTok. Same pricing and 1M context as 4.8. Uses the newer tokenizer introduced with this generation.

  • Claude Opus 4.6: $5 input / $25 output per MTok. 1M context, 128K max output. Supports extended thinking.

Sonnet Tier

Claude Sonnet 4.6: $3 input / $15 output per MTok. The production workhorse with the strongest price-to-performance ratio for most applications. 1M context, 64K max output.

Haiku Tier

Claude Haiku 4.5: $1 input / $5 output per MTok. Fastest and cheapest model. 200K context, 64K max output. Ideal for classification, routing, and high-volume lightweight tasks.

What the Price List Leaves Out

Tokenizer differences affect real costs. Opus 4.7 and later models use a newer tokenizer that produces up to 35% more tokens for the same input text compared to Opus 4.6 and earlier. Per-token rates are identical, but the effective cost per request increases when you migrate from 4.6 to 4.7. The Opus 4.7 to 4.8 upgrade carries no additional tokenizer penalty, making it a free performance improvement.

Full context at flat rates. Fable 5, Opus 4.8, 4.7, 4.6, and Sonnet 4.6 all support the full 1M token context window with no long-context surcharge. A 900,000-token request costs the same per-token rate as a 9,000-token request.

Legacy models cost significantly more. The deprecated Opus 4.1 charged $15/$75 per MTok, three times the current Opus rate. Migrating from any pre-4.5 model to a current-generation Opus saves 67% immediately with better performance.

Prompt Caching: 90% Off Repeated Input

Prompt caching is the highest-impact cost optimization Anthropic offers. When the same content appears across multiple API calls, including system prompts, document chunks, and conversation prefixes, caching stores it server-side and serves it at one-tenth of the standard input price on subsequent requests.

Two Cache Durations

Anthropic provides two time-to-live options with different write costs:

  • 5-minute cache: Write cost is 1.25x the base input price. Cache reads cost 0.1x base. Breaks even after a single cache read, since 1.25x + 0.1x = 1.35x total versus 2x for sending the same content twice without caching.

  • 1-hour cache: Write cost is 2x the base input price. Same 0.1x read cost. Breaks even after two reads, since 2x + 0.1x + 0.1x = 2.2x total versus 3x for three uncached calls.

For Claude Sonnet 4.6, those multipliers translate to:

  • Standard input: $3.00/MTok
  • 5-minute cache write: $3.75/MTok
  • 1-hour cache write: $6.00/MTok
  • Cache read (either duration): $0.30/MTok

How Caching Cuts 85% From a RAG Workload

Consider a retrieval-augmented generation app with a 50,000-token knowledge base queried 1,000 times daily on Sonnet 4.6.

Without caching: 50,000 tokens per query multiplied by 1,000 queries multiplied by 30 days equals 1.5 billion input tokens per month. At $3/MTok, that is $4,500/month just for input.

With 1-hour caching: The knowledge base writes to cache roughly 24 times per day (once per hour-long window). Cache writes: 36 million tokens at $6/MTok = $216. Cache reads: 976 daily queries multiplied by 50,000 tokens multiplied by 30 days = 1.464 billion tokens at $0.30/MTok = $439. Monthly input total: approximately $655, an 85% reduction from $4,500.

Setting Up Caching

The simplest path is automatic caching with a single cache_control parameter at the top level of your API request. The system manages cache breakpoints as conversations grow. For finer control, place cache_control on individual content blocks to specify exactly what gets cached.

Caching works with every current Claude model and stacks with the batch API discount. A cached batch request on Haiku 4.5 brings input costs down to $0.05/MTok, a 95% reduction from the standard $1.00 rate.

AI-powered document analysis and cost optimization visualization

Batch API, Fast Mode, and Combined Discounts

Batch Processing at Half Price

The Batch API processes requests asynchronously within a 24-hour window at a flat 50% discount on all token costs. There is no quality difference between batch and synchronous responses. The same model runs the same inference; Anthropic just schedules the work during off-peak capacity.

Batch rates for the most-used models:

  • Opus 4.8: $2.50 input / $12.50 output per MTok
  • Sonnet 4.6: $1.50 input / $7.50 output per MTok
  • Haiku 4.5: $0.50 input / $2.50 output per MTok

Any workload that does not need real-time responses belongs in the batch queue: nightly report generation, content moderation backlogs, bulk classification, and scheduled data extraction.

Fast Mode for Opus

Fast mode delivers faster output from Opus models at a premium:

  • Opus 4.8 Fast Mode: $10 input / $50 output per MTok (2x standard Opus pricing)
  • Opus 4.7 and 4.6 Fast Mode: $30 input / $150 output per MTok (6x standard)

The Opus 4.8 fast mode price is a significant improvement from the 4.7 generation. If you're paying for fast mode on Opus 4.7, upgrading to 4.8 cuts that cost by two-thirds with no quality loss. Note that fast mode is not available through the batch API.

Stacking Batch and Cache Discounts

The batch API and prompt caching multipliers compound multiplicatively. Here is what stacked discounts look like for input tokens:

  • Haiku 4.5 batch + cache read: $1.00 x 0.5 x 0.1 = $0.05/MTok (95% off list)
  • Sonnet 4.6 batch + cache read: $3.00 x 0.5 x 0.1 = $0.15/MTok (95% off list)
  • Opus 4.8 batch + cache read: $5.00 x 0.5 x 0.1 = $0.25/MTok (95% off list)

For a document processing pipeline running 100,000 documents through Haiku 4.5 with cached instructions via the batch API, stacking both discounts turns a $5,000/month workload into roughly $300.

Fastio features

50GB of free agent storage with no credit card

Fast.io gives every agent account persistent workspaces, an MCP server at /mcp, and 5,000 monthly credits. Set up takes under two minutes.

What Chatbots, RAG Systems, and Claude Cowork Sessions Cost

Per-token rates are useful for comparing models, but monthly budgets need workload-specific math. Here are three common scenarios calculated with June 2026 pricing.

Document Classification Pipeline

Setup: 100,000 documents per month. Average 3,000 input tokens per document. Output: 100-token classification label.

Haiku 4.5 via batch API:

  • Input: 300 million tokens at $0.50/MTok = $150
  • Output: 10 million tokens at $2.50/MTok = $25
  • Monthly total: $175, or about $0.002 per document

At under two-tenths of a cent per document, the API cost is negligible compared to the engineering time to build the pipeline. Haiku with batch processing is the natural fit for any high-volume, low-complexity task.

Customer Support Chatbot

Setup: 5,000 conversations per day. Average 1,500 input tokens (including a 1,000-token cached system prompt) and 800 output tokens per conversation.

Haiku 4.5 with 1-hour caching on the system prompt:

  • Non-cached input: 75 million tokens/month at $1/MTok = $75
  • Cached system prompt reads: 150 million tokens/month at $0.10/MTok = $15
  • Cache writes: approximately 22 million tokens/month at $2/MTok = $44
  • Output: 120 million tokens/month at $5/MTok = $600
  • Monthly total: approximately $734

Sonnet 4.6 with the same caching pattern: approximately $2,150/month. For most support chatbots, Haiku handles the volume and quality requirements well. Sonnet is worth the premium when conversations involve nuanced reasoning or multi-step problem solving.

Claude Cowork and Agentic Coding Sessions

Setup: 200 coding sessions per day. Average 12,000 input tokens (code context, system prompt, conversation history) and 5,000 output tokens per session.

Opus 4.8:

  • Input: 72 million tokens/month at $5/MTok = $360
  • Output: 30 million tokens/month at $25/MTok = $750
  • Monthly total: approximately $1,110

Sonnet 4.6:

  • Input: 72 million tokens/month at $3/MTok = $216
  • Output: 30 million tokens/month at $15/MTok = $450
  • Monthly total: approximately $666

Agent sessions also need somewhere to store the files and artifacts created between sessions. Local storage works for prototyping, but production agent workflows benefit from persistent cloud storage. Fast.io provides a free agent plan with 50GB of storage, 5,000 monthly credits, and an MCP server for programmatic access, so you can avoid adding another line item for storage. Other options include Amazon S3 for raw object storage or Google Drive for lighter collaboration needs.

AI chat interface showing document query and response with cost-relevant context

Six Ways to Reduce Your Claude API Spend

1. Match the Model to the Task

Not every request needs the most expensive model. A routing layer where Haiku classifies incoming requests and forwards only complex ones to Sonnet or Opus can cut overall costs by 60-70%. The quality gap between Sonnet 4.6 and Opus 4.8 is smaller than the 40% pricing difference suggests for most production workloads, including summarization, content generation, and standard coding tasks.

2. Cache Every Repeated Context

Any content reused across requests should have a cache_control annotation: system prompts, document chunks, few-shot examples, and long conversation prefixes all qualify. The 5-minute TTL works for interactive sessions where the same context gets reused within a few minutes. The 1-hour TTL is better for steady production traffic where a knowledge base or system prompt gets hit hundreds of times per hour.

3. Batch Everything That Can Wait

Nightly report generation, moderation queues, content extraction, and any workload tolerant of 24-hour latency should run through the batch API. The 50% discount applies to both input and output tokens with identical quality.

4. Watch the Tokenizer When Upgrading

If you're migrating from Opus 4.6 to 4.7 or later, benchmark your actual token counts before and after. The newer tokenizer can produce 35% more tokens for the same text, increasing per-request costs despite identical per-token rates. Moving from 4.7 to 4.8 carries no additional penalty, so that upgrade is straightforward.

5. Send Less Context Per Request

Retrieval-augmented generation reduces the tokens you send per API call. Instead of packing full documents into every request, a retrieval step pulls only the relevant chunks first. You can build this yourself with a vector database like Pinecone or Qdrant, or use a platform with built-in RAG. Fast.io's Intelligence Mode auto-indexes workspace files for semantic search and citation-backed chat without a separate vector store. LangChain and LlamaIndex offer similar retrieval pipelines for self-hosted setups.

6. Monitor Usage and Set Alerts

Track token consumption by model, feature, and team. The Claude Console provides usage dashboards, and every API response includes token counts in the usage field. Set spending alerts before a prototype accidentally runs up a production bill. Request rate limit increases proactively if you're approaching your tier's capacity, since throttled retries waste tokens and time.

Frequently Asked Questions

How much does the Claude API cost?

Pricing depends on the model tier. Claude Haiku 4.5 costs $1 per million input tokens and $5 per million output tokens. Sonnet 4.6 runs $3/$15, Opus 4.8 is $5/$25, and the flagship Fable 5 is $10/$50. Prompt caching reduces cached input to 10% of the standard rate, and the batch API cuts all token costs by 50%.

Is the Anthropic API free?

Not on an ongoing basis. New accounts receive a small amount of free credits for initial testing. After those credits are used, all usage is billed monthly based on token consumption. There is no permanent free tier, though the batch API and prompt caching offer substantial discounts for production workloads.

Which Claude model is cheapest?

Claude Haiku 4.5 is the cheapest at $1 per million input tokens and $5 per million output tokens. Using the batch API drops those rates to $0.50 and $2.50 respectively. For classification, routing, and extraction tasks, Haiku delivers strong results at a fraction of Sonnet or Opus pricing.

How do I reduce Claude API costs?

Three strategies have the biggest impact. Enable prompt caching for any content reused across requests, which cuts cached input costs by 90%. Route non-urgent work through the batch API for a flat 50% discount. Use Haiku for simple tasks instead of running everything through Sonnet or Opus. Stacking caching and batch discounts together can save up to 95% on input tokens.

Does prompt caching work with the batch API?

Yes. The discounts stack multiplicatively. A cached batch request on Haiku 4.5 costs $0.05 per million input tokens, compared to $1.00 at the standard rate. Both the 5-minute and 1-hour cache durations work with batch processing.

What is the difference between Opus, Sonnet, and Haiku?

Opus ($5/$25 per MTok) is the most capable tier for complex reasoning, long-horizon planning, and agentic coding. Sonnet ($3/$15 per MTok) balances speed and intelligence for production workloads like chatbots, content generation, and standard coding tasks. Haiku ($1/$5 per MTok) is the fastest and cheapest, built for classification, routing, and high-volume lightweight tasks.

How does extended thinking affect API pricing?

Extended thinking tokens are billed as output tokens at the standard rate for the model you are using. Since output tokens cost five times more than input across all Claude models, extended thinking can increase per-request costs noticeably for reasoning-heavy tasks. Check the thinking token count in your API responses to track its impact on your bill.

Related Resources

Fastio features

50GB of free agent storage with no credit card

Fast.io gives every agent account persistent workspaces, an MCP server at /mcp, and 5,000 monthly credits. Set up takes under two minutes.