
How to Set Up OpenClaw with Google Gemini Models

OpenClaw's Gemini integration connects your agent to Google's full model family, including Gemini 3.1 Pro for chat, Flash for image generation, Veo 3.1 for video, and Lyria 3 for music synthesis. This guide covers both authentication methods, model routing, multimodal capabilities, and cache optimization, plus how to persist your agent's outputs in a shared workspace.

Fast.io Editorial Team · 9 min read

How OpenClaw's Gemini Provider Works

OpenClaw treats model providers as interchangeable routing layers. The Gemini provider handles authentication with Google's APIs and routes requests to the correct model endpoint based on the google/* prefix in your configuration.

Two authentication methods are available, and which one you pick affects billing, rate limits, and feature access.

API Key (Google AI Studio)

  • Pay per token through the Google AI Studio billing model
  • Works with every Gemini model: chat, image generation, video, music, TTS, and voice
  • Free tier includes 60 requests per minute and 1,000 requests per day
  • Best for production agents or developers who want full model access without subscription constraints

Gemini CLI OAuth

  • Authenticates through the local gemini CLI using PKCE OAuth
  • Ties usage to your Google account rather than a separate API key
  • No separate billing dashboard to manage
  • The google-gemini-cli provider is an unofficial integration, and some users report account restrictions when using OAuth this way

The practical difference: API key auth is straightforward and gives you explicit control over billing and rate limits. Gemini CLI OAuth avoids managing a separate key but comes with the caveat that it's an unofficial path. For most agent deployments, the API key approach is more reliable.

Gemini's model catalog goes well beyond chat. The same provider handles image generation, text-to-video, music synthesis, text-to-speech, and real-time bidirectional voice. That means a single OpenClaw agent can generate a blog post, create an illustration for it, and produce a narrated audio version without switching providers or reconfiguring authentication.

Setting Up Gemini Authentication

API Key Setup

Create a key in Google AI Studio. Give it a name you'll recognize later, like "OpenClaw Production," and copy it immediately.

Run the onboarding command:

openclaw onboard --auth-choice gemini-api-key

You can also set the key directly as an environment variable:

export GEMINI_API_KEY="your-key-here"

OpenClaw accepts both GEMINI_API_KEY and GOOGLE_API_KEY. If both are set, GEMINI_API_KEY takes priority.
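That precedence is easy to mirror in a helper. A minimal Python sketch of the documented resolution order (illustrative only, not OpenClaw's actual code):

```python
import os

def resolve_gemini_key(env=None):
    """Return the API key per the documented precedence:
    GEMINI_API_KEY wins when both variables are set."""
    env = os.environ if env is None else env
    return env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY")

# With both variables set, the GEMINI_API_KEY value is chosen.
resolve_gemini_key({"GEMINI_API_KEY": "aistudio-key", "GOOGLE_API_KEY": "legacy-key"})
```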

For non-interactive environments like CI pipelines or Docker containers, pass the key directly:

openclaw onboard --auth-choice gemini-api-key --api-key "your-key-here"

Gemini CLI OAuth Setup

Install the Gemini CLI through Homebrew or npm, then authenticate:

openclaw models auth login --provider google-gemini-cli

This opens a browser window for the OAuth flow. On headless servers, add the device-code flag to get a URL and one-time code you can approve from any device.

Verifying the Connection

After configuring either method, confirm your setup:

openclaw status

This shows whether your gateway and Gemini provider are correctly configured. For deeper diagnostics:

openclaw models status
openclaw doctor --fix

The doctor command repairs common issues like stale provider references. If configuration problems persist after an OpenClaw upgrade, openclaw config validate checks your entire configuration file for deprecated or malformed entries.


Available Models and Capabilities

OpenClaw routes all Gemini models through the google/* prefix. The model you specify determines which capabilities your agent can access.

Chat Models

  • google/gemini-3.1-pro-preview is the default. It supports reasoning, tool use, and multimodal input including images, audio, and video
  • google/gemini-2.5-pro offers strong reasoning at a lower latency than the 3.1 generation
  • google/gemini-2.5-flash is the cost-optimized option with a generous free tier

Gemma 4 models like gemma-4-26b-a4b-it also work through the Gemini provider with thinking mode enabled, if you need an open-weight alternative for specific tasks.

Set your default model in the configuration:

{
  agents: {
    defaults: {
      model: { primary: "google/gemini-3.1-pro-preview" }
    }
  }
}

Image Generation

google/gemini-3.1-flash-image-preview and google/gemini-3-pro-image-preview handle text-to-image requests. Each request can generate up to four images, and the edit mode accepts up to five input images for iterative refinement.
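Those limits are worth checking client-side before a request leaves your agent. A hypothetical pre-flight guard in Python, using only the limits stated above:

```python
def check_image_request(generate_count=1, edit_inputs=0):
    """Validate an image request against the limits described above:
    at most four generated images per request, and at most five input
    images in edit mode. Purely illustrative guardrails."""
    if not 1 <= generate_count <= 4:
        raise ValueError("a request can generate between one and four images")
    if edit_inputs > 5:
        raise ValueError("edit mode accepts up to five input images")
    return {"generate_count": generate_count, "edit_inputs": edit_inputs}
```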

Video Generation

google/veo-3.1-fast-generate-preview produces 4- to 8-second clips from text descriptions, still images, or reference video. Most third-party guides stop at API key setup and miss these multimodal capabilities entirely, so this is where OpenClaw's Gemini integration pays off.

Music Generation

google/lyria-3-clip-preview and google/lyria-3-pro-preview generate music from text prompts. Both models support lyrics and instrumental controls, with output in MP3 or WAV format. The pro variant produces longer, higher-fidelity clips.

Text-to-Speech

gemini-3.1-flash-tts-preview converts text to natural-sounding audio. It outputs WAV for file attachments, Opus for voice notes, and PCM for telephony integrations. The model supports expressive tags within [[tts:text]] blocks, so you can add natural emphasis or whispered sections to your agent's speech output without those instructions appearing in the chat text.
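The format-per-use-case mapping is simple enough to encode directly. A sketch (the channel names here are hypothetical; only the WAV/Opus/PCM mapping comes from the description above):

```python
def tts_output_format(use_case):
    """Map a delivery channel to the output format described above:
    WAV for file attachments, Opus for voice notes, PCM for telephony."""
    formats = {"attachment": "wav", "voice_note": "opus", "telephony": "pcm"}
    return formats.get(use_case, "wav")  # default to WAV for unknown channels
```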

Real-Time Voice

For bidirectional audio conversations, the Google Live API powers real-time voice through gemini-2.5-flash-native-audio-preview-12-2025. Configure it under plugins.entries.voice-call.config.realtime.providers.google with VAD sensitivity tuning, session resumption, and context window compression. This enables your agent to hold actual spoken conversations, not just generate audio files.

Web Search Grounding

Gemini Grounding lets your agent pull live web results into its responses. Credentials follow a priority chain: dedicated webSearch.apiKey first, then GEMINI_API_KEY, then models.providers.google.apiKey. No separate search API subscription is required.
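The chain behaves like a first-match lookup. A Python sketch of that resolution order (the key paths come from the text above; the surrounding shape is an assumption, not OpenClaw internals):

```python
def resolve_grounding_key(config, env):
    """Return (source, key) for the first credential found, walking the
    documented order: webSearch.apiKey, then GEMINI_API_KEY, then
    models.providers.google.apiKey."""
    chain = [
        ("webSearch.apiKey",
         config.get("webSearch", {}).get("apiKey")),
        ("GEMINI_API_KEY", env.get("GEMINI_API_KEY")),
        ("models.providers.google.apiKey",
         config.get("models", {}).get("providers", {}).get("google", {}).get("apiKey")),
    ]
    for source, key in chain:
        if key:
            return source, key
    return None, None
```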


Give Your Gemini Agent Persistent File Storage

OpenClaw agents running Gemini models generate images, video, music, and documents that need to outlast the session. Fast.io gives you 50GB of free storage with built-in semantic search, MCP access, and one-click handoff to your team. No credit card required.

Cache Optimization and Thinking Mode

Two features separate a basic Gemini setup from one tuned for production: cached content and thinking mode. Getting both right reduces cost and improves reasoning quality on complex tasks.

Gemini Cache Integration

When running against the direct API, OpenClaw accepts cachedContent handles. These are prebuilt context blocks (like cachedContents/prebuilt-context) that Google caches server-side so your agent doesn't resend the same large context on every request. This is particularly useful for agents that repeatedly reference the same documents, codebases, or knowledge bases.

OpenClaw normalizes Gemini's cache-hit usage into the standard cacheRead metric from the upstream cachedContentTokenCount field. This means your cost tracking dashboard shows cache performance consistently across providers. If your agent processes the same reference material across sessions, cached content can significantly reduce token costs.
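The normalization amounts to a field rename. A sketch of what that mapping might look like (promptTokenCount and candidatesTokenCount are the Gemini API's usage-metadata fields; the output shape is illustrative, not OpenClaw's actual schema):

```python
def normalize_gemini_usage(usage_metadata):
    """Fold Gemini usage metadata into a provider-agnostic shape, mapping
    the upstream cachedContentTokenCount onto the standard cacheRead metric."""
    return {
        "input": usage_metadata.get("promptTokenCount", 0),
        "output": usage_metadata.get("candidatesTokenCount", 0),
        "cacheRead": usage_metadata.get("cachedContentTokenCount", 0),
    }
```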

Thinking Mode Configuration

Gemini 2.5 and 3.x models support explicit reasoning steps. OpenClaw maps reasoning controls to thinkingLevel rather than thinkingBudget, which matters because sending a disabled budget value to Gemini models causes errors.

The /think adaptive command preserves Google's dynamic thinking semantics, letting the model decide how much reasoning each prompt needs. For tasks where you want consistent deep reasoning, set a fixed thinking level instead.

For Gemini 3 and 3.1 models specifically, OpenClaw handles the mapping automatically. You don't need to remember which parameter name Gemini expects. Just configure thinking in your OpenClaw settings and the provider translates it to the correct API format.
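That translation can be sketched in a few lines (the setting names are hypothetical; the rule that a disabled budget must be omitted rather than sent comes from the text above):

```python
def gemini_thinking_params(setting):
    """Translate an agent-level thinking setting into Gemini request
    parameters. Gemini 3.x expects thinkingLevel, and sending a disabled
    thinkingBudget value causes errors, so 'off' omits the field entirely."""
    if setting in (None, "off"):
        return {}  # omit thinking controls rather than send a disabled budget
    return {"thinkingLevel": setting}
```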

Context Window Sizing

Gemini 3.1 Pro supports a 1M token context window. As with the OpenAI provider, you can cap the runtime context to keep latency and cost under control:

{
  models: {
    providers: {
      google: {
        models: [{ id: "gemini-3.1-pro-preview", contextTokens: 200000 }]
      }
    }
  }
}

A 200,000-token cap handles most agent workloads comfortably. Raise it when your agent needs to ingest large document sets in a single session.

Persisting and Sharing Your Agent's Output

A Gemini-powered OpenClaw agent can generate text, images, video clips, and audio files in a single session. The question is where all that output goes when the session ends.

Local filesystems work during development but break when containers restart or when someone else needs the files. Google Cloud Storage adds durability without collaboration. Google Drive gives you sharing but wasn't designed for programmatic agent workflows where files need indexing, search, and structured handoff.

Fast.io works as a persistent workspace layer for OpenClaw agents. Your agent writes files to a workspace through the MCP server or REST API, and teammates see those files immediately in the browser.

What this looks like in practice for Gemini agent workflows:

  • MCP-native access: Fast.io exposes Streamable HTTP at /mcp and legacy SSE at /sse. Your OpenClaw agent can read, write, search, and manage workspace files through 19 consolidated MCP tools
  • Built-in Intelligence: Enable Intelligence on a workspace and every uploaded file gets automatically indexed for semantic search and citation-backed chat. Your agent generates a Veo video and a Lyria audio track, uploads both, and your team can search across all of it without configuring a separate vector database
  • Ownership transfer: Your agent builds a workspace with organized outputs, transfers the organization to a client, and keeps admin access for ongoing work
  • Metadata Views: For agents that generate structured outputs like research reports or media libraries, Metadata Views let you extract fields from uploaded documents into a sortable, filterable spreadsheet. Describe the schema in natural language and Fast.io handles the extraction
  • Free agent tier: 50GB storage, 5,000 credits per month, 5 workspaces. No credit card, no trial, no expiration. Get started free

For teams already running OpenClaw with Gemini, adding the Fast.io MCP server to your agent's tool configuration takes minutes. Authenticate once, point uploads at a shared workspace, and every collaborator gets immediate access to your agent's generated content.


Frequently Asked Questions

How do I connect OpenClaw to Google Gemini?

Run `openclaw onboard --auth-choice gemini-api-key` and provide your Google AI Studio API key. This configures the Gemini provider and lets you reference models with the `google/*` prefix. Verify the connection with `openclaw status`. Alternatively, authenticate via the Gemini CLI OAuth path using `openclaw models auth login --provider google-gemini-cli`.

Which Gemini models work with OpenClaw?

OpenClaw supports the full Gemini model family through the google/* routing prefix. Chat models include gemini-3.1-pro-preview (default), gemini-2.5-pro, and gemini-2.5-flash. Specialized models cover image generation (gemini-3.1-flash-image-preview), video generation (veo-3.1-fast-generate-preview), music (lyria-3-clip-preview, lyria-3-pro-preview), text-to-speech (gemini-3.1-flash-tts-preview), and real-time voice.

Can OpenClaw generate images with Gemini?

Yes. Use google/gemini-3.1-flash-image-preview or google/gemini-3-pro-image-preview in your model configuration. Each request can generate up to four images, and the edit mode accepts up to five input images for iterative refinement.

Does OpenClaw support Gemini video and music generation?

OpenClaw routes video generation through google/veo-3.1-fast-generate-preview, which produces 4 to 8 second clips from text, images, or reference video. Music generation uses google/lyria-3-clip-preview or lyria-3-pro-preview with lyrics and instrumental controls, outputting MP3 or WAV.

How does Gemini caching work in OpenClaw?

OpenClaw accepts cachedContent handles when running against the direct Gemini API. These are prebuilt context blocks that Google caches server-side. Cache-hit usage is normalized into OpenClaw's standard cacheRead metric from the upstream cachedContentTokenCount field, giving you consistent cost tracking across providers.

What is the difference between API key and Gemini CLI OAuth in OpenClaw?

API key authentication bills per token through Google AI Studio with explicit rate limit control. Gemini CLI OAuth ties usage to your Google account via PKCE, avoiding a separate key. The CLI OAuth path is an unofficial integration and some users report account restrictions, so the API key method is generally more reliable for production deployments.
