AI & Agents

How to Implement LLM Tool Calling: A Developer's Guide

LLM tool calling allows AI models to execute code, query databases, and manage files instead of just generating text. This guide covers how to implement tool use across major providers, essential design patterns for reliability, and how to connect agents to persistent storage.

Fast.io Editorial Team · 6 min read
Tool calling lets models take real-world actions, not just generate text.

What Is LLM Tool Calling?

LLM tool calling (also called function calling) is the ability of large language models to invoke external tools, APIs, and services during a conversation, allowing them to take actions like reading files, querying databases, or managing cloud storage rather than just generating text. Before tool calling, LLMs were isolated "brains in a jar." They could reason but couldn't interact with anything. Now, all major providers (OpenAI, Anthropic, Google) support native tool calling, and accuracy has improved substantially in recent models like GPT-4o and Claude 3.5 Sonnet.

Tool Calling vs. Function Calling

While often used interchangeably, there is a subtle distinction:

  • Function Calling: The specific mechanism where a model outputs a structured JSON object matching a function signature you provided.
  • Tool Calling: The wider ability of an agent to perceive, select, and execute these functions to achieve a goal.
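The distinction is easiest to see in code. Here is a minimal sketch (the tool name, schema, and model output are illustrative, written in the JSON-Schema style the major providers use):

```python
import json

# A function definition you send to the model (illustrative schema).
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Paris'"}
        },
        "required": ["city"],
    },
}

# "Function calling" is the model emitting structured JSON like this:
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'

# "Tool calling" is the broader loop: your runtime parses that JSON,
# executes the matching function, and returns the result to the model.
call = json.loads(model_output)
print(call["name"], call["arguments"]["city"])
```

The schema is pure data; the model never runs anything itself. Execution, and responsibility for it, always stays in your code.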

Helpful references: Fast.io Workspaces, Fast.io Collaboration, and Fast.io AI.

AI interface showing structured data analysis and function outputs

What to Check Before Scaling LLM Tool Calling

Not all models handle tools equally. Choosing the right provider depends on your need for speed, complexity, and context window size.

OpenAI (GPT-4o)

OpenAI set the standard with its function calling API. It excels at parallel tool calling, allowing the model to request multiple actions in a single turn (e.g., "get weather for NY and SF").

  • Strengths: High reliability, native JSON mode, parallel execution.
  • Best For: Complex workflows requiring strictly formatted outputs.
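Parallel tool calling means one model turn can request several actions at once. A hedged sketch of the dispatch side, with a stubbed handler standing in for a real API (the `tool_calls` shape loosely follows OpenAI's array format):

```python
import json

# Illustrative: a single model turn requesting two calls in parallel.
tool_calls = [
    {"id": "call_1", "name": "get_weather", "arguments": '{"city": "NY"}'},
    {"id": "call_2", "name": "get_weather", "arguments": '{"city": "SF"}'},
]

def get_weather(city: str) -> str:
    return f"72F in {city}"  # stub in place of a real weather API

HANDLERS = {"get_weather": get_weather}

# Execute every requested call and keep one result per call id, so each
# result can be matched back to its request in the next model turn.
results = [
    {
        "tool_call_id": c["id"],
        "content": HANDLERS[c["name"]](**json.loads(c["arguments"])),
    }
    for c in tool_calls
]
```

Returning results keyed by call id matters: the model needs to know which observation answers which request before it composes its final reply.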

Anthropic (Claude 3.5 Sonnet)

Claude is a strong choice for coding and complex reasoning. Its tool use capabilities handle large context windows and ambiguous instructions well.

  • Strengths: Large context window (200k), fewer hallucinations in parameter selection, good at "computer use" (controlling GUIs).
  • Best For: Agents that need to read large documentation files or navigate complex codebases.

Google (Gemini 1.5 Pro)

Gemini's 2 million token context window lets you include entire API docs in the prompt for the model to learn tools on the fly.

  • Strengths: Large context, native integration with Google Workspace tools.
  • Best For: Analyzing large datasets or video files using tools.

Open Source (Llama 3 & Mistral)

For developers prioritizing privacy or cost, open-source models now handle tool use surprisingly well.

  • Strengths: Runs locally, keeps data in-house, no per-token API fees (you still pay for compute).
  • Best For: Internal enterprise tools handling sensitive PII (Personally Identifiable Information).
  • Tooling: Frameworks like Ollama and vLLM now support OpenAI-compatible tool calling endpoints, making it easy to swap a cloud model for a local Llama 3 instance without rewriting your code.
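The swap can be as small as changing a base URL. A sketch of an OpenAI-style request pointed at a local Ollama server, which exposes an OpenAI-compatible API under `/v1` (the model name and tool schema here are illustrative):

```python
# Ollama's OpenAI-compatible endpoint on its default port.
base_url = "http://localhost:11434/v1"

# The same request payload shape you would send to a cloud provider.
request = {
    "model": "llama3",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    }],
}
# Point your existing OpenAI-style client at base_url instead of
# api.openai.com and the rest of the tool-calling loop stays unchanged.
```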
Fast.io features

Give Your AI Agents Persistent Storage

Stop building agents with amnesia. Fast.io provides free, persistent cloud storage and a pre-built MCP server so your agents can read, write, and remember.

Advanced Tool Calling Patterns

Most documentation only shows you how to call get_weather(). Real-world production agents need reliable patterns to handle failures and complex sequences.

1. The ReAct Pattern (Reason + Act)

Instead of just calling a tool, the model should first output a "Thought" explaining why it is calling the tool. This improves accuracy.

  • Thought: "User wants to summarize a PDF. I need to first list files to find it, then read it."
  • Action: list_files(directory="/documents")
  • Observation: (System returns file list)
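On the orchestration side, a ReAct turn is just text you parse into a thought and an action before executing anything. A minimal sketch (the `Thought:`/`Action:` line format is a prompting convention, not a provider API):

```python
import re

# A ReAct-style model turn, as it might arrive from the LLM.
turn = (
    'Thought: User wants to summarize a PDF. I need to list files first.\n'
    'Action: list_files(directory="/documents")'
)

# Extract the reasoning and the requested call.
thought = re.search(r"Thought: (.+)", turn).group(1)
action = re.search(r"Action: (\w+)\((.*)\)", turn)
tool_name, raw_args = action.group(1), action.group(2)

# The observation from executing tool_name would be appended to the
# transcript and the model called again, closing the reason-act loop.
```

Logging the `thought` alongside the call also gives you an audit trail explaining why the agent did what it did.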

2. Tool Chaining

Complex tasks often require the output of one tool to be the input of another. For example, an agent might need to:

  1. search_files(query="invoice") to get a file ID.
  2. read_file(id=...) to get the content.
  3. send_email(to=..., body=...) to forward it.
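The chain above can be sketched with stubbed tools, where each call's output feeds the next call's input (file store, recipient, and tool bodies are all illustrative):

```python
# In-memory stand-in for a real file store.
FILES = {"f1": {"name": "invoice.pdf", "content": "Invoice #42: $100"}}

def search_files(query: str) -> str:
    # Return the ID of the first file whose name matches the query.
    return next(fid for fid, f in FILES.items() if query in f["name"])

def read_file(id: str) -> str:
    return FILES[id]["content"]

def send_email(to: str, body: str) -> dict:
    return {"to": to, "body": body, "status": "sent"}  # stub, no real email

file_id = search_files(query="invoice")                    # step 1: file ID
content = read_file(id=file_id)                            # step 2: content
receipt = send_email(to="boss@example.com", body=content)  # step 3: forward
```

In a real agent the model decides each step from the previous observation; the data flow, however, is exactly this.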

3. Error Recovery Loops

Models will occasionally hallucinate parameters or use invalid types. A reliable system feeds the error message back to the model:

  • Model: Calls get_user(id="123") (String)
  • System: Error: id must be an Integer.
  • Model: "Apologies, retrying." Calls get_user(id=123) (Int)

This "self-correction" loop is critical for autonomous agents.
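A sketch of that loop with a stubbed model: validate the arguments, and on failure feed the error string back so the next attempt can correct itself (in production, `fake_model` is replaced by a real LLM call that sees the error in its context):

```python
def get_user(id):
    # The tool enforces its own types instead of trusting the model.
    if not isinstance(id, int):
        raise TypeError("id must be an Integer.")
    return {"id": id, "name": "Ada"}

def fake_model(error):
    # Stand-in for an LLM: wrong type first, corrected once it sees the error.
    return {"id": 123} if error else {"id": "123"}

error, result = None, None
for _ in range(3):  # cap retries so a confused model can't loop forever
    args = fake_model(error)
    try:
        result = get_user(**args)
        break
    except TypeError as e:
        error = str(e)  # fed back to the model on the next attempt
```

The retry cap is the important production detail: without it, a model that keeps repeating the same mistake burns tokens indefinitely.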

Connecting Agents to File Systems

One of the most common use cases for tool calling is file management, including reading, writing, and analyzing documents. However, giving an agent raw access to your local C:/ drive is dangerous and doesn't scale to cloud deployments.

The Fast.io Approach: Persistent Cloud Storage for Agents

Fast.io provides a dedicated filesystem designed for AI agents. Instead of ephemeral containers that lose data when the session ends, Fast.io offers persistent cloud storage that your agents can access via standard tools or the Model Context Protocol (MCP).

  • Standardized Tools: Use the Fast.io MCP server to give Claude or custom agents instant list, read, write, and search capabilities.
  • Security: Agents work within isolated workspaces, preventing accidental access to sensitive system files.
  • Collaboration: Files created by an agent are immediately visible to humans via the Fast.io web interface, making handoffs easier.

Using a dedicated agent storage layer reduces implementation time for file-handling agents compared to building custom S3 wrappers.

Audit log interface showing history of AI agent file operations

Best Practices for Production Tool Use

To move from prototype to production, follow these security and reliability guidelines.

Limit Tool Scope

Don't give a general-purpose agent a delete_database() tool. Create specialized agents with narrow permissions (e.g., a "Read-Only Researcher" vs. an "Admin Manager").
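One simple way to enforce narrow scope is a per-role allowlist applied before schemas ever reach the model (role names and tool names here are illustrative):

```python
# Per-agent tool allowlists instead of one god-mode agent.
ALLOWED_TOOLS = {
    "readonly_researcher": {"search_files", "read_file"},
    "admin_manager": {"search_files", "read_file", "write_file", "delete_file"},
}

def tools_for(role: str, all_tools: dict) -> dict:
    # Only allowlisted schemas are ever sent to the model, so it
    # cannot even attempt an out-of-scope call.
    return {name: s for name, s in all_tools.items()
            if name in ALLOWED_TOOLS[role]}

schemas = {"read_file": {}, "delete_file": {}}
readonly = tools_for("readonly_researcher", schemas)
```

Filtering at prompt-construction time is stronger than rejecting calls after the fact: a tool the model never sees is a tool it can never hallucinate arguments for.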

Use Human-in-the-Loop

For sensitive actions (like sending emails or deleting files), configure your tool executor to pause and require human approval before running the final API call.
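A sketch of a gated executor: sensitive tools pause for approval before running (the approval hook is hypothetical; swap in a CLI prompt, Slack ping, or review queue):

```python
SENSITIVE = {"send_email", "delete_file"}

def execute(tool_name, handler, args, approve):
    # Sensitive tools run only if the approval callback says yes.
    if tool_name in SENSITIVE and not approve(tool_name, args):
        return {"status": "rejected", "tool": tool_name}
    return {"status": "ok", "result": handler(**args)}

# Usage with an auto-deny stub standing in for a real human prompt:
outcome = execute(
    "delete_file",
    handler=lambda path: f"deleted {path}",
    args={"path": "/tmp/report.pdf"},
    approve=lambda name, args: False,  # the human said no
)
```

The rejection is returned to the model as an observation, so it can explain to the user why the action didn't happen rather than silently failing.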

Optimize Context Usage

Tools consume tokens. If you have 50 tools, don't pass the full JSON schema for all of them in every turn. Use retrieval to inject only the relevant tool definitions into the system prompt based on the user's query.
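A minimal sketch of that retrieval step using keyword overlap against tool descriptions (a production system might use embeddings instead; tool names and descriptions are illustrative):

```python
TOOLS = {
    "get_weather": "Get current weather for a city.",
    "search_files": "Search stored files by name.",
    "send_email": "Send an email to a recipient.",
}

def relevant_tools(query: str, k: int = 2) -> list:
    # Rank tools by word overlap between the query and each description,
    # then keep only the top k schemas for the prompt.
    words = set(query.lower().split())
    return sorted(
        TOOLS,
        key=lambda t: -len(words & set(TOOLS[t].lower().split())),
    )[:k]

selected = relevant_tools("search my files for the Q3 report")
```

With 50 tools, injecting only the top handful per turn can cut prompt size dramatically while keeping the tools the model actually needs in scope.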

Handle API Timeouts

LLMs are slow, and real-world tools can be even slower. Make sure your orchestration layer (like LangChain or a custom loop) has timeout handling. If a tool takes too long to run, the user shouldn't be left staring at a blank screen. Send intermediate status updates ("I'm searching for the file now...") to improve the experience.
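A sketch of a hard timeout around a tool call using the standard library, with a stub standing in for a slow external API (the fallback message is illustrative):

```python
import concurrent.futures
import time

def slow_tool():
    time.sleep(1.0)  # simulates a slow external API
    return "done"

def run_with_timeout(fn, seconds):
    # Run the tool in a worker thread so we can bail out on a deadline.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn)
        try:
            return future.result(timeout=seconds)
        except concurrent.futures.TimeoutError:
            # Surface an intermediate status instead of a blank screen.
            return "Still working on it -- I'll follow up shortly."

status = run_with_timeout(slow_tool, seconds=0.1)
```

In a streaming UI you would emit that status message to the user and keep polling the future, rather than abandoning the work entirely.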

Frequently Asked Questions

What is tool calling in LLMs?

Tool calling is a capability that allows Large Language Models (LLMs) to interact with external APIs and execute code. Instead of just generating text, the model can output structured commands to perform tasks like searching the web, querying databases, or managing files.

What is the difference between tool calling and function calling?

The terms are often used interchangeably, but function calling typically refers to the specific mechanism of outputting structured JSON to match a function signature, while tool calling describes the broader capability of an agent to use these functions to solve problems.

Which LLMs support tool calling?

All major frontier models support tool calling, including OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, Google's Gemini 1.5 Pro, and open-source models like Llama 3 via frameworks like Ollama.

Does tool calling cost more than regular text generation?

Yes. Defining tools consumes context window tokens because the tool schemas are injected into the system prompt. Also, the multi-step 'thought-action-observation' loop requires more round-trips to the LLM than a simple Q&A, increasing total token usage per task.

How do I secure my AI agent's tool access?

Secure agents by implementing 'least privilege' access. Use read-only tokens where possible, sandbox execution environments (like Fast.io's isolated workspaces), and require human confirmation for destructive actions.
