How to Build an OpenClaw Pan-Tilt Object Tracking Camera on Raspberry Pi

A pan-tilt tracking camera pairs servo motors with a Raspberry Pi camera module so the camera physically follows detected objects. Adding OpenClaw as the agent layer takes you beyond basic centroid tracking: an LLM decides which object to prioritize, when to zoom, and whether an event is worth recording. This guide covers the hardware, wiring, detection pipeline, and cloud storage for captured footage.

Fast.io Editorial Team 12 min read
Why Basic Centroid Tracking Hits a Wall

Most Raspberry Pi pan-tilt camera projects follow the same pattern. OpenCV detects an object, calculates its centroid coordinates, and a PID controller adjusts two servos to keep that centroid centered in the frame. It works, and for single-object tracking in controlled environments it works well.

The problems start when the scene gets complicated. Two people walk through the frame and the tracker picks whichever bounding box the detection model returns first. A cat crosses in front of a parked car and the camera swings to follow the cat, losing sight of the driveway. The system has no concept of priority, context, or significance. It tracks whatever it sees, in the order it sees it.

OpenClaw changes this by adding an agent decision layer between detection and servo control. Instead of feeding centroid coordinates directly to a PID loop, the detection results pass through an LLM that can reason about what it sees. The agent decides which object matters most, whether the current tracking target is still relevant, and whether the event is worth recording to cloud storage.

This is the difference between a tracking camera and a tracking camera agent. The camera follows objects. The agent decides which objects are worth following.

Hardware: Pi, Pan-Tilt, and Camera

The parts list splits into three groups: the Pi itself, the pan-tilt mechanism, and the camera and detection hardware.

Raspberry Pi:

  • Raspberry Pi 5 (8 GB RAM) for the best performance, or Raspberry Pi 4 (4 GB minimum)
  • High-endurance microSD card, 32 GB or larger
  • USB-C power supply rated at 5V/5A for Pi 5 (the servos draw extra current)

Pan-tilt mechanism:

  • Waveshare 2-DOF Pan-Tilt HAT with onboard PCA9685 PWM driver and I2C interface, which includes two servo motors (SG90 and MG90S options) for under $25
  • Alternative: Arducam Pan Tilt Platform with digital servos and PTZ control board
  • Alternative: SunFounder Pan-Tilt Hat v3.0 with 9G digital servo

The Pimoroni Pan-Tilt HAT was the original go-to option for Raspberry Pi camera tracking projects, but Pimoroni has discontinued it. The Waveshare 2-DOF HAT is the closest current replacement. It uses the PCA9685 chip for PWM generation over I2C, which means it only needs two GPIO pins (SDA and SCL) regardless of how many servos you connect. Both pan and tilt axes cover 180 degrees of motion.

Camera and detection:

  • Raspberry Pi Camera Module 2 or Camera Module 3
  • Optional: Raspberry Pi AI HAT+ with Hailo-8L (13 TOPS) or Hailo-8 (26 TOPS) for hardware-accelerated object detection at up to 30fps

Without the AI HAT+, you can run lighter models like MobileNet SSD or a small YOLOv8 variant directly on the Pi's CPU. Detection will run at 5 to 10 frames per second on a Pi 5, which is fast enough for tracking people or vehicles that move at walking speed. The Hailo accelerator pushes that to 30fps at 1080p, which matters for faster-moving subjects or scenes where you need smooth servo response.
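To see why 5 to 10 fps is workable at walking speed, it helps to estimate how far a subject shifts across the view between two detections. The sketch below is a rough geometric estimate, not a benchmark: the 1.4 m/s walking speed and 5 m distance are illustrative assumptions, and it takes the worst case of a subject moving perpendicular to the camera axis.

```python
import math


def degrees_per_frame(speed_m_s: float, distance_m: float, fps: float) -> float:
    """Angle a subject sweeps across the view between two detections.

    Assumes the subject moves perpendicular to the camera axis, which is
    the worst case for the pan servo.
    """
    travel_per_frame = speed_m_s / fps
    return math.degrees(math.atan2(travel_per_frame, distance_m))


# Illustrative numbers only: a 1.4 m/s walker, 5 m from the camera.
cpu_shift = degrees_per_frame(1.4, 5.0, fps=5)
hailo_shift = degrees_per_frame(1.4, 5.0, fps=30)
print(f"{cpu_shift:.1f} deg/frame at 5 fps, {hailo_shift:.1f} deg/frame at 30 fps")
```

Under these assumptions the subject moves a few degrees per frame on the CPU, a small fraction of the camera's field of view, so the tracker has time to react. Faster or closer subjects increase that shift quickly, which is where the accelerator earns its cost.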

Total cost ranges from $80 (Pi 4 + Waveshare HAT + Camera Module 2, no AI accelerator) to around $200 (Pi 5 + Waveshare HAT + Camera Module 3 + Hailo-8L AI HAT+).

Wiring and Assembly

The Waveshare 2-DOF Pan-Tilt HAT plugs directly onto the Pi's 40-pin GPIO header. If you are using a different pan-tilt kit that does not come as a HAT, the wiring is straightforward.

I2C connection for PCA9685-based kits:

  • SDA to GPIO 2 (pin 3)
  • SCL to GPIO 3 (pin 5)
  • VCC to 5V (pin 2 or 4)
  • GND to ground (pin 6, 9, 14, 20, 25, 30, 34, or 39)

Direct servo connection (no HAT, using GPIO PWM):

  • Pan servo signal wire to a PWM-capable GPIO pin (GPIO 12 or GPIO 13)
  • Tilt servo signal wire to a second PWM pin
  • Servo power (red wire) to an external 5V supply rather than the Pi's 5V rail when running two servos, since each servo can draw around 500mA under load
  • Servo ground (brown/black wire) to Pi ground, with the external supply's ground tied to the same ground so the signal and power share a common reference

The camera module connects via the Pi's CSI ribbon cable port. Mount the camera to the pan-tilt bracket so it moves with both axes. Most kits include mounting hardware, but you may need M2 standoffs for the Camera Module 3 since its form factor is slightly different from the Module 2.

Enable I2C on the Pi:

Run sudo raspi-config, navigate to Interface Options, and enable I2C. Verify the connection with i2cdetect -y 1. You should see the PCA9685 at address 0x40.

Power considerations:

The official Raspberry Pi 27W USB-C power supply (5V/5.1A) provides enough headroom for the Pi 5 plus two small servos. If you add the AI HAT+ as well, an external 5V servo power supply is safer. Underpowered servos jitter and lose position accuracy, which makes the tracking loop fight itself.

Fastio features

Store your tracking footage where agents and humans can both search it

Fast.io gives your OpenClaw tracking agent 50 GB of free cloud storage with automatic indexing, semantic search, and shareable event links. No credit card required.

Setting Up OpenClaw with Camera and Servo Tools

OpenClaw on Raspberry Pi runs as a local agent that connects to cloud LLM APIs for reasoning. The Pi handles sensor input, camera capture, and servo control locally, while the LLM provides the decision-making layer. Install OpenClaw following the official Raspberry Pi deployment guide at docs.openclaw.ai/install/raspberry-pi.

The key architectural piece is OpenClaw's tool system. Tools are typed functions that the agent can call during its reasoning loop. For a pan-tilt tracker, you need three categories of tools: camera capture, object detection, and servo control.

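Camera capture uses Python's picamera2 library, which is the supported camera interface on current Raspberry Pi OS releases for both the Pi 4 and Pi 5. The legacy picamera library only works with the older legacy camera stack and does not run on the Pi 5. The tool captures a frame from the camera module and returns it for processing. OpenClaw's tool system lets you register custom Python functions that wrap these library calls, so the agent can request a frame whenever it needs one.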

Object detection runs either on the CPU using OpenCV's DNN module with a MobileNet or YOLO model, or on the Hailo accelerator if you have the AI HAT+. The detection tool takes a captured frame and returns a list of detected objects with their bounding boxes, class labels, and confidence scores.
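One way to represent what the detection tool returns is a small record per object. The sketch below is an assumption about shape, not OpenClaw's or any detector's actual output format: the `Detection` class, its field names, and the 0.5 confidence threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object. Field names are illustrative, not a library format."""
    label: str
    confidence: float
    x: int  # left edge of the bounding box, in pixels
    y: int  # top edge of the bounding box, in pixels
    w: int  # box width in pixels
    h: int  # box height in pixels

    @property
    def centroid(self) -> tuple[float, float]:
        """Center of the bounding box in pixel coordinates."""
        return (self.x + self.w / 2, self.y + self.h / 2)


def keep_confident(detections, threshold=0.5):
    """Drop low-confidence detections before they reach the agent."""
    return [d for d in detections if d.confidence >= threshold]
```

Filtering before the LLM call keeps the prompt short and avoids spending reasoning on detections the model itself was unsure about.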

Servo control uses the adafruit-circuitpython-pca9685 library for PCA9685-based HATs, or gpiozero for direct PWM servo control. The tool accepts a pan angle and tilt angle (each ranging from -90 to +90 degrees) and moves the servos to that position.
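Under the hood, a servo angle becomes a pulse width, and the PCA9685 expresses that pulse as a count out of its 4096 steps per PWM period. The sketch below shows the arithmetic only and does not talk to hardware. The 500 to 2500 microsecond pulse range is an assumption: check your servo's datasheet, since many hobby servos use a narrower range, and servo libraries normally handle this conversion for you.

```python
def angle_to_counts(angle_deg: float,
                    min_us: float = 500.0,   # assumed pulse width at -90 degrees
                    max_us: float = 2500.0,  # assumed pulse width at +90 degrees
                    freq_hz: float = 50.0) -> int:
    """Convert a -90..+90 degree angle to a 12-bit PCA9685 on-time count."""
    # Clamp so a bad command cannot drive the servo past its end stops.
    angle = max(-90.0, min(90.0, angle_deg))
    # Map the angle linearly onto the pulse-width range.
    pulse_us = min_us + (angle + 90.0) / 180.0 * (max_us - min_us)
    # One PWM period at 50 Hz is 20,000 microseconds, split into 4096 steps.
    period_us = 1_000_000.0 / freq_hz
    return round(pulse_us / period_us * 4096)


for angle in (-90, 0, 90):
    print(angle, angle_to_counts(angle))
```

Clamping inside the conversion means a mistaken command from the agent cannot push the mechanism beyond its mechanical range.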

The agent's reasoning loop ties these together. Each cycle, it captures a frame, runs detection, evaluates the results, decides which object (if any) to track, calculates the required pan and tilt adjustment, and sends the servo command. The LLM evaluation step is what separates this from a simple PID loop.
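The cycle described above can be sketched as a single function that takes each step as a callable. This is a structural illustration, not OpenClaw's API: `run_cycle` and all five parameter names are hypothetical, and in a real agent `choose_target` would be the LLM call while the others would wrap the camera, detector, and servo tools.

```python
from typing import Callable, Optional, Sequence, Tuple

Angles = Tuple[float, float]


def run_cycle(
    capture: Callable[[], object],
    detect: Callable[[object], Sequence[dict]],
    choose_target: Callable[[Sequence[dict]], Optional[dict]],
    plan_move: Callable[[dict], Angles],
    move_servos: Callable[[float, float], None],
) -> Optional[Angles]:
    """Run one capture -> detect -> decide -> move cycle.

    Returns the commanded (pan, tilt) angles, or None if nothing was
    chosen for tracking this cycle.
    """
    frame = capture()
    detections = detect(frame)
    target = choose_target(detections)  # the LLM decision in a real agent
    if target is None:
        return None  # nothing worth following: leave the servos alone
    pan, tilt = plan_move(target)
    move_servos(pan, tilt)
    return pan, tilt


# Stand-in callables so the sketch runs without a camera or servos.
commands = []
result = run_cycle(
    capture=lambda: "frame-0",
    detect=lambda frame: [{"label": "cat"}, {"label": "person"}],
    choose_target=lambda ds: next((d for d in ds if d["label"] == "person"), None),
    plan_move=lambda target: (12.0, -4.0),
    move_servos=lambda pan, tilt: commands.append((pan, tilt)),
)
print(result, commands)
```

Passing the steps in as callables makes it easy to test the loop on a laptop with fakes before running it against real hardware.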

Building the Tracking Logic

Traditional pan-tilt trackers calculate the pixel offset between the detected object's centroid and the frame center, then feed that offset into a PID controller that outputs servo angle adjustments. This works mechanically but makes no qualitative decisions about tracking.
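For reference, the traditional approach fits in a few lines. The sketch below is a generic textbook PID, not code from any particular tracker project: the function and class names are illustrative, and the gains in the usage lines are placeholders that would need tuning on real hardware.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def centroid_error(centroid: Tuple[float, float],
                   frame_size: Tuple[int, int]) -> Tuple[float, float]:
    """Offset of the centroid from frame center, normalized to -1..+1 per axis."""
    cx, cy = centroid
    width, height = frame_size
    return ((cx - width / 2) / (width / 2), (cy - height / 2) / (height / 2))


@dataclass
class PID:
    kp: float
    ki: float = 0.0
    kd: float = 0.0
    _integral: float = 0.0
    _prev_error: Optional[float] = None

    def update(self, error: float, dt: float) -> float:
        """Return the control output for the current error and time step."""
        self._integral += error * dt
        # Skip the derivative on the first call to avoid a start-up kick.
        derivative = 0.0 if self._prev_error is None else (error - self._prev_error) / dt
        self._prev_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative


# One controller per axis; gains here are placeholders, not tuned values.
pan_pid = PID(kp=20.0, ki=0.5, kd=1.0)
err_x, err_y = centroid_error((480, 240), (640, 480))
pan_adjustment = pan_pid.update(err_x, dt=0.1)
```

Note that nothing in this loop knows what the object is. It only sees a pixel offset, which is exactly the limitation the agent layer addresses.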

With OpenClaw as the decision layer, the tracking loop gains several capabilities that pure PID control cannot provide.

Object prioritization. When multiple objects appear in the frame, the agent can decide which one to track based on class label, size, position, or user-defined rules. A security camera might prioritize people over animals. A wildlife tracker might prioritize rare species over common ones. A delivery monitor might only care about vehicles stopping near the front door. These priority rules live in the agent's prompt, not in hard-coded detection filters.
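As an illustration of rules living in the prompt, the sketch below builds a prompt from a list of plain-language rules and the current detections, then validates the model's reply. Everything here is hypothetical: the prompt wording, the `track_index` reply format, and both function names are assumptions for the example, not OpenClaw's prompt or tool interface.

```python
import json


def build_priority_prompt(rules, detections):
    """Assemble a prompt asking the model to pick one detection to track."""
    lines = ["You control a pan-tilt camera. Pick one detection to track.",
             "Priority rules:"]
    lines += [f"{i}. {rule}" for i, rule in enumerate(rules, start=1)]
    lines.append("Detections (JSON list, zero-indexed):")
    lines.append(json.dumps(detections))
    lines.append('Reply with JSON only: {"track_index": <integer or null>}')
    return "\n".join(lines)


def parse_choice(reply, detection_count):
    """Return a valid detection index from the reply, or None if unusable."""
    try:
        index = json.loads(reply).get("track_index")
    except (ValueError, TypeError, AttributeError):
        return None  # malformed reply: track nothing rather than guess
    if isinstance(index, int) and not isinstance(index, bool) \
            and 0 <= index < detection_count:
        return index
    return None


rules = ["Prefer people over animals", "Ignore parked vehicles"]
detections = [{"label": "cat", "confidence": 0.8},
              {"label": "person", "confidence": 0.9}]
prompt = build_priority_prompt(rules, detections)
```

Changing the camera from a security role to a wildlife role then means editing the `rules` list, not retraining or re-filtering the detector. Validating the reply matters because an LLM can return an index that does not exist.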

Context-aware tracking. The agent maintains conversational context across frames. If it was tracking a person who walked behind a tree, it can hold position and wait rather than immediately swinging to track the next moving object. Traditional trackers lose the target and grab whatever else is visible.
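The hold-and-wait behavior can be approximated with a small piece of state outside the LLM. The sketch below is a simplified stand-in for what the agent's context provides: `TargetMemory`, the three-second hold window, and the idea that each tracked object carries a stable `target_id` are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class TargetMemory:
    """Remembers the current target and when it was last seen."""
    hold_seconds: float = 3.0          # assumed grace period for occlusion
    target_id: Optional[str] = None
    last_seen: Optional[float] = None

    def update(self, visible_ids: Set[str], now: float) -> str:
        """Return 'track', 'hold', or 'release' for the current frame."""
        if self.target_id is None:
            return "release"
        if self.target_id in visible_ids:
            self.last_seen = now
            return "track"
        if self.last_seen is not None and now - self.last_seen <= self.hold_seconds:
            return "hold"  # target hidden recently: keep the camera where it is
        self.target_id = None  # gone too long: free the camera for a new target
        return "release"


memory = TargetMemory(target_id="person-1", last_seen=0.0)
print(memory.update({"person-1"}, now=1.0))  # target visible
print(memory.update(set(), now=2.5))         # briefly occluded
print(memory.update(set(), now=5.0))         # missing past the hold window
```

Keeping this state in code means a slow or failed LLM call does not cause the camera to abandon a briefly hidden target.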

Event classification. Not every detection is worth recording. A bird crossing the frame for half a second is noise. A person approaching the front door is an event. The agent can classify detections as events, near-events, or noise, and only trigger recording or cloud upload for the interesting ones.
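A cheap pre-filter can screen out obvious noise before the agent spends an LLM call on classification. The sketch below is a deliberately simple heuristic, not the agent's reasoning: the one-second dwell threshold, the `watched_labels` set, and the function names are assumptions chosen to mirror the bird and front-door examples.

```python
from enum import Enum


class Significance(Enum):
    NOISE = "noise"
    NEAR_EVENT = "near_event"
    EVENT = "event"


def classify(label: str, dwell_seconds: float,
             watched_labels=frozenset({"person"}),
             min_dwell: float = 1.0) -> Significance:
    """Rough significance from how long an object stayed in frame and what it is."""
    if dwell_seconds < min_dwell:
        return Significance.NOISE       # e.g. a bird crossing for half a second
    if label in watched_labels:
        return Significance.EVENT       # e.g. a person lingering in view
    return Significance.NEAR_EVENT      # stayed in view but not a watched class


def should_upload(significance: Significance) -> bool:
    """Only full events trigger recording or cloud upload."""
    return significance is Significance.EVENT
```

The agent can still overrule or refine these labels. The heuristic only keeps half-second detections from generating uploads.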

Adaptive behavior. The agent can adjust its own parameters based on conditions. In a busy scene with many objects, it might widen its tracking tolerance to avoid constant servo movement. In an empty scene, it might increase sensitivity. These adjustments happen through the LLM's reasoning, not through manual PID tuning.

The servo control layer beneath the agent still uses proportional control for smooth movement. The agent decides the target position, and a simple proportional loop moves the servos from the current angle toward it. Most applications do not need a full PID controller: the agent only updates the target once per decision cycle (every 100 to 500 milliseconds, depending on your LLM latency), and those gradual target updates help damp oscillation.
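A proportional step toward the agent's target can be as small as the sketch below. The gain of 0.3 and the 5 degree per-step limit are placeholder values, not recommendations, and `step_toward` is an illustrative name.

```python
def step_toward(current_deg: float, target_deg: float,
                gain: float = 0.3, max_step_deg: float = 5.0) -> float:
    """Move a fraction of the remaining distance, capped per step."""
    delta = (target_deg - current_deg) * gain
    # Cap the step so a large target change does not snap the servo.
    delta = max(-max_step_deg, min(max_step_deg, delta))
    return current_deg + delta


# Run the inner loop several times between agent decisions.
angle = 0.0
for _ in range(50):
    angle = step_toward(angle, 10.0)
print(round(angle, 3))
```

Calling this at a fixed rate, faster than the agent's decision cycle, gives smooth motion: each call covers part of the remaining distance, so the servo slows as it approaches the target instead of overshooting.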

Storing and Reviewing Tracked Footage

A tracking camera generates two types of output: continuous position logs and event recordings. Both need to go somewhere durable and searchable, especially if the camera runs unattended.

Local storage on the Pi's SD card works for short-term buffering, but SD cards wear out under continuous write loads and offer no remote access. A USB SSD extends the local buffer, but you still have to physically retrieve the drive to review footage.
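For the local buffer, an append-only JSON Lines file is a simple format for position logs. The sketch below uses only the standard library. The record fields (`ts`, `pan_deg`, `tilt_deg`, `target`) and function names are a hypothetical schema, not a format required by any upload service.

```python
import json
import time
from pathlib import Path


def position_record(pan_deg, tilt_deg, target=None, ts=None):
    """Build one log entry. Field names are an illustrative schema."""
    return {
        "ts": time.time() if ts is None else ts,
        "pan_deg": pan_deg,
        "tilt_deg": tilt_deg,
        "target": target,
    }


def append_record(log_path, record):
    """Append one record as a single JSON line, creating the file if needed."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
```

Because every line is a complete JSON object, a power loss corrupts at most the last line, and the file can be uploaded or searched as-is later.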

Cloud storage solves the access problem. The agent can upload event clips and detection logs to a workspace where they are immediately searchable and shareable. Fast.io provides a workspace layer that fits naturally into this pipeline. The free agent plan includes 50 GB of storage and 5,000 API credits per month with no credit card required, which is enough for a single camera uploading event clips.

When Intelligence Mode is enabled on a Fast.io workspace, uploaded files are automatically indexed for semantic search. This means you can search your tracking archive by description ("person carrying a package" or "cat near the garden") rather than scrubbing through hours of timestamped clips. The agent can also use the Fast.io MCP server to upload files, organize them into folders by date or event type, and create shareable links for specific events.

For multi-camera setups, each camera agent can write to a separate workspace folder. The workspace audit trail tracks every upload with timestamps, so you get a complete chain of custody for recorded events. If you need to hand the footage archive to someone else, workspace ownership transfer lets you move the entire collection to another account while keeping the original agent as an admin.
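A consistent naming scheme makes per-camera folders easier to browse. The sketch below builds a relative path from the camera, date, and event type. The layout and the `clip_path` name are one possible convention, not something Fast.io requires.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def clip_path(camera_id: str, event_type: str,
              started_at: datetime, ext: str = "mp4") -> PurePosixPath:
    """Relative path of the form camera/YYYY-MM-DD/event_type/HHMMSS.ext."""
    day = started_at.strftime("%Y-%m-%d")
    stamp = started_at.strftime("%H%M%S")
    return PurePosixPath(camera_id) / day / event_type / f"{stamp}.{ext}"


example = clip_path("front-door", "person",
                    datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
print(example)
```

Using UTC timestamps in the path avoids ambiguity when cameras sit in different time zones or clocks change for daylight saving.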

Alternatives for cloud storage include S3-compatible buckets (cheap but no built-in search), Google Drive (easy sharing but no agent-native API), or a self-hosted NAS (full control but no remote access without VPN). Fast.io's advantage for this use case is the combination of agent-accessible storage, automatic indexing, and built-in sharing, all of which reduce the amount of infrastructure you need to build yourself.

The tracking camera pattern extends well beyond security. The same hardware and agent architecture works for wildlife observation (track and record animals at a feeding station), sports analysis (follow a ball or player during practice), workshop monitoring (track tool usage and movement patterns), or any scenario where you want a camera that makes decisions about what it watches and records.

Frequently Asked Questions

How do I make a tracking camera with Raspberry Pi?

You need a Raspberry Pi (4 or 5), a camera module, a pan-tilt servo mechanism like the Waveshare 2-DOF Pan-Tilt HAT, and an object detection model. The camera captures frames, a detection model identifies objects and their positions, and servo commands adjust the camera angle to keep the target centered. Adding an agent framework like OpenClaw lets the system make intelligent decisions about which objects to track and when to record.

Can Raspberry Pi control servo motors for camera tracking?

Yes. The Pi controls servos either through direct GPIO PWM (using libraries like gpiozero) or through an I2C servo driver like the PCA9685 chip found on most pan-tilt HATs. The PCA9685 approach is preferred because it offloads PWM generation to dedicated hardware, produces more stable signals, and only uses two GPIO pins regardless of how many servos you connect.

What is the best pan-tilt kit for Raspberry Pi camera?

The Waveshare 2-DOF Pan-Tilt HAT is currently the most popular option. It includes two servo motors, an onboard PCA9685 PWM driver, I2C interface, and mounts directly on the Pi's 40-pin header for under $25. The Arducam Pan Tilt Platform and SunFounder Pan-Tilt Hat v3.0 are solid alternatives. The Pimoroni Pan-Tilt HAT, which was the long-standing favorite, has been discontinued.

Do I need the Hailo AI HAT for object tracking?

No. You can run lightweight detection models like MobileNet SSD or small YOLOv8 variants directly on the Pi's CPU at 5 to 10 frames per second. The Hailo AI HAT+ (available in 13 TOPS and 26 TOPS variants) pushes detection to 30fps at 1080p, which helps with faster-moving objects or scenes that need smoother servo response. For tracking people or vehicles at walking speed, CPU-only detection is sufficient.

How does OpenClaw improve tracking compared to a PID controller?

A PID controller follows whatever object the detection model returns first, with no understanding of what it is tracking or why. OpenClaw adds an LLM decision layer that can prioritize specific object types, maintain tracking context when a target is temporarily occluded, classify events by significance, and adjust tracking sensitivity based on scene conditions. The result is a camera that behaves more like a camera operator than a mechanical follower.
