How to Build a Smart Doorbell with OpenClaw on Raspberry Pi

A smart doorbell agent running OpenClaw on a Raspberry Pi pairs a camera module with a microphone and speaker to detect visitors, stream live video, enable two-way conversation, and push notifications to your phone. This guide covers the hardware list, wiring, OpenClaw agent setup, notification delivery, and cloud-synced event logs.

Fast.io Editorial Team 12 min read

Why Commercial Doorbells Cost More Than They Should

Ring, Google Nest, and similar video doorbells work well out of the box, but they come with ongoing costs that add up. Ring's Basic plan starts at $49.99 per year for a single device and the Pro plan reaches $199.99 annually, with the Plus plan priced per Ring's published rates. Google Nest charges similar rates for Nest Aware. Over five years, subscription fees alone can exceed the original hardware cost.

Beyond price, cloud-dependent doorbells introduce privacy concerns. Every clip gets uploaded to a third-party server. You are trusting the vendor with continuous footage of your front door, including when you leave, when you return, and who visits. Several high-profile incidents have shown that these recordings are not always as private as the marketing suggests.

A Raspberry Pi doorbell eliminates the subscription entirely. The hardware runs locally, footage stays on your network, and you control what gets stored and for how long. Plenty of Pi doorbell tutorials exist on platforms like Hackster.io and Hackaday, but most stop at streaming video and adding a button. They lack the intelligence layer that makes a doorbell genuinely smart: knowing the difference between a delivery driver, a neighbor, and a stray cat.

That intelligence layer is where OpenClaw fits. Instead of forwarding every motion event to your phone, an OpenClaw agent classifies what it sees, decides whether the event warrants a notification, and delivers context along with the alert. The result is fewer interruptions and more useful information when something actually matters.

Hardware List and Wiring

OpenClaw runs on a Raspberry Pi 4 (4 GB minimum) or Raspberry Pi 5 (8 GB recommended). The Pi 5's Cortex-A76 cores handle local inference faster, but either board works since OpenClaw can offload reasoning to cloud-hosted LLMs such as Claude, GPT-4, or Gemini. Power draw is roughly 5W, which works out to about 44 kWh per year of continuous operation; the cost depends on your local electricity rate.

Core components:

  • Raspberry Pi 4 (4 GB) or Pi 5 (8 GB), around $55 to $80
  • MicroSD card (32 GB or larger) or M.2 HAT+ with SSD for better write endurance
  • Raspberry Pi Camera Module 3 Wide, the 120-degree field of view covers a full doorstep
  • USB condenser microphone for visitor audio capture, around $9
  • Small speaker (2W, 28mm) with an LM386 audio amplifier module for the intercom output
  • Momentary push button for the doorbell trigger
  • 5V 3A USB-C power supply
  • Weatherproof enclosure (IP65 rated or better for outdoor mounting)

Optional but useful:

  • Raspberry Pi NoIR Camera Module for infrared night vision (pair with an IR LED array for clear footage in darkness)
  • IQaudio Codec Zero HAT, which provides I2S microphone input and speaker output on a single board, avoiding the need for separate USB audio devices
  • GPIO-connected relay module to trigger an existing wired doorbell chime inside the house

Wiring overview:

The camera connects to the Pi's CSI ribbon port. The push button wires between GPIO 17 and ground with an internal pull-up resistor enabled in software. The USB microphone plugs into any available USB port. For the speaker, connect the LM386 amplifier's input to the Pi 4's 3.5mm audio jack and wire the amplifier output to the speaker. The Pi 5 has no analog audio jack, so on that board use the I2S output of the Codec Zero HAT or a USB audio adapter instead. If you are using a relay for an indoor chime, connect the relay signal pin to GPIO 27.
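The button and relay wiring can be exercised with a short script. This is a minimal sketch, assuming the gpiozero library that ships with Raspberry Pi OS and the pin numbers given above; the function names and the half-second chime pulse are illustrative choices, not part of OpenClaw.

```python
import time

BUTTON_PIN = 17  # doorbell push button, wired to ground
RELAY_PIN = 27   # optional relay for the indoor chime


def pulse_chime(relay, seconds=0.5, sleep=time.sleep):
    """Close the relay briefly so the wired chime sounds once."""
    relay.on()
    try:
        sleep(seconds)
    finally:
        relay.off()  # always release the relay, even if sleep is interrupted


def build_devices():
    """Create the GPIO devices. Imported lazily so the file loads off-Pi."""
    from gpiozero import Button, OutputDevice

    # pull_up=True enables the internal pull-up, so a press pulls the pin low.
    button = Button(BUTTON_PIN, pull_up=True, bounce_time=0.05)
    relay = OutputDevice(RELAY_PIN, active_high=True, initial_value=False)
    return button, relay


def run():
    """Block forever, pulsing the chime on each button press."""
    from signal import pause

    button, relay = build_devices()
    button.when_pressed = lambda: pulse_chime(relay)
    pause()
```

Call run() on the Pi to test the wiring. Many relay boards are active-low; if yours is, pass active_high=False when creating the OutputDevice.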

Enclosure considerations:

An outdoor doorbell needs weather protection. A 3D-printed enclosure with a clear acrylic window for the camera lens works well, but make sure to include ventilation holes positioned to avoid direct rain entry. Silicone sealant around cable entry points prevents moisture ingress. Mount the unit at chest height, roughly 120 cm from the ground, so the camera captures faces rather than foreheads.

Installing OpenClaw and Configuring the Doorbell Agent

Start with a fresh Raspberry Pi OS (64-bit) installation. Enable the camera interface and I2C through raspi-config, then update the system packages.

OpenClaw installs through a single command, as documented by the Raspberry Pi Foundation:

curl -fsSL https://openclaw.ai/install.sh | bash

This handles Node.js dependencies and sets up the agent runtime. After installation, run openclaw onboard to configure your preferred LLM provider. OpenClaw supports OpenAI, Anthropic (Claude), Google (Gemini), DeepSeek, and local models through Ollama. For a doorbell agent that needs fast responses, a cloud-hosted model with low latency is the better choice. Local models on a Pi 4 introduce noticeable delay during classification.

Defining the doorbell skill:

OpenClaw's behavior is defined through skills, which are directories containing a SKILL.md file with instructions the agent follows. Create a doorbell-agent skill directory and define the agent's responsibilities:

  1. Monitor the GPIO button for press events and trigger a notification when pressed
  2. Capture a camera frame on motion detection or button press
  3. Classify the captured frame (person, vehicle, animal, package, or unknown)
  4. If a person is detected, send a notification with the captured image
  5. Enable two-way audio when the homeowner responds to the notification
  6. Log every event with timestamp, classification, and image reference

The skill file tells the agent how to reason about inputs. For example, you can instruct it to suppress notifications for repeated motion events within a 30-second window (to avoid alert storms from a swaying tree) while always alerting immediately on button presses regardless of recent activity.
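In OpenClaw the suppression rule is written as plain-language instructions in the skill file. If you would rather enforce it in a helper script, the same logic can be sketched as below; the class name and event labels ("button", "motion") are hypothetical.

```python
import time


class NotificationGate:
    """Suppress repeated motion alerts; always pass button presses."""

    def __init__(self, window_seconds=30.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self._clock = clock
        self._last_motion_alert = None

    def should_notify(self, event_type):
        # Button presses bypass the window entirely.
        if event_type == "button":
            return True
        now = self._clock()
        last = self._last_motion_alert
        if last is not None and now - last < self.window_seconds:
            return False  # still inside the quiet window
        self._last_motion_alert = now
        return True
```

The window is measured from the last alert that was actually sent, so a tree that sways continuously produces at most one alert every 30 seconds.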

Camera and motion detection:

Use libcamera-still for single frame captures and libcamera-vid for video streaming (on Raspberry Pi OS Bookworm and later these tools are named rpicam-still and rpicam-vid). The agent can invoke these through shell commands. For motion detection without a dedicated PIR sensor, frame differencing works: capture a reference frame every few seconds and compare pixel changes against a threshold. Libraries like OpenCV provide the building blocks for this, and the Pi 5 handles the processing comfortably.
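To make the frame-differencing idea concrete, here is a minimal pure-Python sketch that works on flat grayscale frames (for example, bytes objects). The function names and both thresholds are assumptions to tune for your camera. On full-size frames you would normally use OpenCV's cv2.absdiff and cv2.threshold instead, which do the same comparison much faster.

```python
def changed_fraction(previous, current, pixel_threshold=25):
    """Return the fraction of pixels whose brightness changed noticeably.

    Both frames are equal-length sequences of 0-255 grayscale values.
    """
    if len(previous) != len(current):
        raise ValueError("frames must have the same size")
    if not previous:
        return 0.0
    changed = sum(
        1 for a, b in zip(previous, current) if abs(a - b) > pixel_threshold
    )
    return changed / len(previous)


def motion_detected(previous, current, pixel_threshold=25, area_threshold=0.02):
    """True when more than area_threshold of the frame changed."""
    return changed_fraction(previous, current, pixel_threshold) > area_threshold
```

The per-pixel threshold absorbs sensor noise and small lighting shifts, while the area threshold keeps a few flickering pixels from counting as motion.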

Audio setup:

Test the microphone with arecord -l to confirm the Pi detects it, then test playback with aplay. For two-way audio during an intercom session, the agent pipes incoming audio from a WebRTC or SIP stream to the speaker while capturing microphone input and streaming it back to the homeowner's phone. UV4L's WebRTC extension, used in several Pi intercom projects on Hackster.io, handles the bidirectional audio and video transport layer.

Store Your Doorbell Event History in One Place

Fast.io gives your OpenClaw agent 50 GB of free cloud storage for event images, classification logs, and video clips. No credit card, no subscription fees, and built-in AI search across your entire archive.

Visitor Detection with AI Classification

A basic motion-triggered doorbell sends a notification every time something moves in the camera's field of view. Wind, passing cars, shadows, and animals all trigger alerts. The average homeowner with a motion-activated doorbell receives 15 to 25 false alerts per day, and most people start ignoring notifications within a week.

OpenClaw solves this by adding a classification step between detection and notification. When the camera captures a frame on motion or button press, the agent can process it through one of two approaches.

Lightweight local classification:

TensorFlow Lite runs on the Pi with ARM64 support. A MobileNet SSD model trained on the COCO dataset can classify people, vehicles, animals, and common objects at roughly 5 frames per second on a Pi 5. This keeps everything local and adds no API cost. The tradeoff is lower accuracy in poor lighting and limited ability to distinguish between, say, a delivery driver and a family member.

LLM-assisted classification:

OpenClaw can send a captured frame to a vision-capable LLM (GPT-4o, Claude, or Gemini) for richer analysis. The LLM receives the image along with context from the agent's skill instructions: "Describe who or what is at the door. Is this a person, a delivery, a vehicle, or an animal? If a person, describe what they are carrying or doing." The response comes back in 1 to 3 seconds depending on the provider.

The LLM approach gives you natural-language descriptions in your notifications. Instead of "Person detected," you get "Adult carrying a cardboard package, standing at the door." That description tells you whether to interrupt what you are doing or let it wait.

Combining both approaches:

The most practical setup uses TensorFlow Lite as a fast first pass. If the local model detects a person with high confidence, the agent sends the notification immediately. If confidence is low or the classification is ambiguous, it escalates to the LLM for a second opinion before alerting. This keeps API costs minimal while maintaining accuracy where it matters.
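The two-stage routing can be sketched as a small decision function. The 0.80 confidence cutoff and the action names are assumptions for illustration. The sketch also assumes that a confident non-person detection (a cat, a car) is logged without an alert, which the text above does not specify.

```python
NOTIFY = "notify"
ESCALATE = "escalate_to_llm"
LOG_ONLY = "log_only"


def route_detection(label, confidence, high=0.80):
    """Decide what to do with one result from the local model."""
    if confidence >= high:
        # Confident local result: alert for people, just log everything else.
        return NOTIFY if label == "person" else LOG_ONLY
    # Low or ambiguous confidence: ask the vision LLM for a second opinion.
    return ESCALATE
```

Only the escalate branch incurs an API call, so raising or lowering the cutoff trades API cost against the risk of acting on a wrong local classification.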

Time-based filtering:

The agent can apply different sensitivity rules based on time of day. During business hours when deliveries are expected, classify and notify on all person detections. Late at night, only alert on button presses or sustained presence (someone standing at the door for more than 10 seconds). These rules live in the skill file and are easy to adjust without touching code.
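Expressed as code, the day and night rules might look like the sketch below. The 08:00 to 20:00 daytime window and the event labels are assumptions; the 10-second presence threshold comes from the rule described above.

```python
import datetime as dt

DAY_START = dt.time(8, 0)   # assumed start of delivery hours
DAY_END = dt.time(20, 0)    # assumed end of delivery hours
SUSTAINED_SECONDS = 10


def should_alert(event_type, at, presence_seconds=0.0):
    """Apply the day and night rules to one event.

    event_type is "button" or "person"; at is a datetime.time.
    """
    if event_type == "button":
        return True  # button presses always alert
    if event_type != "person":
        return False
    if DAY_START <= at < DAY_END:
        return True  # daytime: every person detection alerts
    # Night: only alert when someone lingers at the door.
    return presence_seconds > SUSTAINED_SECONDS
```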

Notifications and Two-Way Intercom

A doorbell that detects visitors but cannot reach you is not useful. The notification pipeline is what turns sensor data into action.

Telegram notifications:

Telegram's Bot API is the simplest path to instant notifications on your phone. Create a bot through BotFather, grab the token, and configure OpenClaw to send messages to your chat ID. The agent can send text, images, and even short video clips. Notification latency from detection to phone buzz is typically under 2 seconds, which is fast enough to catch a delivery driver before they leave.
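For reference, this is roughly what a text notification looks like when sent straight to the Bot API's sendMessage method using only the Python standard library. It is a sketch of the underlying HTTP call, not OpenClaw's own Telegram integration; the token and chat ID are placeholders you supply.

```python
import urllib.parse
import urllib.request

API_ROOT = "https://api.telegram.org"


def build_send_message(token, chat_id, text):
    """Build (but do not send) a Bot API sendMessage request."""
    url = f"{API_ROOT}/bot{token}/sendMessage"
    body = urllib.parse.urlencode({"chat_id": chat_id, "text": text})
    return urllib.request.Request(url, data=body.encode("utf-8"), method="POST")


def send(request, timeout=5):
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Sending the captured frame uses the Bot API's sendPhoto method instead, which needs a multipart upload and is left out here for brevity.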

Webhook notifications:

For tighter integration with home automation, OpenClaw can fire webhooks to Home Assistant, Node-RED, or any HTTP endpoint. A webhook payload containing the classification result, timestamp, and image URL lets you build custom automation. Ring the indoor chime only for people. Flash a smart bulb for packages. Log everything to a dashboard.
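A webhook payload along these lines could be built as follows. The field names (classification, timestamp, image_url) are hypothetical, so match them to whatever your Home Assistant or Node-RED flow expects.

```python
import json
import urllib.request


def build_webhook(url, classification, timestamp_iso, image_url):
    """Build (but do not send) a JSON POST describing one doorbell event."""
    payload = {
        "classification": classification,
        "timestamp": timestamp_iso,
        "image_url": image_url,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Pass the returned request to urllib.request.urlopen to deliver it. On the receiving side, an automation can branch on the classification field to ring the chime for people and flash a bulb for packages.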

Two-way intercom flow:

When you receive a notification, you want to be able to talk to whoever is at the door. The intercom works through a WebRTC session:

  1. The agent detects a visitor and sends a notification with a link
  2. You tap the link on your phone, which opens a browser-based WebRTC client
  3. The WebRTC session connects to UV4L running on the Pi
  4. You see live video from the doorbell camera and hear audio from the microphone
  5. When you speak, your audio plays through the Pi's speaker at the door

UV4L handles the media transport, NAT traversal, and codec negotiation. The agent's role is orchestrating when to start and stop the intercom session and logging the interaction.

Offline fallback:

If your internet connection drops, the agent still functions locally. Button presses trigger the indoor chime through the GPIO relay. Motion events get logged to local storage. When connectivity returns, the agent syncs queued notifications. The doorbell never becomes a dead button because your ISP had an outage.
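One way to implement the queue-and-sync behavior is a small in-memory buffer that retries in order. This sketch assumes a sender callable that raises OSError when the network is down (urllib's URLError is a subclass of OSError); the class name and the 500-message cap are illustrative.

```python
from collections import deque


class NotificationQueue:
    """Hold notifications while offline and flush them in order later."""

    def __init__(self, sender, max_pending=500):
        self._sender = sender  # callable(message); raises OSError when offline
        # When the cap is reached, the oldest pending message is dropped.
        self._pending = deque(maxlen=max_pending)

    def submit(self, message):
        """Queue a message and try to deliver everything pending."""
        self._pending.append(message)
        return self.flush()

    def flush(self):
        """Send queued messages oldest first; return how many went out."""
        sent = 0
        while self._pending:
            try:
                self._sender(self._pending[0])
            except OSError:
                break  # still offline; keep the rest for the next attempt
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)
```

This queue lives in memory, so it is lost on reboot. Persisting it to disk alongside the event log would be the natural next step.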

Event Logging and Cloud-Synced Storage

Every doorbell event generates data: a timestamp, a classification result, a camera frame, and sometimes audio. Storing this locally on the Pi's SD card works for short-term review, but SD cards have limited write endurance and a full card means lost events.

Local storage strategy:

Write event images and metadata to an SSD connected through USB 3.0 or through the Pi 5's PCIe connector with an M.2 HAT+. An SSD handles continuous writes far better than an SD card and provides enough space for months of event history. Structure the data as daily directories with JSON metadata files alongside captured images.
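The daily-directory layout can be sketched as below. The file naming (HHMMSS.jpg with a matching .json) and the metadata keys are assumptions, not a format OpenClaw prescribes.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_event(base_dir, classification, image_bytes, when=None):
    """Store one event as <base>/<YYYY-MM-DD>/<HHMMSS>.jpg plus .json."""
    when = when or datetime.now(timezone.utc)
    day_dir = Path(base_dir) / when.strftime("%Y-%m-%d")
    day_dir.mkdir(parents=True, exist_ok=True)

    stem = when.strftime("%H%M%S")
    image_path = day_dir / f"{stem}.jpg"
    image_path.write_bytes(image_bytes)

    metadata = {
        "timestamp": when.isoformat(),
        "classification": classification,
        "image": image_path.name,
    }
    meta_path = day_dir / f"{stem}.json"
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return meta_path
```

Two events within the same second would overwrite each other with this naming; add a counter or sub-second suffix if that matters for your setup.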

Cloud backup with Fast.io:

For long-term archival and remote access, the agent can sync event data to a Fast.io workspace. Fast.io's free agent tier provides 50 GB of storage, 5,000 credits per month, and 5 workspaces with no credit card required. The agent uploads event images and metadata through the Fast.io API or MCP server, keeping a structured archive that you can browse from any device.

Alternatives like syncing to Google Drive, S3, or a NAS work too. The advantage of Fast.io for this use case is Intelligence Mode: once enabled on a workspace, uploaded files are automatically indexed for semantic search. You can ask questions like "show me all events where someone was carrying a package" and get relevant results without manually tagging every image. The MCP server also means other AI agents in your workflow can query the doorbell's event history programmatically.

Retention policies:

Configure the agent to delete local files older than 30 days while keeping cloud copies indefinitely (or whatever retention period you prefer). This prevents the local drive from filling up while maintaining a complete history. The agent can handle this cleanup as a scheduled task, running once daily during low-activity hours.
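A cleanup task of this kind can be as simple as the sketch below, which deletes local files by modification time. It assumes the cloud copy already exists; a more careful version would confirm the upload before deleting anything.

```python
import time
from pathlib import Path


def prune_old_files(base_dir, max_age_days=30, now=None):
    """Delete files under base_dir older than max_age_days; return the count."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = 0
    # Materialize the listing first so deletions do not disturb the walk.
    for path in list(Path(base_dir).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed += 1
    # Remove directories left empty, deepest first.
    for path in sorted(Path(base_dir).rglob("*"), reverse=True):
        if path.is_dir() and not any(path.iterdir()):
            path.rmdir()
    return removed
```

Scheduling it once a day from cron or a systemd timer matches the low-activity window suggested above.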

Audit trail:

Every notification sent, every intercom session opened, and every classification made gets logged. If you need to review who visited while you were away, the event log provides a chronological record with images and AI-generated descriptions. Fast.io's audit log features add another layer, tracking file uploads, access events, and workspace activity for the complete picture.

Frequently Asked Questions

Can you build a smart doorbell with Raspberry Pi?

Yes. A Raspberry Pi 4 or Pi 5 paired with a camera module, USB microphone, small speaker, and push button gives you all the hardware for a video doorbell with two-way intercom. Adding OpenClaw as an AI agent layer provides visitor classification and intelligent notifications that go beyond what a basic camera stream offers.

How does OpenClaw handle video streaming on Raspberry Pi?

OpenClaw does not handle video streaming directly. It orchestrates the tools that do. The agent invokes libcamera for still captures and video recording, and delegates real-time streaming to UV4L's WebRTC extension for two-way audio and video. OpenClaw's role is deciding when to capture, what to classify, and when to notify.

What hardware do you need for a Pi video intercom?

The essentials are a Raspberry Pi 4 (4 GB) or Pi 5 (8 GB), a Camera Module 3 Wide, a USB microphone, a small speaker with an LM386 amplifier, a push button, and a 5V 3A power supply. For outdoor use, add a weatherproof enclosure. The total cost is around $80 to $120 depending on components, with no ongoing subscription.

How much does a Raspberry Pi doorbell cost compared to Ring?

The Pi hardware runs $80 to $120 as a one-time cost with zero subscription fees. Ring devices cost $100 to $250 for hardware plus $49.99 to $199.99 per year in subscription fees. Over three years, a Ring setup can cost $250 to $850 total while the Pi doorbell stays at its initial hardware cost plus a small electricity bill for roughly 5W of continuous draw.
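The arithmetic behind these figures is easy to reproduce. The helper names below are illustrative; plug in your own hardware price, plan fee, and electricity rate.

```python
def subscription_total(hardware_cost, annual_fee, years):
    """Total cost of a subscription doorbell over a number of years."""
    return hardware_cost + annual_fee * years


def energy_kwh(watts, years=1.0):
    """Energy used by a device drawing a constant load, in kWh."""
    hours = 24 * 365 * years
    return watts * hours / 1000
```

With the numbers above, the low end is 100 + 3 x 49.99, about $250, and the high end is 250 + 3 x 199.99, about $850. A 5W Pi running for three years uses about 131 kWh, so multiply that by your local rate per kWh to get the electricity cost.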

Can OpenClaw send doorbell notifications to my phone?

Yes. OpenClaw can send notifications through Telegram bots, webhooks to Home Assistant or Node-RED, or any HTTP endpoint. Telegram is the simplest option, delivering image notifications with AI-generated descriptions in under 2 seconds from the moment the visitor is detected.
