Private Voice Assistant

Diem Home AI

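A privacy-first voice assistant powered by Claude, running on custom ESP32 hardware with Home Assistant as the orchestration layer. Wake word detection, speech-to-text, and text-to-speech all run on local hardware; only LLM inference goes through the Anthropic API.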

~3.7s Voice Round-Trip
18 AI Tools
0 Cloud Dependencies (apart from the LLM API)
~$3 Monthly Cost

01 The Pipeline

From wake word to spoken response — every step runs on local hardware except the LLM inference.

🎙️
"Hey Jarvis"
The wake word is detected locally by openWakeWord on the home server, using a threshold of 0.5 and a trigger level of 2 to minimize false positives.
Wyoming Protocol
📡
Audio Capture & Stream
ESP32 captures 16kHz/32-bit audio via dual MEMS mics (ES7210 ADC codec) and streams to Home Assistant over WiFi.
ESPHome voice_assistant
🗣️
Speech-to-Text
faster-whisper (medium model) on MacBook Pro M5. German language, CPU int8 quantization. ~2.4s latency with VAD filtering.
faster-whisper :10300
🧠
AI Reasoning
Claude Haiku 4.5 with agentic tool-use loop. The model decides which tools to invoke — no intent classification, no NLU pipeline.
Anthropic API + 18 tools
🔊
Text-to-Speech
Piper TTS with de_DE-thorsten-high voice. Natural German speech, ~0.5s generation time.
Piper :10200
🔈
Audio Playback
48kHz audio streamed back to ESP32, output via ES8311 DAC to integrated 8Ω 2W speaker with 2x volume boost.
I2S + ES8311
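
The stage latencies quoted above can be checked against the ~3.7s round-trip with simple arithmetic. The sketch below is only illustrative: the STT and TTS figures come from the pipeline description, while the remainder (attributed here to the LLM call, tool execution, network hops, and playback start) is derived by subtraction and is not a measured value. The names `knownStages` and `remainingBudget` are hypothetical.

```typescript
interface Stage {
  name: string;
  seconds: number;
}

// Figures quoted in the pipeline description.
const ROUND_TRIP_S = 3.7;
const knownStages: Stage[] = [
  { name: "speech-to-text (faster-whisper)", seconds: 2.4 },
  { name: "text-to-speech (Piper)", seconds: 0.5 },
];

// Subtract the stages with a known latency from the total to see what is
// left for everything that has no quoted figure.
function remainingBudget(totalSeconds: number, stages: Stage[]): number {
  const known = stages.reduce((sum, stage) => sum + stage.seconds, 0);
  // Round to one decimal place to hide floating-point noise.
  return Math.round((totalSeconds - known) * 10) / 10;
}

const unattributed = remainingBudget(ROUND_TRIP_S, knownStages);
console.log(`Budget left for LLM, tools, and network: ~${unattributed}s`);
```

Under these figures, speech-to-text accounts for roughly two thirds of the round-trip, which makes it the first place to look when tuning latency.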

02 The Hardware

Waveshare ESP32-S3-Touch-LCD-1.85C-BOX — a compact round-display device purpose-built for voice interaction.

Processor
ESP32-S3 Dual-Core
240 MHz
💾
Memory
8MB PSRAM
16MB Flash
🎤
Microphone
Dual MEMS ICS-43434
via ES7210 ADC
🔊
Speaker
8Ω 2W + ES8311 DAC
+ PA Amplifier
🖥️
Display
360×360 Round QSPI
Capacitive Touch
🔋
Power
3.7V 1000mAh
Li-Po Battery

The dual-codec architecture (ES7210 ADC for mic, ES8311 DAC for speaker) enables simultaneous bidirectional audio without conflicts. Two independent I2S buses ensure capture and playback never block each other.

03 The Brain

A lean Node.js/TypeScript microservice (~500 LOC) exposing an OpenAI-compatible API. Under the hood, Claude Haiku 4.5 runs in an agentic tool-use loop — it decides what to do based on the user's request.
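
To make the "OpenAI-compatible" part concrete, here is a minimal sketch of the request and response shapes such an endpoint has to handle. It covers only a subset of the chat-completions format, and the helper names `lastUserText` and `toChatCompletion` are hypothetical, not taken from the actual service.

```typescript
// Subset of the OpenAI chat-completions wire format (illustrative only).
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatCompletionRequest {
  model: string;
  messages: ChatMessage[];
}

interface ChatCompletionResponse {
  id: string;
  object: "chat.completion";
  created: number;
  model: string;
  choices: { index: number; message: ChatMessage; finish_reason: "stop" }[];
}

// Hypothetical helper: extract what the user just said (the last user message).
function lastUserText(req: ChatCompletionRequest): string {
  for (let i = req.messages.length - 1; i >= 0; i--) {
    const message = req.messages[i];
    if (message && message.role === "user") return message.content;
  }
  return "";
}

// Hypothetical helper: wrap the agent's final answer in an OpenAI-style body.
function toChatCompletion(model: string, answer: string): ChatCompletionResponse {
  return {
    id: `chatcmpl-${Date.now()}`,
    object: "chat.completion",
    created: Math.floor(Date.now() / 1000),
    model,
    choices: [
      { index: 0, message: { role: "assistant", content: answer }, finish_reason: "stop" },
    ],
  };
}
```

A route handler for `/v1/chat/completions` would then pass `lastUserText(req)` to the agent and return `toChatCompletion(...)` as JSON, so Home Assistant sees an ordinary OpenAI-style backend.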

Jarvis Voice Agent (Node.js, Docker, port 3002)
├── Express.js → /v1/chat/completions (OpenAI-compatible)
├── Claude Haiku 4.5 → Agentic tool-use loop
│   ├── Custom tools (17) → Smart Home, Energy, Music, HA
│   └── Server tool (1) → Anthropic Web Search
├── System Prompt → German personality, TTS-optimized
└── Logger → Color-coded console output with cost tracking

Home Assistant (thin proxy)
├── Extended OpenAI Conversation → routes text to Jarvis
├── Wyoming STT → faster-whisper on Mac
├── Wyoming TTS → Piper on Mac
└── ESPHome → manages satellite connection
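
The agentic tool-use loop can be sketched without any SDK by injecting the model call as a function. The block types below are simplified stand-ins modeled on the Anthropic Messages API (`tool_use`, `tool_result`, `stop_reason`); `runAgent`, `ModelFn`, `ToolFn`, the `maxTurns` limit, and the fallback message are all hypothetical and not taken from the actual service.

```typescript
// Simplified stand-ins for Messages API content blocks.
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string };
type Block = TextBlock | ToolUseBlock | ToolResultBlock;

interface Turn {
  role: "user" | "assistant";
  content: Block[];
}

interface ModelReply {
  stop_reason: "end_turn" | "tool_use";
  content: Block[];
}

type ModelFn = (history: Turn[]) => Promise<ModelReply>;
type ToolFn = (input: Record<string, unknown>) => Promise<string>;

async function runAgent(
  userText: string,
  callModel: ModelFn,
  tools: Record<string, ToolFn>,
  maxTurns = 5, // assumed safety limit so a confused model cannot loop forever
): Promise<string> {
  const history: Turn[] = [{ role: "user", content: [{ type: "text", text: userText }] }];

  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callModel(history);
    history.push({ role: "assistant", content: reply.content });

    // No tool requested: the text blocks are the final spoken answer.
    if (reply.stop_reason !== "tool_use") {
      return reply.content
        .filter((block): block is TextBlock => block.type === "text")
        .map((block) => block.text)
        .join(" ");
    }

    // Run every requested tool and feed the results back as the next user turn.
    const results: Block[] = [];
    for (const block of reply.content) {
      if (block.type !== "tool_use") continue;
      const tool = tools[block.name];
      const output = tool ? await tool(block.input) : `Unknown tool: ${block.name}`;
      results.push({ type: "tool_result", tool_use_id: block.id, content: output });
    }
    history.push({ role: "user", content: results });
  }
  return "Sorry, I could not finish that request.";
}
```

The point of this design is that there is no routing logic in the service itself: the model chooses the tools, and the loop only executes them and returns the results until the model produces a final answer.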

Tool Inventory

Smart Home — Skyvu API (ioBroker)
🏠 get_rooms
List all rooms with devices and current states
💡 send_command
Control any device — lights, AC, thermostats, plugs
📊 get_devices
All devices grouped by type with status
🔍 get_device_states
Current state and attributes of any device
Energy Management
☀️ get_energy
Real-time PV production, battery, grid, consumption, autonomy %
📈 get_energy_history
Time-series energy data — day, week, month, year
🔌 get_evcc
EV charger status — mode, power, battery SoC, range
🚗 set_evcc_mode
Switch charge mode: PV-only, min+solar, immediate, off
Home Assistant
🏡 ha_get_entities
List HA entities, optionally filtered by domain
⚙️ ha_call_service
Call any HA service — lights, switches, media players
🌤️ ha_get_weather
Current weather + 5-day forecast
Music — Sonos + Spotify
🎵 music_browse_spotify
Browse Spotify library — playlists, artists, albums
▶️ music_play_item
Play content discovered via browse — never guessed URIs
music_play_favorite
Play Sonos favorites by fuzzy name match
🔎 music_browse
Browse Sonos media library — favorites, radio, queue
General Knowledge
🌐 web_search
Anthropic native server tool — no external API key required
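
As an illustration of how tools like these are declared to the model, the sketch below defines one custom tool and the web search server tool. Custom tools in the Anthropic Messages API carry a `name`, a `description`, and a JSON Schema `input_schema`; server tools are declared by `type`. Everything specific here is an assumption: the `mode` parameter, the mode identifiers (chosen to mirror the four modes listed above), the `max_uses` value, and the `parseEvccMode` guard are hypothetical, and the server tool's version string may differ from the one the project uses.

```typescript
// Shape of a custom tool declaration.
interface CustomTool {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, { type: string; enum?: string[]; description?: string }>;
    required: string[];
  };
}

// Assumed identifiers for PV-only, min+solar, immediate, and off.
const EVCC_MODES: string[] = ["pv", "minpv", "now", "off"];

const setEvccMode: CustomTool = {
  name: "set_evcc_mode",
  description: "Switch the EV charge mode: PV-only, min+solar, immediate, or off.",
  input_schema: {
    type: "object",
    properties: {
      mode: { type: "string", enum: EVCC_MODES, description: "Target charge mode" },
    },
    required: ["mode"],
  },
};

// Server tool: executed on Anthropic's side, so no handler is needed locally.
const webSearch = { type: "web_search_20250305", name: "web_search", max_uses: 3 };

// Hypothetical guard: validate model-supplied input before touching the charger.
function parseEvccMode(input: Record<string, unknown>): string {
  const mode = input["mode"];
  if (typeof mode !== "string" || EVCC_MODES.indexOf(mode) === -1) {
    throw new Error(`Invalid charge mode: ${String(mode)}`);
  }
  return mode;
}
```

Constraining `mode` with an `enum` narrows what the model can request, and the guard rejects anything that still slips through before a real device is affected.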

04 Infrastructure

Component              Location                Stack
ESP32 Voice Satellite  Living room             ESPHome, Wyoming
Home Assistant         Home server             Docker, thin proxy
openWakeWord           Home server             Docker, "hey_jarvis"
Jarvis Voice Agent     Home server :3002       Docker, Node.js 22
Skyvu API (ioBroker)   Home server :3001       Node.js, REST
faster-whisper STT     MacBook Pro M5 :10300   Python, Wyoming
Piper TTS              MacBook Pro M5 :10200   Python, Wyoming
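
For reference, the network-facing services above can be captured in a small typed registry. This is a sketch only: the ports come from the table, but the hostnames (`homeserver.local`, `macbook.local`), the `Service` type, and the `serviceUri` helper are placeholders invented for illustration. Wyoming services are addressed with `tcp://host:port` URIs.

```typescript
interface Service {
  name: string;
  host: string; // placeholder hostnames, not from the original setup
  port: number;
  protocol: "wyoming" | "http";
}

const services: Service[] = [
  { name: "jarvis-voice-agent", host: "homeserver.local", port: 3002, protocol: "http" },
  { name: "skyvu-api", host: "homeserver.local", port: 3001, protocol: "http" },
  { name: "faster-whisper", host: "macbook.local", port: 10300, protocol: "wyoming" },
  { name: "piper", host: "macbook.local", port: 10200, protocol: "wyoming" },
];

// Build the URI a client would use to reach a service.
function serviceUri(service: Service): string {
  const scheme = service.protocol === "wyoming" ? "tcp" : "http";
  return `${scheme}://${service.host}:${service.port}`;
}

for (const service of services) {
  console.log(`${service.name}: ${serviceUri(service)}`);
}
```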

05 Design Decisions

06 Running Cost

$2–5/mo
Claude API only (Haiku 4.5 at $1/MTok in, $5/MTok out)
Everything else runs on existing hardware.
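
The quoted range is easy to sanity-check from the per-token prices. In the sketch below the two rates come from the text above, while the workload (30 requests per day, about 3,000 input tokens and 150 output tokens per request) is an assumption chosen for illustration, not a measured figure. Input tokens are relatively high in this kind of setup because every request carries the system prompt and tool definitions.

```typescript
// Prices quoted above for Claude Haiku 4.5, in USD per million tokens.
const USD_PER_MTOK_IN = 1;
const USD_PER_MTOK_OUT = 5;

function monthlyCostUsd(
  requestsPerDay: number,
  inputTokensPerRequest: number,
  outputTokensPerRequest: number,
  days = 30,
): number {
  const requests = requestsPerDay * days;
  const inputTokens = requests * inputTokensPerRequest;
  const outputTokens = requests * outputTokensPerRequest;
  // Weight token counts by price first, then scale once to millions.
  return (inputTokens * USD_PER_MTOK_IN + outputTokens * USD_PER_MTOK_OUT) / 1e6;
}

// Assumed workload, for illustration only.
const estimate = monthlyCostUsd(30, 3000, 150);
console.log(`Estimated monthly cost: $${estimate.toFixed(2)}`);
```

With this assumed workload the estimate comes to roughly $3.4 per month, inside the $2–5 range; heavier use of web search or longer tool results would push it toward the upper end.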

07 Roadmap