AI Voice Assistant Pro

Multimodal Intelligent Agent | Voice • Text • Vision

Overview

A Flask app with LangGraph routing: a fast path for simple queries; GPT-4o, Claude, or Ollama for complex tasks; GPT-4o vision for images; Redis caching; and voice latency under 2.5 seconds.

Problem

Organizations need voice-enabled AI but face complex pipeline integration, vendor lock-in, latency above five seconds, and deployments that take weeks.

Solution

The Flask application accepts voice input from the browser microphone. ffmpeg converts the recorded WebM audio to WAV, and SpeechRecognition transcribes it. A LangGraph agent first checks whether the query is a simple one (time, weather, calculator) and, if so, answers it on the fast path without an LLM call, serving cached results where available. Complex queries are routed to OpenAI GPT-4o, Claude 3.5 Sonnet, or a local Ollama model. Uploaded images are sent to GPT-4o for vision analysis. gTTS generates the spoken response. Redis caches weather results for 5 minutes and time results for 30 seconds. Deployment takes under 15 minutes with docker-compose up.
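The fast-path idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the keyword patterns, the `classify` and `answer` functions, and the in-memory `TTLCache` (standing in for Redis) are all hypothetical names introduced here. Only the two TTL values (5 minutes for weather, 30 seconds for time) come from the description.

```python
import re
import time
from typing import Callable, Dict, Optional, Tuple

# TTLs from the description: weather 5 minutes, time 30 seconds.
TTL_SECONDS: Dict[str, float] = {"weather": 300.0, "time": 30.0}


class TTLCache:
    """In-memory stand-in for a Redis key with an expiry (assumption)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key: str, value: str, ttl: float) -> None:
        self._store[key] = (self._clock() + ttl, value)


# Hypothetical keyword patterns; a real router may classify differently.
FAST_PATTERNS = {
    "time": re.compile(r"\bwhat time\b|\bcurrent time\b", re.I),
    "weather": re.compile(r"\bweather\b|\bforecast\b", re.I),
    # Pure arithmetic such as "2 + 2": digits and operators only.
    "calculator": re.compile(r"^[\d\s+\-*/().]*\d[\d\s+\-*/().]*$"),
}


def classify(query: str) -> str:
    """Return a fast-path intent name, or "llm" when nothing matches."""
    for intent, pattern in FAST_PATTERNS.items():
        if pattern.search(query):
            return intent
    return "llm"


def answer(
    query: str,
    cache: TTLCache,
    handlers: Dict[str, Callable[[str], str]],
    llm: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (response, source), where source is "cache", "fast", or "llm"."""
    intent = classify(query)
    if intent == "llm":
        return llm(query), "llm"
    ttl = TTL_SECONDS.get(intent)
    if ttl is None:
        # Calculator results are computed directly and not cached.
        return handlers[intent](query), "fast"
    key = f"{intent}:{query.strip().lower()}"
    cached = cache.get(key)
    if cached is not None:
        return cached, "cache"
    result = handlers[intent](query)
    cache.set(key, result, ttl)
    return result, "fast"
```

In a deployment backed by Redis, the `TTLCache` methods would presumably map onto the client's get and set-with-expiry calls, and `llm` would wrap whichever provider (GPT-4o, Claude 3.5 Sonnet, or Ollama) the router selects. Injecting the clock and the handlers keeps the routing logic testable without network access.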

Technologies

Flask 2.3
LangGraph 0.0.20
OpenAI API (GPT-4o, GPT-4o-mini)
Anthropic Claude 3.5 Sonnet
Ollama
SpeechRecognition 3.10
ffmpeg
gTTS
OpenCV 4.8
Redis 7.0
Tailwind CSS
Docker

Results

Voice latency is 1.2–2.5 seconds. 60% of requests bypass the LLM via fast-path routing, reducing token costs by 40%. Cached responses return in under 50 ms. Reliability is 99.9% across more than 5,000 conversations.
