This project demonstrates how to create a real-time voice agent using the Pipecat framework in Python with FastAPI, integrated with Attendee's meeting bot API.
- Real-time voice-to-voice conversational AI using Pipecat
- Integration with Deepgram for speech-to-text
- Integration with OpenAI for language processing
- Integration with Attendee API for meeting bot functionality
- WebSocket-based communication between browser and server
- Configurable agent personality and voice
- Python 3.10 or higher
- UV for dependency management
- Pipecat framework
- API keys for:
- Deepgram (for STT/TTS)
- OpenAI (for LLM)
- Attendee (for meeting bot API)
- Ngrok or similar tunneling service for WebSocket connections
- Clone the repository
- Install dependencies using UV:

  ```bash
  uv sync
  ```
- Copy `.env.example` to `.env` and fill in your API keys:

  ```
  DEEPGRAM_API_KEY=your_deepgram_api_key
  OPENAI_API_KEY=your_openai_api_key
  ATTENDEE_API_KEY=your_attendee_api_key
  ATTENDEE_API_HOST=https://app.attendee.dev
  NGROK_URL=wss://your-ngrok-url.ngrok-free.app
  PORT=8000
  ```
- Run the application:

  ```bash
  python app/main.py
  ```
- Start ngrok or your preferred tunneling service to expose port 8000
- Open http://localhost:8000 in your browser
- Configure the voice agent:
- Meeting URL: The URL of the meeting the bot should join
- WebSocket Tunnel URL: Your ngrok WebSocket URL
- Agent Prompt: Customize the AI assistant's personality
- Greeting Message: Set what the agent says when it joins
- Voice Model: Choose from available Deepgram voice models
- Click "Launch Voice Agent" to start the bot
- `app/main.py`: Main FastAPI application with WebSocket endpoint
- `static/index.html`: Frontend interface
- `pyproject.toml`: Project dependencies and metadata
- `.env.example`: Environment variable template
- The browser connects to the FastAPI server via WebSocket
- User configures the voice agent through the web interface
- When "Launch Voice Agent" is clicked, the server calls Attendee API to join the meeting
- Audio is streamed in real-time between the browser and the Pipecat pipeline
- Pipecat processes the audio through the following stages (see the sketch after this list):
- Deepgram STT (speech-to-text)
- OpenAI LLM (language processing)
- Deepgram TTS (text-to-speech)
- The processed audio is sent back to the browser and into the meeting
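
The STT → LLM → TTS chain above maps onto a Pipecat `Pipeline`. The sketch below shows how the three services are typically wired together in Pipecat examples; it leaves out the WebSocket transport and the LLM context aggregators a complete agent needs, and the import paths, model name, and voice name are assumptions that vary between Pipecat releases, so check them against the version pinned in `pyproject.toml`.

```python
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
from pipecat.services.openai import OpenAILLMService

# One service per stage of the voice loop, configured from the same
# environment variables described in the configuration notes below.
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
tts = DeepgramTTSService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    voice="aura-asteria-en",  # assumed default; the UI lets you pick other Deepgram voices
)

# Frames flow left to right: meeting audio -> transcript -> LLM reply -> synthesized audio.
# In the real app, a transport's input()/output() processors bracket this list and
# context aggregators around the LLM carry the conversation history.
pipeline = Pipeline([stt, llm, tts])
```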
- `GET /`: Serve the web interface
- `WebSocket /ws`: Handle real-time audio streaming and bot configuration
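
A minimal sketch of how these two routes can be declared in FastAPI is shown below. It is not the actual contents of `app/main.py`: the echo loop stands in for the Pipecat pipeline, and the static-file path is an assumption based on the project layout above.

```python
import os

import uvicorn
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.responses import FileResponse

app = FastAPI()


@app.get("/")
async def index() -> FileResponse:
    # Serve the single-page frontend.
    return FileResponse("static/index.html")


@app.websocket("/ws")
async def ws_endpoint(websocket: WebSocket) -> None:
    # Accept the browser (or Attendee) connection and relay audio frames
    # until the peer disconnects.
    await websocket.accept()
    try:
        while True:
            audio = await websocket.receive_bytes()
            # In the real app this is where audio is handed to the Pipecat pipeline;
            # the echo below is only a placeholder.
            await websocket.send_bytes(audio)
    except WebSocketDisconnect:
        pass


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=int(os.getenv("PORT", "8000")))
```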
The application can be configured using environment variables:
- `DEEPGRAM_API_KEY`: Deepgram API key for speech processing
- `OPENAI_API_KEY`: OpenAI API key for language processing
- `ATTENDEE_API_KEY`: Attendee API key for meeting bot functionality
- `ATTENDEE_API_HOST`: Attendee API host URL (default: https://app.attendee.dev)
- `NGROK_URL`: Ngrok URL for WebSocket connections (default: ws://localhost:8080)
- `PORT`: Port to run the server on (default: 8080)
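
A minimal sketch of how these variables can be loaded at startup, assuming the `python-dotenv` package is available (a common companion to FastAPI projects, though not confirmed by this README):

```python
import os

from dotenv import load_dotenv  # provided by the python-dotenv package

# Pull values from the .env file created during setup into the process environment.
load_dotenv()

DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
ATTENDEE_API_KEY = os.getenv("ATTENDEE_API_KEY")
ATTENDEE_API_HOST = os.getenv("ATTENDEE_API_HOST", "https://app.attendee.dev")
NGROK_URL = os.getenv("NGROK_URL", "ws://localhost:8080")
PORT = int(os.getenv("PORT", "8080"))

# Fail fast if a required key was not provided.
required = {
    "DEEPGRAM_API_KEY": DEEPGRAM_API_KEY,
    "OPENAI_API_KEY": OPENAI_API_KEY,
    "ATTENDEE_API_KEY": ATTENDEE_API_KEY,
}
missing = [name for name, value in required.items() if not value]
if missing:
    raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
```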