This project demonstrates how to create a real-time voice agent using the Pipecat framework in Python with FastAPI, integrated with Attendee's meeting bot API.
- Real-time voice-to-voice conversational AI using Pipecat
- Integration with Deepgram for speech-to-text
- Integration with OpenAI for language processing
- Integration with Attendee API for meeting bot functionality
- WebSocket-based communication between browser and server
- Configurable agent personality and voice
- Python 3.10 or higher
- UV for dependency management
- Pipecat framework
- API keys for:
- Deepgram (for STT/TTS)
- OpenAI (for LLM)
- Attendee (for meeting bot API)
- Ngrok or similar tunneling service for WebSocket connections
- Clone the repository
- Install dependencies using UV:

  ```bash
  uv sync
  ```
- Copy `.env.example` to `.env` and fill in your API keys:

  ```
  DEEPGRAM_API_KEY=your_deepgram_api_key
  OPENAI_API_KEY=your_openai_api_key
  ATTENDEE_API_KEY=your_attendee_api_key
  ATTENDEE_API_HOST=https://app.attendee.dev
  NGROK_URL=wss://your-ngrok-url.ngrok-free.app
  PORT=8000
  ```
- Run the application:

  ```bash
  python app/main.py
  ```
- Start ngrok or your preferred tunneling service to expose port 8000
- Open http://localhost:8000 in your browser
- Configure the voice agent:
- Meeting URL: The URL of the meeting the bot should join
- WebSocket Tunnel URL: Your ngrok WebSocket URL
- Agent Prompt: Customize the AI assistant's personality
- Greeting Message: Set what the agent says when it joins
- Voice Model: Choose from available Deepgram voice models
- Click "Launch Voice Agent" to start the bot
- `app/main.py`: Main FastAPI application with WebSocket endpoint
- `static/index.html`: Frontend interface
- `pyproject.toml`: Project dependencies and metadata
- `.env.example`: Environment variable template
- The browser connects to the FastAPI server via WebSocket
- User configures the voice agent through the web interface
- When "Launch Voice Agent" is clicked, the server calls Attendee API to join the meeting
- Audio is streamed in real-time between the browser and the Pipecat pipeline
- Pipecat processes the audio through the following stages (see the sketch after this list):
- Deepgram STT (speech-to-text)
- OpenAI LLM (language processing)
- Deepgram TTS (text-to-speech)
- The processed audio is sent back to the browser and into the meeting
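
The STT → LLM → TTS chain above maps onto a Pipecat `Pipeline`. The sketch below shows how the three services are typically wired together in Pipecat examples; it leaves out the WebSocket transport and the LLM context aggregators a complete agent needs, and the import paths, model name, and voice name are assumptions that vary between Pipecat releases, so check them against the version pinned in `pyproject.toml`.

```python
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
from pipecat.services.openai import OpenAILLMService

# One service per stage of the voice loop, configured from the same
# environment variables described in the configuration notes below.
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
tts = DeepgramTTSService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    voice="aura-asteria-en",  # assumed default; the UI lets you pick other Deepgram voices
)

# Frames flow left to right: meeting audio -> transcript -> LLM reply -> synthesized audio.
# In the real app, a transport's input()/output() processors bracket this list and
# context aggregators around the LLM carry the conversation history.
pipeline = Pipeline([stt, llm, tts])
```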
- `GET /`: Serve the web interface
- `WebSocket /ws`: Handle real-time audio streaming and bot configuration
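
A minimal sketch of how these two routes can be declared in FastAPI is shown below. It is not the actual contents of `app/main.py`: the echo loop stands in for the Pipecat pipeline, and the static-file path is an assumption based on the project layout above.

```python
import os

import uvicorn
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.responses import FileResponse

app = FastAPI()


@app.get("/")
async def index() -> FileResponse:
    # Serve the single-page frontend.
    return FileResponse("static/index.html")


@app.websocket("/ws")
async def ws_endpoint(websocket: WebSocket) -> None:
    # Accept the browser (or Attendee) connection and relay audio frames
    # until the peer disconnects.
    await websocket.accept()
    try:
        while True:
            audio = await websocket.receive_bytes()
            # In the real app this is where audio is handed to the Pipecat pipeline;
            # the echo below is only a placeholder.
            await websocket.send_bytes(audio)
    except WebSocketDisconnect:
        pass


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=int(os.getenv("PORT", "8000")))
```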
The application can be configured using environment variables:
- `DEEPGRAM_API_KEY`: Deepgram API key for speech processing
- `OPENAI_API_KEY`: OpenAI API key for language processing
- `ATTENDEE_API_KEY`: Attendee API key for meeting bot functionality
- `ATTENDEE_API_HOST`: Attendee API host URL (default: https://app.attendee.dev)
- `NGROK_URL`: Ngrok URL for WebSocket connections (default: ws://localhost:8080)
- `PORT`: Port to run the server on (default: 8080)
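
A minimal sketch of how these variables can be loaded at startup, assuming the `python-dotenv` package is available (a common companion to FastAPI projects, though not confirmed by this README):

```python
import os

from dotenv import load_dotenv  # provided by the python-dotenv package

# Pull values from the .env file created during setup into the process environment.
load_dotenv()

DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
ATTENDEE_API_KEY = os.getenv("ATTENDEE_API_KEY")
ATTENDEE_API_HOST = os.getenv("ATTENDEE_API_HOST", "https://app.attendee.dev")
NGROK_URL = os.getenv("NGROK_URL", "ws://localhost:8080")
PORT = int(os.getenv("PORT", "8080"))

# Fail fast if a required key was not provided.
required = {
    "DEEPGRAM_API_KEY": DEEPGRAM_API_KEY,
    "OPENAI_API_KEY": OPENAI_API_KEY,
    "ATTENDEE_API_KEY": ATTENDEE_API_KEY,
}
missing = [name for name, value in required.items() if not value]
if missing:
    raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
```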