Simple, modular function for real-time voice2voice chat using local VAD-STT-LLM-TTS, swappable components.

FlexVoice

Flex on the paid-cloud-API approach with a flexible, modular, easily upgradeable voice chat framework: swap the VAD, STT, LLM, and TTS components of a local pipeline for real-time voice assistants.

⭐ Features & Components

🔥 Fully local real-time voice chat via a modular pipeline (WebRTC > VAD > STT > LLM > TTS) in a Gradio interface:

📐 Structured Architecture in independent modules:

def process_pipeline(audio, conversation):
    # 1. Voice Activity Detection: audio array -> check_vad() -> bool (speech detected)
    # 2. Speech-to-Text: audio array -> load_stt_models(), transcribe_with_whisper(audio_array) -> transcribed text
    # 3. LLM: text prompt -> send_to_llm() -> text response
    # 4. Text-to-Speech: text -> initialize_tts(), text_to_speech(text) -> (audio_array, sample_rate)
flowchart TD
    A(["`🎤 Start: Receive Audio Input (WebRTC)`"]) --> B
    B["`🔍 Voice Activity Detection (VAD): Silero VAD (check_vad)`"] --> C{"`❓ Speech Detected?`"}
    C -- Yes --> D["`🗣️ Speech-to-Text (STT): Whisper (transcribe_with_whisper)`"]
    D --> E["`🤖 Language Model (LLM): OpenAI API (send_to_llm)`"]
    E --> F["`🔊 Text-to-Speech (TTS): Kokoro (text_to_speech)`"]
    F --> G(["`🎧 End: Output Audio Response (Gradio Interface)`"])
    C -- No --> H(["`🚫 End: No Action`"])

    class A,G,H output
    class C decision
    class B,D,E,F process
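As a sketch of how the stages above fit together end-to-end, here is a self-contained toy version of the pipeline. Every function body below is a hypothetical stand-in (for example, a simple energy-threshold check in place of Silero VAD), not the repo's actual code:

```python
import numpy as np

def check_vad(audio, threshold=0.01):
    # Hypothetical energy-based stand-in for Silero VAD:
    # treat the clip as speech when its RMS energy exceeds a threshold.
    rms = float(np.sqrt(np.mean(np.square(audio))))
    return rms > threshold

def transcribe_with_whisper(audio):
    # Stand-in for the Whisper STT step.
    return "hello assistant"

def send_to_llm(prompt, conversation):
    # Stand-in for the OpenAI-compatible API call.
    return f"echo: {prompt}"

def text_to_speech(text, sample_rate=24000):
    # Stand-in for Kokoro TTS: return a short silent clip.
    return np.zeros(sample_rate // 10, dtype=np.float32), sample_rate

def process_pipeline(audio, conversation):
    if not check_vad(audio):                     # 1. VAD
        return None                              # no speech -> no action
    text = transcribe_with_whisper(audio)        # 2. STT
    reply = send_to_llm(text, conversation)      # 3. LLM
    return text_to_speech(reply)                 # 4. TTS

# Half a second of loud noise passes the VAD gate; silence does not.
speech = np.random.uniform(-0.5, 0.5, 8000).astype(np.float32)
result = process_pipeline(speech, conversation=[])
```

Swapping any stage (say, a different STT backend) only means replacing the corresponding function, which is the point of the modular design.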
  • 🎤 Voice Activity Detection: (Silero VAD)
  • 🗣️ Speech-to-Text (insanely-fast-whisper)
  • 🤖 Language Model (OpenAI API format, local server)
  • 🔊 Text-to-Speech (Kokoro)
  • 🤝 Contributing: Easy to modify - share your implementations!
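One way to picture the swappable components listed above is a small container type where each stage is just a callable. The `VoicePipeline` class and its lambda stubs below are hypothetical illustrations, not part of the repo:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class VoicePipeline:
    # Each stage is an ordinary callable, so any backend
    # (Silero, Whisper, Kokoro, ...) can be dropped in.
    vad: Callable
    stt: Callable
    llm: Callable
    tts: Callable

    def run(self, audio, conversation):
        if not self.vad(audio):
            return None
        text = self.stt(audio)
        reply = self.llm(text, conversation)
        return self.tts(reply)

# Swapping a stage only means passing a different callable:
pipeline = VoicePipeline(
    vad=lambda audio: bool(audio),
    stt=lambda audio: "hi",
    llm=lambda text, conv: text.upper(),
    tts=lambda text: (text, 24000),
)
```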

🚀 Setup & Quick Start

# 📜 Requirements: Python >= 3.8, cu118+ (CUDA Toolkit, for Whisper), and an OpenAI-compatible API server (default: ooba webui)
# 1. 🔧 Clone & Install dependencies:
git clone https://github.com/Katehuuh/FlexVoice.git && cd FlexVoice
python -m venv venv && venv\Scripts\activate
pip install -r requirements.txt
# 2. Run! Opens http://localhost:7861
python app.py
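Because the LLM stage speaks the OpenAI API format, pointing the app at a different local server mostly means changing the base URL. Here is a minimal sketch of building the standard chat-completions payload; the helper name, model name, and port below are assumptions, and ooba webui's default API port may differ:

```python
def build_chat_request(user_text, history, model="local-model"):
    # Standard OpenAI chat-completions payload; an OpenAI-compatible
    # local server accepts the same JSON shape.
    messages = history + [{"role": "user", "content": user_text}]
    return {"model": model, "messages": messages}

payload = build_chat_request("hello", [{"role": "system", "content": "Be brief."}])
# POST this to the local server, e.g.:
# requests.post("http://localhost:5000/v1/chat/completions", json=payload)
```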
