A real-time audio recording, transcription, and interpretation tool for meetings and live conversations. This project is designed for scenarios where speed and accuracy are critical, such as live interpretation or meeting environments.
## Features

- Real-time audio recording and transcription with continuous listening
- Speaker identification: display the transcribed words and label which speaker said them
- Option to use either advanced speech recognition APIs (OpenAI Whisper or Google Speech Recognition) or local models for transcription
- Highlight keywords and uncommon words in the transcript
- Translate highlighted/uncommon words for better understanding
- Designed for live interpretation, meeting assistance, and second language conversations
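The highlight-and-translate feature above can be sketched as a simple filter that marks any word missing from a common-word list, so the marked words can be sent on for translation. The word list, function name, and `**asterisk**` marker format here are illustrative assumptions, not the project's actual API:

```python
import re

# Tiny illustrative stop list; a real build would load a frequency list.
COMMON_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "we"}

def highlight_uncommon(transcript: str, common=COMMON_WORDS) -> str:
    """Wrap words absent from `common` in **asterisks** for later translation."""
    def mark(match: re.Match) -> str:
        word = match.group(0)
        return f"**{word}**" if word.lower() not in common else word
    # Match alphabetic words (with apostrophes) and rewrite each one.
    return re.sub(r"[A-Za-z']+", mark, transcript)
```

In a live pipeline the marked words would then be looked up or translated before being rendered alongside the transcript.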
## Requirements

- Python 3.8+
- openai-whisper
- Other dependencies listed in `pyproject.toml`
## Installation

1. Open Terminal and navigate to the directory containing `pyproject.toml`:

   ```sh
   cd /Users/chen/Library/CloudStorage/Dropbox/Code/py/ai/interpreter
   ```

2. Install the package in development mode:

   ```sh
   pip install -e .
   ```
## Usage

- To run the real-time interpreter:

  ```sh
  python interpreter/main.py
  ```

- To compare different speech recognition models:

  ```sh
  python interpreter/model_comparison.py
  ```
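When comparing speech recognition models, the standard accuracy metric is word error rate (WER) against a reference transcript. The sketch below computes WER with a word-level edit distance; the metric choice is an assumption for illustration, and `model_comparison.py` may evaluate models differently:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with the standard edit-distance dynamic program over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

Lower is better: a perfect transcript scores 0.0, and one wrong word out of three scores about 0.33.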
## Roadmap

- Implement real-time, continuous audio recording and streaming
- Integrate Whisper API/local model, and evaluate other advanced models
- Add speaker identification and display who is talking
- Highlight and translate keywords/uncommon words in real time
- Build user interface (optional)
- Add Whisper model finetuning (future)
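The continuous-recording item above usually comes down to buffering audio into fixed-size, overlapping windows and handing each window to the transcriber; the overlap keeps words that straddle a boundary from being cut. This is a minimal sketch of that windowing, with hypothetical names and sizes, not the project's actual implementation:

```python
from typing import Iterable, List

def stream_chunks(samples: List[float], chunk_size: int,
                  overlap: int) -> Iterable[List[float]]:
    """Yield overlapping fixed-size windows of `samples`, as a real-time
    loop would hand buffered audio to a transcribe() call."""
    assert chunk_size > overlap >= 0
    step = chunk_size - overlap
    # Advance by `step` so consecutive windows share `overlap` samples.
    for start in range(0, max(len(samples) - overlap, 1), step):
        yield samples[start:start + chunk_size]
```

In the live tool, each yielded window would be passed to Whisper (API or local model) while the microphone keeps filling the buffer.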
## License

MIT