NetworkMonitorKokoro is a Flask-based service that provides advanced text-to-speech (T2S) and speech-to-text (S2T) functionality using state-of-the-art machine learning models:
- Text-to-Speech (T2S): Converts text input into high-quality synthesized speech using the Kokoro model.
- Speech-to-Text (S2T): Transcribes audio files into text using OpenAI's Whisper model.
This repository leverages ONNX for efficient inference and Hugging Face's model hub for seamless model downloads.
You can see the script in action with the Quantum Network Monitor Assistant at https://freenetworkmonitor.click.
- **T2S (Text-to-Speech)**
  - High-quality voice synthesis using the Kokoro ONNX model.
  - Configurable voice styles via preloaded voicepacks.
- **S2T (Speech-to-Text)**
  - Accurate audio transcription with OpenAI Whisper.
  - Handles a wide range of audio inputs.
- **Automatic Model Management**
  - Models are automatically downloaded from the Hugging Face Hub if not present locally.
- **Flask API Endpoints**
  - `/generate_audio`: Convert text into speech.
  - `/transcribe_audio`: Transcribe audio into text.
Ensure you have the following installed:
- Python 3.8+ (check with `python3 --version` or `python --version`)
- pip (check with `pip --version`)
- A CUDA-enabled GPU (optional, for faster inference)
- System dependencies:
  - Debian/Ubuntu: `sudo apt-get install libsndfile1 espeak-ng`
  - Windows: `choco install libsndfile espeak-ng -y`
  - macOS: `brew install libsndfile espeak-ng`
Clone the repository:

```bash
git clone https://github.com/yourusername/NetworkMonitorKokoro.git
cd NetworkMonitorKokoro
```
Create and activate a virtual environment:

- On Linux/macOS:

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

- On Windows:

  ```bash
  python3 -m venv venv
  venv\Scripts\activate
  ```

Once activated, you should see `(venv)` at the start of your command prompt, indicating the virtual environment is active.
Install the required dependencies by running the installation script (cross-platform):

```bash
python3 install_dependencies.py
```

This script detects your operating system and installs the dependencies accordingly on Linux, Windows, and macOS.
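For reference, the script's OS-detection behavior can be sketched roughly as follows. This is a hedged illustration only: the actual `install_dependencies.py` in the repository is authoritative, and the function name `system_packages_command` is invented here.

```python
"""Rough sketch of an OS-detecting dependency installer (illustrative only)."""
import platform
import subprocess
import sys


def system_packages_command():
    """Return the native package-manager command for the current OS."""
    os_name = platform.system()
    if os_name == "Linux":      # assumes a Debian/Ubuntu host with apt-get
        return ["sudo", "apt-get", "install", "-y", "libsndfile1", "espeak-ng"]
    if os_name == "Darwin":     # macOS with Homebrew
        return ["brew", "install", "libsndfile", "espeak-ng"]
    if os_name == "Windows":    # Windows with Chocolatey
        return ["choco", "install", "libsndfile", "espeak-ng", "-y"]
    raise RuntimeError(f"Unsupported OS: {os_name}")


def main():
    # Install system libraries first, then the Python requirements.
    subprocess.check_call(system_packages_command())
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"]
    )
```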
Set up the models:

- The Kokoro T2S model and OpenAI Whisper S2T model will be downloaded automatically at runtime.
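The download-on-first-use pattern looks roughly like this. This is a minimal sketch, not the repository's actual code (which likely uses `huggingface_hub` for the download itself); `ensure_model` is an invented name for illustration.

```python
"""Minimal sketch of the download-if-missing pattern (illustrative only)."""
import os


def ensure_model(local_path, download):
    """Return local_path, calling download(local_path) only if the file is absent."""
    if not os.path.exists(local_path):
        # Create the parent directory, then fetch the model once.
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        download(local_path)  # e.g. a huggingface_hub download in the real app
    return local_path
```

On subsequent runs the file already exists locally, so no network access is needed.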
Start the Flask server:

```bash
python3 app.py
```

Deactivate the virtual environment when you are done (optional):

```bash
deactivate
```
To run NetworkMonitorKokoro as a systemd service on Linux, follow these steps:
Clone the repository:

```bash
git clone https://github.com/yourusername/NetworkMonitorKokoro.git
cd NetworkMonitorKokoro
```

Create and activate a virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate
```

Install dependencies:

```bash
python3 install_dependencies.py
```

Create a systemd service file:

```bash
sudo nano /etc/systemd/system/networkmonitor-kokoro.service
```
Add the following content:
```ini
[Unit]
Description=NetworkMonitorKokoro Service
After=network.target

[Service]
User=yourusername
WorkingDirectory=/path/to/NetworkMonitorKokoro
ExecStart=/path/to/NetworkMonitorKokoro/venv/bin/python3 /path/to/NetworkMonitorKokoro/app.py
Restart=always
Environment=PYTHONUNBUFFERED=1

[Install]
WantedBy=multi-user.target
```

Replace `/path/to/NetworkMonitorKokoro` with the full path to the directory where the repository was cloned and the virtual environment was created, and replace `yourusername` with your Linux username.
Set proper permissions:

```bash
sudo chmod 644 /etc/systemd/system/networkmonitor-kokoro.service
```

Reload systemd:

```bash
sudo systemctl daemon-reload
```

Start the service:

```bash
sudo systemctl start networkmonitor-kokoro
```

Enable the service to start on boot:

```bash
sudo systemctl enable networkmonitor-kokoro
```

Check the service status:

```bash
sudo systemctl status networkmonitor-kokoro
```
- Endpoint: `/generate_audio`
- Method: `POST`
- Request body:

  ```json
  {
    "text": "Your text here",
    "output_dir": "/absolute/path/to/save/file/to/"
  }
  ```

- Response:

  ```json
  {
    "status": "success",
    "output_path": "/absolute/path/to/save/file/to/<hash>.wav"
  }
  ```
- Endpoint: `/transcribe_audio`
- Method: `POST`
- Request body: a form-data request with an audio file.
- Response:

  ```json
  {
    "status": "success",
    "transcription": "Your transcription here"
  }
  ```
Generate speech:

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, world!", "output_dir": "/tmp"}' \
  http://127.0.0.1:5000/generate_audio
```
Transcribe an audio file:

```bash
curl -X POST \
  -F "file=@sample_audio.wav" \
  http://127.0.0.1:5000/transcribe_audio
```
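The same calls can be made from Python using only the standard library. This is a sketch that assumes the server is running on Flask's default host and port; the hand-rolled multipart helper `encode_multipart` is for illustration (a real client might use the `requests` library instead).

```python
"""Call the NetworkMonitorKokoro endpoints from Python (stdlib only)."""
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # assumed default Flask host/port


def generate_audio(text, output_dir):
    """POST JSON to /generate_audio and return the parsed response."""
    payload = json.dumps({"text": text, "output_dir": output_dir}).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/generate_audio",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def encode_multipart(field, filename, data):
    """Build a multipart/form-data body and its Content-Type header value."""
    boundary = "----kokoro-upload"
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8") + data + f"\r\n--{boundary}--\r\n".encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def transcribe_audio(path):
    """POST an audio file to /transcribe_audio and return the parsed response."""
    with open(path, "rb") as f:
        body, content_type = encode_multipart("file", path, f.read())
    req = urllib.request.Request(
        f"{BASE_URL}/transcribe_audio",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `generate_audio("Hello, world!", "/tmp")` mirrors the first curl command above.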
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch for your feature/bugfix.
- Submit a pull request with a detailed description of your changes.
This project is licensed under the MIT License. See the LICENSE file for details.
- Hugging Face for providing pre-trained models.
- OpenAI for the Whisper model.
- ONNX for efficient inference.
For questions or support, please open an issue or contact [email protected].