MoonCast: High-Quality Zero-Shot Podcast Generation

Overview

Demo page: demo

2025/03/26 UPDATE: We also host a HuggingFace space for testing audio generation.

We open-source this system to advance the field of human-like speech synthesis. Our goal is to create more natural and expressive synthetic voices that bridge the gap between machines and humans. We hope this project will inspire researchers and developers to explore new possibilities in voice technology. We warmly welcome contributions from anyone interested in this project. Whether through code, documentation, feedback, or sharing your insights, every input helps make this project better.

Environment Setup

Create a conda environment and install the dependencies.

conda create -n mooncast -y python=3.10
conda activate mooncast
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
pip install huggingface_hub
pip install gradio==5.22.0

Download the pretrained weights.

python download_pretrain.py

Example Usage

Script Generation

For podcast script generation, we use LLM prompts defined in zh_llmprompt_script_gen.py (Chinese) and en_llmprompt_script_gen.py (English). We chose the Gemini 2.0 Pro Experimental 02-05 model for this task for its ability to produce conversational language, design natural dialogue, and cover a broad range of topics. The process has two stages. First, we generate a concise summary by providing the input knowledge source as an attachment along with the INPUT2BRIEF prompt. Second, this summary, paired with the BRIEF2SCRIPT prompt, is used to generate the final podcast script in JSON format.
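The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `call_llm` is a hypothetical callable standing in for whichever LLM client you use, the prompt strings are placeholders for the real `INPUT2BRIEF` and `BRIEF2SCRIPT` prompts in the prompt files, and joining the summary to the prompt by simple concatenation is an assumption.

```python
import json
from typing import Callable, Optional

# Placeholders only: the real prompt texts live in
# en_llmprompt_script_gen.py / zh_llmprompt_script_gen.py.
INPUT2BRIEF = "<contents of the INPUT2BRIEF prompt>"
BRIEF2SCRIPT = "<contents of the BRIEF2SCRIPT prompt>"

# Hypothetical LLM interface (an assumption, not a real client API):
# takes a prompt and an optional attachment text, returns the reply text.
LLMFn = Callable[[str, Optional[str]], str]


def generate_script(source_text: str, call_llm: LLMFn):
    """Run the two-stage prompt chain and return the parsed JSON script."""
    # Stage 1: knowledge source (as attachment) + INPUT2BRIEF -> summary.
    brief = call_llm(INPUT2BRIEF, source_text)

    # Stage 2: summary paired with BRIEF2SCRIPT -> script as JSON text.
    # Concatenation is assumed here; the pairing method is not specified.
    raw_script = call_llm(BRIEF2SCRIPT + "\n\n" + brief, None)

    # The reply is expected to be JSON; json.loads raises ValueError if not.
    return json.loads(raw_script)
```

Passing the LLM call in as a function keeps the chain independent of any particular provider SDK, and makes it easy to test the flow with a stub before spending API quota.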

Speech Generation

The audio prompts used in this project are sourced from publicly available podcast segments and are intended solely for demonstration purposes. Redistribution of these audio files, whether in their original form or as generated audio, is strictly prohibited. If you have any concerns or questions regarding the use of these audio files, please contact us at [email protected]

CUDA_VISIBLE_DEVICES=0 python inference.py
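GPU selection for this command relies on the `CUDA_VISIBLE_DEVICES` environment variable, which the CUDA runtime reads at startup; a misspelled variable name is silently ignored and all GPUs stay visible. If you prefer to launch inference from Python, the following sketch builds the environment explicitly. The helper name `gpu_env` is hypothetical and only for illustration.

```python
import os
import sys


def gpu_env(device_ids, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set.

    device_ids: iterable of GPU indices, e.g. [0] or [0, 1].
    base_env:   environment to copy; defaults to the current process env.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return env


# Example launch, equivalent to the shell command above
# (commented out so that importing this snippet has no side effects):
#
#   import subprocess
#   subprocess.run([sys.executable, "inference.py"], env=gpu_env([0]), check=True)
```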

2025/03/26 UPDATE: We added a Gradio-based user interface for audio generation. Deploy it locally using:

CUDA_VISIBLE_DEVICES=0 python app.py

Disclaimer

This project is intended for research purposes only. We strongly encourage users to use this project and its generated audio responsibly. We are not responsible for any misuse or abuse of this project. By using this project, you agree to comply with all applicable laws and ethical guidelines.
