Description
PyTorch/SpeechSynthesis/All and CUDA-Optimized/FastSpeech
librosa is used across all the audio projects, although only a few of its functions are called. The requirements files pin different versions, and not all of the code matches the syntax of the version it 'requires'.
The main change in librosa releases after 0.7 is that many functions now require keyword arguments; typically the only positional argument allowed is the data.
e.g. librosa.core.resample(y: 'np.ndarray', *, orig_sr: 'float', target_sr: 'float', .. etc
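To illustrate what the bare `*` in that signature means for callers, here is a minimal sketch using a hypothetical stand-in function (it is not librosa code and does no real resampling); only the shape of the signature is copied from the line above.

```python
def resample(y, *, orig_sr, target_sr):
    """Hypothetical stand-in that mirrors the keyword-only signature shape."""
    # Toy behaviour: return the expected output length, no real resampling.
    return round(len(y) * target_sr / orig_sr)


samples = [0.0] * 44100

# Keyword form: accepted.
new_len = resample(samples, orig_sr=44100, target_sr=22050)

# Old positional form: rejected because of the bare `*` in the signature.
try:
    resample(samples, 44100, 22050)
    positional_ok = True
except TypeError:
    positional_ok = False

print(new_len, positional_ok)
```

Any call site that still passes the sample rates positionally fails in the same way with a `TypeError`.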
- PyTorch/SpeechSynthesis/ project requirements ask for
PyTorch/SpeechSynthesis/Tacotron2/requirements.txt
requires librosa
PyTorch/SpeechSynthesis/Tacotron2/trtis_cpp/src/trt/requirements.txt
librosa==0.7.0
PyTorch/SpeechSynthesis/HiFiGAN/requirements.txt
librosa==0.9.0
PyTorch/SpeechSynthesis/FastPitch/requirements.txt
librosa==0.9.0
For consistency, they should all require the same version. All but one function (listed below) can run on librosa 0.10.
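A quick way to audit the pins is a small script along these lines. It is only a sketch: the helper names `librosa_pins` and `is_consistent` are made up for illustration, and it assumes the requirement lines are simple `librosa` or `librosa<op><version>` entries like the ones quoted above.

```python
import re
from pathlib import Path

# Matches "librosa" optionally followed by an operator and a version,
# e.g. "librosa", "librosa==0.9.0", "librosa >= 0.8".
PIN = re.compile(
    r"^\s*librosa\s*(?:([=<>!~]=?)\s*([\w.*]+))?\s*(?:#.*)?$",
    re.IGNORECASE,
)


def librosa_pins(root):
    """Map each requirements*.txt under root to its librosa specifier.

    An unpinned "librosa" line is reported as an empty string.
    """
    pins = {}
    for req in sorted(Path(root).rglob("requirements*.txt")):
        text = req.read_text(encoding="utf-8", errors="ignore")
        for line in text.splitlines():
            match = PIN.match(line)
            if match:
                spec = (match.group(1) or "") + (match.group(2) or "")
                pins[req.relative_to(root).as_posix()] = spec
    return pins


def is_consistent(pins):
    """True when every file that mentions librosa uses the same specifier."""
    return len(set(pins.values())) <= 1
```

Running `librosa_pins("PyTorch/SpeechSynthesis")` from the repository root and printing the result would list each file next to its specifier, which makes mismatches easy to spot.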
- On the frameworks requiring the newer pytorch, some files use the old syntax.
- PyTorch/SpeechSynthesis/FastPitch/hifigan/data_function.py line 72
librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax)
- PyTorch/SpeechSynthesis/Tacotron2/notebooks/conversationalai/client/speech_ai_demo/utils/jasper/speech_utils.py lines 386 & 389
samples = librosa.core.resample(samples, sample_rate, target_sr)
librosa.effects.trim(samples, trim_db)
- CUDA-Optimized/FastSpeech/generate.py uses the deprecated librosa.output.write_wav(path, wav, hp.sr)
see librosa/librosa#1062
- CUDA-Optimized/FastSpeech/tacotron2/audio_processing.py line 82
win_sq = librosa_util.pad_center(win_sq, n_fft)
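For reference, the calls listed above could be rewritten with keyword arguments roughly as follows. This is a sketch, not a tested patch: the wrapper function names are hypothetical and exist only so each call can be read in isolation, `librosa_mel_fn` is assumed to be an alias of `librosa.filters.mel`, `librosa_util` is assumed to be `librosa.util`, and `soundfile.write` is used as the replacement for `librosa.output.write_wav`.

```python
def mel_basis(sampling_rate, n_fft, num_mels, fmin, fmax):
    import librosa  # imported lazily so this sketch loads without librosa

    # Was: librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax)
    return librosa.filters.mel(
        sr=sampling_rate, n_fft=n_fft, n_mels=num_mels, fmin=fmin, fmax=fmax
    )


def resample_audio(samples, sample_rate, target_sr):
    import librosa

    # Was: librosa.core.resample(samples, sample_rate, target_sr)
    return librosa.resample(samples, orig_sr=sample_rate, target_sr=target_sr)


def trim_silence(samples, trim_db):
    import librosa

    # Was: librosa.effects.trim(samples, trim_db)
    # Returns the (trimmed_signal, interval) pair as librosa does.
    return librosa.effects.trim(samples, top_db=trim_db)


def pad_window(win_sq, n_fft):
    import librosa

    # Was: librosa_util.pad_center(win_sq, n_fft)
    return librosa.util.pad_center(win_sq, size=n_fft)


def save_wav(path, wav, sr):
    import soundfile as sf

    # Was: librosa.output.write_wav(path, wav, hp.sr)
    sf.write(path, wav, sr)
```

Only the data argument stays positional in each call; everything else is passed by name.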
Several of those functions will fail on the newer librosa releases. It is simple enough to clean up the code.
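To find the remaining call sites, a heuristic scan with Python's `ast` module may help. The sketch below is an assumption-laden aid, not part of the repository: the function name `find_positional_calls` and the table of watched names are made up for illustration, and matching on the last part of the call name can produce false positives for unrelated functions that share a name.

```python
import ast

# Call name -> number of positional arguments still accepted.
# -1 marks a function that is gone entirely, so every call is reported.
WATCHED = {
    "resample": 1,
    "trim": 1,
    "pad_center": 1,
    "librosa_mel_fn": 0,
    "write_wav": -1,
}


def find_positional_calls(source):
    """Return sorted (line, name, n_positional) tuples for suspicious calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Attribute):
            name = func.attr  # e.g. librosa.effects.trim -> "trim"
        else:
            name = getattr(func, "id", None)  # plain name, or None
        if name in WATCHED and len(node.args) > WATCHED[name]:
            hits.append((node.lineno, name, len(node.args)))
    return sorted(hits)
```

Feeding each `.py` file's text to `find_positional_calls` gives the line numbers to review by hand.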
Environment
- Driver Version: 535.129.03
- NVIDIA GeForce RTX 3080
- GitHub repository cloned into the Docker image nvidia/cuda:12.1.0-devel-ubuntu22.04