Inconsistent librosa versions PyTorch/SpeechSynthesis/All and CUDA-Optimized/FastSpeech


PyTorch/SpeechSynthesis/All and CUDA-Optimized/FastSpeech

librosa is used throughout all the audio projects, although only a few of its functions are called. The requirements files refer to different versions, and not all of the call syntax is consistent with the versions 'required'.

The main change in librosa > 0.7 is that many of the functions require keyword arguments; typically the only positional argument allowed is the data,
e.g. `librosa.core.resample(y: 'np.ndarray', *, orig_sr: 'float', target_sr: 'float', ...)`
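To illustrate what the bare `*` in that signature means, here is a minimal sketch with a stand-in function. `resample_like` is hypothetical and is not a librosa API; it only mimics the parameter layout so the calling convention can be shown without any audio dependency.

```python
def resample_like(y, *, orig_sr, target_sr):
    """Stand-in with the same parameter layout as the signature above.

    It performs no resampling; it only returns the rate ratio.
    """
    return target_sr / orig_sr


samples = [0.0, 0.1, 0.2]

# Keyword call: accepted.
ratio = resample_like(samples, orig_sr=44100, target_sr=22050)

# Old positional call: rejected with a TypeError because of the bare `*`.
try:
    resample_like(samples, 44100, 22050)
    positional_ok = True
except TypeError:
    positional_ok = False
```

Any parameter listed after the `*` can only be passed by name, which is why code written in the older positional style stops working.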

1. The PyTorch/SpeechSynthesis/ project requirements ask for:
* `PyTorch/SpeechSynthesis/Tacotron2/requirements.txt` requires `librosa` (no version pinned)
* `PyTorch/SpeechSynthesis/Tacotron2/trtis_cpp/src/trt/requirements.txt` requires `librosa==0.7.0`
* `PyTorch/SpeechSynthesis/HiFiGAN/requirements.txt` requires `librosa==0.9.0`
* `PyTorch/SpeechSynthesis/FastPitch/requirements.txt` requires `librosa==0.9.0`

For consistency they should all require the same version. All but one function (listed below) can run on librosa 0.10.
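One way to spot such mismatches is to scan the requirements files directly. The following is a rough sketch, not part of the repository: `find_librosa_pins` and `is_consistent` are hypothetical helper names, and the file pattern `requirements*.txt` is an assumption about how the files are named.

```python
import re
from pathlib import Path

# Matches "librosa", "librosa==0.9.0", "librosa>=0.8", but not "librosa-foo".
LIBROSA_LINE = re.compile(r"^librosa\s*(?P<spec>[=<>!~].*)?$", re.IGNORECASE)


def find_librosa_pins(root):
    """Return {relative requirements path: version specifier or 'unpinned'}."""
    root = Path(root)
    pins = {}
    for req in sorted(root.rglob("requirements*.txt")):
        for line in req.read_text(encoding="utf-8").splitlines():
            line = line.split("#", 1)[0].strip()  # drop trailing comments
            match = LIBROSA_LINE.match(line)
            if match:
                key = req.relative_to(root).as_posix()
                pins[key] = match.group("spec") or "unpinned"
    return pins


def is_consistent(pins):
    """True when every file asks for the same librosa specifier."""
    return len(set(pins.values())) <= 1
```

Calling `find_librosa_pins("PyTorch/SpeechSynthesis")` from the repository root should list each file alongside its specifier, and `is_consistent` could serve as a simple check in CI.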

2. In the projects that require the newer version, some files still use the old positional syntax:
* `PyTorch/SpeechSynthesis/FastPitch/hifigan/data_function.py` line 72: `librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax)`
* `PyTorch/SpeechSynthesis/Tacotron2/notebooks/conversationalai/client/speech_ai_demo/utils/jasper/speech_utils.py` lines 386 & 389: `samples = librosa.core.resample(samples, sample_rate, target_sr)` and `librosa.effects.trim(samples, trim_db)`

* `CUDA-Optimized/FastSpeech/generate.py` uses the deprecated `librosa.output.write_wav(path, wav, hp.sr)`, see https://github.com/librosa/librosa/issues/1062
* `CUDA-Optimized/FastSpeech/tacotron2/audio_processing.py` line 82: `win_sq = librosa_util.pad_center(win_sq, n_fft)`
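As a possible cleanup, the calls listed above could be rewritten with keyword arguments, which to my knowledge are accepted by both the older and the newer librosa releases. This is only a sketch: the wrapper function names are hypothetical, and `soundfile.write` is used as the replacement for `librosa.output.write_wav`, as suggested in the linked librosa issue.

```python
def mel_filterbank(sampling_rate, n_fft, num_mels, fmin, fmax):
    # Imported lazily so this sketch can be loaded without librosa installed.
    import librosa
    return librosa.filters.mel(
        sr=sampling_rate, n_fft=n_fft, n_mels=num_mels, fmin=fmin, fmax=fmax
    )


def resample_and_trim(samples, sample_rate, target_sr, trim_db):
    import librosa
    samples = librosa.resample(samples, orig_sr=sample_rate, target_sr=target_sr)
    # trim returns (trimmed_signal, index_interval); keep only the signal.
    trimmed, _ = librosa.effects.trim(samples, top_db=trim_db)
    return trimmed


def write_wav(path, wav, sr):
    # Replacement for librosa.output.write_wav, using the soundfile package.
    import soundfile as sf
    sf.write(path, wav, sr)


def pad_window(win_sq, n_fft):
    import librosa
    return librosa.util.pad_center(win_sq, size=n_fft)
```

Only the data array stays positional in each call; every other argument is passed by name.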

Several of those calls will fail on newer librosa versions. It is simple enough to clean up the code.
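To find the remaining call sites, a small static check might help. The sketch below is a heuristic, not a definitive tool: `find_positional_librosa_calls` is a hypothetical helper that flags any call whose name starts with `librosa` (which also covers aliases such as `librosa_util`) and that passes more than one positional argument. Some librosa functions may legitimately take several positional arguments, so the hits still need a manual review.

```python
import ast


def _dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_positional_librosa_calls(source, max_positional=1):
    """Return sorted (line, dotted name, positional count) for suspicious calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = _dotted_name(node.func)
        if name and name.startswith("librosa") and len(node.args) > max_positional:
            hits.append((node.lineno, name, len(node.args)))
    return sorted(hits)


snippet = """
samples = librosa.core.resample(samples, sample_rate, target_sr)
ok = librosa.resample(samples, orig_sr=sample_rate, target_sr=target_sr)
win_sq = librosa_util.pad_center(win_sq, n_fft)
"""
hits = find_positional_librosa_calls(snippet)
```

On the snippet, the two positional calls are reported and the keyword call is left alone.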

Environment
* Driver Version: 535.129.03
* NVIDIA GeForce RTX 3080
* Repository cloned from GitHub on top of the Docker image nvidia/cuda:12.1.0-devel-ubuntu22.04


Inconsistent librosa versions PyTorch/SpeechSynthesis/All and CUDA-Optimized/FastSpeech #1369
