Transcription using tensor fails using model.transcribe(), can't use whisper.load_audio() as an alternative #1145
I am trying to use whisper in the backend of a website project. I need to be able to transcribe uploaded audio without saving it to or reading it from disk, so I am loading the bytes into a tensor, but model.transcribe() fails on it. Discussion #930 mentions a similar issue, but that answer does not work for my case, since whisper.load_audio() expects a path to a file on disk.
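Roughly what I am attempting (a sketch; the model size and function name are placeholders, and file_bytes comes from the upload handler):

```python
import numpy as np
import torch
import whisper

model = whisper.load_model("base")  # model size chosen arbitrarily

def transcribe_upload(file_bytes: bytes) -> str:
    # file_bytes is the raw uploaded file (e.g. MP3/WAV). It is still
    # encoded, not PCM, so wrapping it in a tensor does not decode it.
    audio = torch.from_numpy(np.frombuffer(file_bytes, dtype=np.uint8).copy())
    # Fails: transcribe() expects a decoded 16 kHz mono float32 waveform.
    return model.transcribe(audio)["text"]
```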
Replies: 1 comment 2 replies
I modified whisper.load_audio() to support bytes; now I can transcribe audio using a file's bytes:

```python
import ffmpeg
import numpy as np


def load_audio(file_bytes: bytes, sr: int = 16_000) -> np.ndarray:
    """
    Converts an audio file's bytes to a mono waveform, resampling as necessary.

    Parameters
    ----------
    file_bytes: bytes
        The bytes of the audio file

    sr: int
        The sample rate to resample the audio to, if necessary

    Returns
    -------
    A NumPy array containing the audio waveform, in float32 dtype.
    """
    try:
        # Launch ffmpeg, feed the encoded bytes on stdin, and read raw 16-bit
        # PCM from stdout while down-mixing and resampling as necessary.
        # Requires the ffmpeg CLI and the `ffmpeg-python` package.
        out, _ = (
            ffmpeg.input("pipe:", threads=0)
            .output("pipe:", format="s16le", acodec="pcm_s16le", ac=1, ar=sr)
            .run(input=file_bytes, capture_stdout=True, capture_stderr=True)
        )
    except ffmpeg.Error as e:
        raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

    # Scale int16 PCM to float32 in [-1, 1], matching whisper.load_audio().
    return np.frombuffer(out, np.int16).flatten().astype(np.float32) / 32768.0
```
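Usage then looks like this (a minimal sketch; the model size is arbitrary, and reading a local file is only a stand-in for the bytes arriving from the upload request):

```python
import whisper

model = whisper.load_model("base")

# In the web backend, file_bytes would come from the upload handler;
# reading a file here just makes the sketch runnable.
with open("sample.mp3", "rb") as f:
    file_bytes = f.read()

audio = load_audio(file_bytes)   # decoded 16 kHz mono float32 waveform
result = model.transcribe(audio)
print(result["text"])
```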