I modified whisper.load_audio() to accept bytes, so I can now transcribe audio directly from a file's bytes.

import ffmpeg
import numpy as np


def load_audio(file_bytes: bytes, sr: int = 16_000) -> np.ndarray:
    """
    Convert an audio file's bytes to a mono waveform, resampling as necessary
    Parameters
    ----------
    file_bytes: bytes
        The bytes of the audio file
    sr: int
        The sample rate to resample the audio to, if necessary
    Returns
    -------
    A NumPy array containing the audio waveform, in float32 dtype.
    """
    try:
        # This launches a subprocess to decode audio while down-mixing and resampling as necessary.
        # Requires the ffmpeg CLI and `ffmpeg-python` package to be installed.
        out
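
The snippet above is cut off, so here is a minimal sketch of the same idea rather than the poster's exact code. It assumes the ffmpeg CLI is on the PATH and calls it through subprocess instead of the ffmpeg-python package, feeding the bytes on stdin (pipe:0) and reading 16-bit mono PCM from stdout (pipe:1). The names load_audio_from_bytes and pcm16_to_float32 are hypothetical, chosen only for this illustration.

```python
import subprocess

import numpy as np


def pcm16_to_float32(raw: bytes) -> np.ndarray:
    """Convert raw little-endian 16-bit PCM to float32 samples in [-1.0, 1.0)."""
    return np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0


def load_audio_from_bytes(file_bytes: bytes, sr: int = 16_000) -> np.ndarray:
    """Decode an in-memory audio file to a mono float32 waveform at `sr` Hz.

    Assumption: this shells out to the ffmpeg CLI directly; it is not the
    original poster's ffmpeg-python implementation.
    """
    cmd = [
        "ffmpeg",
        "-threads", "0",
        "-i", "pipe:0",          # read the encoded audio from stdin
        "-f", "s16le",           # raw signed 16-bit little-endian output
        "-ac", "1",              # down-mix to mono
        "-acodec", "pcm_s16le",
        "-ar", str(sr),          # resample to the requested rate
        "pipe:1",                # write the decoded PCM to stdout
    ]
    try:
        result = subprocess.run(
            cmd, input=file_bytes, capture_output=True, check=True
        )
    except subprocess.CalledProcessError as e:
        stderr = e.stderr.decode(errors="replace")
        raise RuntimeError(f"Failed to load audio: {stderr}") from e
    return pcm16_to_float32(result.stdout)
```

The returned array can be passed to model.transcribe() in place of a file path. One caveat: stdin is not seekable, so container formats that need seeking (for example, some MP4 files with the index at the end) may fail to decode this way.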

Answer selected by CarlosGTrejo