Replies: 2 comments 1 reply
-
This code fragment below, from a different discussion, uses a lower-level API to select the most probable language from a restricted set. If the first 30 s of your audio is noisy or not representative of your two languages, then there is a different problem; see the description of this pull request.
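The fragment referenced above did not survive extraction, so here is a minimal sketch of the idea, assuming you have the language-probability dict that Whisper's `model.detect_language(mel)` returns; the language codes and probabilities below are made up for illustration:

```python
def pick_language(language_probs, allowed):
    """Pick the most probable language, restricted to an allowed set.

    language_probs: dict mapping language code -> probability, shaped like
        the dict returned by Whisper's model.detect_language(mel).
    allowed: iterable of language codes the audio could plausibly be in.
    """
    restricted = {lang: p for lang, p in language_probs.items() if lang in allowed}
    if not restricted:
        raise ValueError("none of the allowed languages were detected")
    return max(restricted, key=restricted.get)

# Illustrative probabilities (not real model output):
probs = {"en": 0.38, "fr": 0.35, "zh": 0.27}
print(pick_language(probs, {"en", "fr"}))  # -> en
```

The point is that even when a third language narrowly wins the raw detection, restricting the argmax to the languages you know are present removes the misidentification entirely.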
-
Being able to provide a list of languages to restrict detection would help here. My use case is bilingual:
Similarly to @junkoran, here's the problem I'm experiencing (using the OpenAI speech-to-text endpoint):

I'm stuck and cannot use Whisper as-is for multilingual audio, because this misidentification of the language occurs too often. I could try to circumvent the issue by detecting the languages in the audio with some other service and then transcribing chunk by chunk, but that would defeat the purpose of Whisper's multilingual support. Sorry for whining 😅 I do hope this helps!

EDIT 1: it's not limited to related languages… testing with Bosnian, I pronounced "Ja sam Fabian.", which got transcribed correctly, but with an incorrect value for the detected language: "french".

EDIT 2: using openai.Audio.transcribe may still result in content being translated. Testing with Bulgarian, I said in English "How do you ask for a beer in Bulgarian?" and the results I got were text="Как мога да запознаем тебя на булгарски?" and language=English. I had an identical issue with French. I suspect it's introduced by my setting the prompt to the previous line, but it still sounds like a 🪲 bug. I would expect NO translation to take place on the transcribe endpoint.
-
Hi, how do I make the model recognize/detect only 2 languages, i.e. bilingual French & English? With language=None I got a Mandarin transcription.
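One workaround, assuming you run Whisper locally, is to restrict detection to your two expected languages and then pass the winner explicitly as the `language` argument of `model.transcribe`, which skips auto-detection. A minimal sketch; the helper below is hypothetical glue code, not part of Whisper's API:

```python
def transcribe_kwargs(language_probs, allowed=("fr", "en")):
    """Build keyword arguments for Whisper's model.transcribe from
    detection probabilities, restricted to the languages we expect.

    language_probs: dict like the one returned by model.detect_language(mel).
    """
    restricted = {lang: p for lang, p in language_probs.items() if lang in allowed}
    best = max(restricted, key=restricted.get)
    # task="transcribe" also guards against accidental translation.
    return {"language": best, "task": "transcribe"}

# With real Whisper you would do something like:
#   _, probs = model.detect_language(mel)
#   result = model.transcribe("audio.wav", **transcribe_kwargs(probs))
print(transcribe_kwargs({"fr": 0.6, "en": 0.3, "zh": 0.1}))
```

This forces every decoding pass into one of the two expected languages, so `language=None` can never fall through to Mandarin.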