Replies: 2 comments 1 reply
-
This code fragment below, from a different discussion, uses a lower-level API to select the most probable language from a restricted set. If the first 30 s of your audio is noisy or not representative of your two languages, then there is a different problem; see the description of this pull request.
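The fragment referenced above did not survive extraction, so here is a minimal sketch of the idea, assuming you have the language-probability dict that Whisper's `model.detect_language(mel)` returns; the language codes and probabilities below are made up for illustration:

```python
def pick_language(language_probs, allowed):
    """Pick the most probable language, restricted to an allowed set.

    language_probs: dict mapping language code -> probability, shaped like
        the dict returned by Whisper's model.detect_language(mel).
    allowed: iterable of language codes the audio could plausibly be in.
    """
    restricted = {lang: p for lang, p in language_probs.items() if lang in allowed}
    if not restricted:
        raise ValueError("none of the allowed languages were detected")
    return max(restricted, key=restricted.get)

# Illustrative probabilities (not real model output):
probs = {"en": 0.38, "fr": 0.35, "zh": 0.27}
print(pick_language(probs, {"en", "fr"}))  # -> en
```

The point is that even when a third language narrowly wins the raw detection, restricting the argmax to the languages you know are present removes the misidentification entirely.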
-
Being able to provide a list of languages to restrict detection would help here. My use case is bilingual:
Similarly to @junkoran, here's the problem I'm experiencing (using the OpenAI speech-to-text endpoint):

I'm stuck and cannot use Whisper as-is for multilingual audio, because this misidentification of the language occurs too often. I could try to circumvent the issue by detecting the languages in the audio with some other service and then transcribing chunk by chunk, but that would defeat the purpose of Whisper's multilingual support. Sorry for whining 😅 I do hope this helps!

EDIT 1: it's not limited to related languages… testing with Bosnian, I pronounced "Ja sam Fabian.", which got transcribed correctly, but with an incorrect value for the detected language: "french".

EDIT 2: using openai.Audio.transcribe may still result in content being translated. Testing with Bulgarian, I said in English "How do you ask for a beer in Bulgarian?" and the results I got were text="Как мога да запознаем тебя на булгарски?" and language=English. I had an identical issue with French. I suspect it's introduced by my setting the prompt to the previous line, but it still sounds like a 🪲 bug. I would expect NO translation to take place on the transcribe endpoint.
-
Hi, how do I make the model recognize/detect only 2 languages, i.e. bilingual French & English? With language=None I got a Mandarin transcription.
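One workaround, assuming you run Whisper locally, is to restrict detection to your two expected languages and then pass the winner explicitly as the `language` argument of `model.transcribe`, which skips auto-detection. A minimal sketch; the helper below is hypothetical glue code, not part of Whisper's API:

```python
def transcribe_kwargs(language_probs, allowed=("fr", "en")):
    """Build keyword arguments for Whisper's model.transcribe from
    detection probabilities, restricted to the languages we expect.

    language_probs: dict like the one returned by model.detect_language(mel).
    """
    restricted = {lang: p for lang, p in language_probs.items() if lang in allowed}
    best = max(restricted, key=restricted.get)
    # task="transcribe" also guards against accidental translation.
    return {"language": best, "task": "transcribe"}

# With real Whisper you would do something like:
#   _, probs = model.detect_language(mel)
#   result = model.transcribe("audio.wav", **transcribe_kwargs(probs))
print(transcribe_kwargs({"fr": 0.6, "en": 0.3, "zh": 0.1}))
```

This forces every decoding pass into one of the two expected languages, so `language=None` can never fall through to Mandarin.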