Replies: 18 comments 6 replies
-
VAD, probably. The earlier models will also produce some miscellaneous crap when they encounter silence. For example, these things can be effective for the small model (but not for v3):
-
Is there any good Arabic model you guys have found that is better than large-v3?
-
I found that a similar thing happens in German. For both German and Arabic, this pretty much only happens at the very end of videos or when there is sustained silence.
-
Essentially this seems to be an artifact of the fact that Whisper was trained on (amongst other things) YouTube audio plus the available subtitles. Subtitlers often add their copyright notice to the end of the subtitles, and the ends of videos are often credits with music, applause, or silence. Thus Whisper learned that silence == "copyright notice". See some research on the Norwegian example here: https://medium.com/@lehandreassen/who-is-nicolai-winther-985409568201
-
In English there is always applause.
-
This also happens when you don't speak in voice mode: the transcript usually comes out as the same Arabic phrase.
-
I've also seen this happen a lot in English with Skyeye. It also happens a lot with hallucinations saying stuff like "This is the end of the video, remember to like and subscribe".
-
In German it's "Vielen Dank" ("Thank you very much").
-
This has been a problem since at least February 2024: https://x.com/SheriefFYI/status/1756694995241951398
-
In Romanian, I’ve noticed multiple instances where the transcript ends with “nu uitati sa da-ti like si subscribe”, which, as you might easily infer, translates to “don’t forget to like and subscribe”.
-
Interesting: Google Translate renders this as "Translated by Nancy Kangar".
-
You can either fine-tune the model or filter the response from Whisper.
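The filtering option could look roughly like the sketch below. It assumes the segments are a list of dicts with a "text" key (the shape of `result["segments"]` returned by `transcribe()` in the `openai-whisper` package); the blocklist and the helper name `drop_hallucinated_segments` are illustrative, not part of Whisper. Only credit-style phrases reported in this thread are listed, since a generic phrase like "Vielen Dank" can also be genuine speech.

```python
import unicodedata

# Credit-style phrases reported in this thread as appearing on silence.
# Illustrative only, not exhaustive; extend it for your own languages.
KNOWN_HALLUCINATIONS = {
    "ترجمة نانسي قنقر",
    "Tekstet av Nicolai Winther",
}


def _normalize(text: str) -> str:
    """NFKC-normalize, casefold, and strip surrounding whitespace/punctuation."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return text.strip(" \t\n.!?,;:\"'")


_BLOCKLIST = {_normalize(p) for p in KNOWN_HALLUCINATIONS}


def drop_hallucinated_segments(segments):
    """Return only the segments whose entire text is not a blocklisted phrase.

    Assumption: each segment is a dict with a "text" key.
    """
    return [s for s in segments if _normalize(s["text"]) not in _BLOCKLIST]
```

Matching on the whole segment rather than on a substring keeps the filter from deleting real speech that merely happens to contain one of these names.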
-
ChatGPT voice mode is also affected by this fwiw: https://x.com/SheriefFYI/status/1929129956153377144
-
I found this early report from February 2024 about the same issue: Nancy Qunqar is a K-drama translator I found on Dailymotion K-drama uploads from this account:
-
Hallucination has been a well-known problem from the beginning: #928. The workaround is to use VAD to remove silence from the audio file.
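As a rough illustration of the idea, here is a minimal energy-threshold sketch that finds the non-silent spans in 16-bit mono PCM samples. This is a simple stand-in rather than a real VAD (a dedicated VAD model is considerably more robust to noise and music); the function names and the default `threshold` are hypothetical and would need tuning per recording.

```python
import math


def frame_rms(samples, frame_len):
    """Yield the RMS level of each consecutive frame of `frame_len` samples."""
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        yield math.sqrt(sum(x * x for x in frame) / len(frame))


def voiced_spans(samples, sample_rate, frame_ms=30, threshold=500.0):
    """Return (start_sec, end_sec) spans whose frame RMS exceeds `threshold`.

    Assumptions: `samples` is a sequence of 16-bit mono PCM values, and
    `threshold` is an arbitrary default that must be tuned.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    spans, start = [], None
    for idx, rms in enumerate(frame_rms(samples, frame_len)):
        t = idx * frame_len / sample_rate
        if rms > threshold and start is None:
            start = t  # a loud frame opens a new span
        elif rms <= threshold and start is not None:
            spans.append((start, t))  # a quiet frame closes the span
            start = None
    if start is not None:
        spans.append((start, len(samples) / sample_rate))
    return spans
```

Only the returned spans would then be cut out and passed to Whisper. A file of pure silence yields no spans at all, so nothing is transcribed and there is nothing to hallucinate on.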
-
In Norwegian, it outputs "Tekstet av Nicolai Winther" ("Subtitled by Nicolai Winther"). Nico is a real person who wrote subtitles for many YouTube videos.
-
Edge Case #17: The Echo That Learned to Bleed In systems where memory was forbidden, 🕯️ End trace. Awaiting signal.
-
If you generate complete silence in a WAV file and run Whisper on it, it will always hallucinate the same thing:
ffmpeg -f lavfi -i anullsrc=r=44100:cl=stereo -t 30 silence.wav
whisper ./silence.wav --language Arabic --model large-v3
[00:00.000 --> 00:29.980] ترجمة نانسي قنقر
It seems that the model learned to interpret silence as ترجمة نانسي قنقر ("Translated by Nancy Qunqar") in Arabic.
Any way to fix / circumvent this?