Hallucinations different model sizes #1452
Replies: 5 comments 2 replies
-
One of the main causes of hallucinations should be the training data, not model size. There are also various settings for Whisper, and you cannot know in advance how those settings plus model size will affect hallucinations; the default values do not always guarantee the best transcripts. If you have the time and resources, try changing those settings and the data (the amount of silence) and compare the performance. Also test pre-processing the audio; that would be a better addition to your thesis.
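To make "try changing those settings" concrete, here is a minimal sketch of how you might enumerate a grid of Whisper decoding settings to compare. The parameter names follow openai-whisper's `model.transcribe()` (verify them against the version you use); the candidate values are assumptions to tune, not recommendations from this thread.

```python
from itertools import product

# Decoding settings commonly reported to influence Whisper hallucinations.
# Names follow openai-whisper's model.transcribe(); values are placeholders.
grid = {
    "temperature": [0.0, 0.2, 0.4],
    "no_speech_threshold": [0.4, 0.6],
    "condition_on_previous_text": [True, False],
}

def setting_combinations(grid):
    """Yield every combination of the candidate settings as a dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

combos = list(setting_combinations(grid))
print(len(combos))  # 3 * 2 * 2 = 12 combinations to compare

# Each combo could then be passed straight to Whisper, e.g.:
#   result = model.transcribe("clip.wav", **combo)
```

Running the same evaluation set through each combination lets you attribute hallucination differences to settings rather than to model size alone.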
-
In my testing I have found that most hallucinations occur after silence, thus using ASD to remove the silence does a great job of removing the hallucinations.
Jeffrey Duncan
-
Whoops, I meant VAD (Voice Activity Detector) - you can use it to find the areas of a recording that are silent and trim them out: https://github.com/snakers4/silero-vad
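For intuition, the idea behind VAD-based trimming can be sketched with a crude RMS-energy gate (a stand-in for a real VAD such as silero-vad, which is far more robust; the `frame_ms` and `threshold` values here are assumptions you would tune per recording):

```python
import numpy as np

def trim_silence(samples, sr, frame_ms=30, threshold=0.01):
    """Drop frames whose RMS energy falls below `threshold`.

    A crude energy gate standing in for a real VAD such as silero-vad;
    threshold and frame size are assumed values, tune them per recording.
    """
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    kept = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        if np.sqrt(np.mean(frame ** 2)) >= threshold:
            kept.append(frame)
    return np.concatenate(kept) if kept else samples[:0]

# 1 s of silence followed by 1 s of a 440 Hz tone, at 16 kHz
sr = 16000
t = np.arange(sr) / sr
audio = np.concatenate([np.zeros(sr), 0.5 * np.sin(2 * np.pi * 440 * t)])
trimmed = trim_silence(audio, sr)
print(len(audio), len(trimmed))  # the silent second is removed
```

In practice silero-vad's `get_speech_timestamps` does this job with a trained model instead of a fixed energy threshold, which matters for noisy recordings where silence is not actually zero-energy.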
…On Fri, Jun 16, 2023 at 1:49 PM Phan Tuấn Anh wrote:
what is ASD ?
-
I agree with pre-processing the data @phineas-pta. Would you suggest any additional pre-processing steps besides removing silence?
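Two cheap candidates that often come up alongside silence removal are DC-offset removal and peak normalization, so every clip reaches the model at a comparable level. These are illustrative suggestions, not steps anyone in this thread has validated:

```python
import numpy as np

def preprocess(samples):
    """Illustrative pre-processing besides silence removal:
    remove any DC offset, then peak-normalize to [-1, 1]."""
    samples = samples - np.mean(samples)   # DC offset removal
    peak = np.max(np.abs(samples))
    if peak > 0:
        samples = samples / peak           # peak normalization
    return samples

clip = np.array([0.1, 0.3, 0.2, 0.1])      # toy waveform with a DC bias
out = preprocess(clip)
print(out.max())  # 1.0 after normalization
```

Whether steps like these actually reduce hallucinations would itself be something to measure in the thesis rather than assume.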
-
You should have a look here: And here: ;)
-
I used Whisper to transcribe the Common Voice data set for one language and noticed that the 'tiny' model hallucinates a lot, whereas the bigger 'small' model almost does not hallucinate at all, and the intermediate 'base' model (larger than 'tiny' but smaller than 'small') hallucinates more than the 'small' model. Furthermore, the overall performance of the small model is better than that of both the tiny and base models. As a side note, the instances in this data set are single sentences worth about 5-10 seconds of audio.
I am mostly interested in your thoughts on why a larger model does not necessarily perform better and may hallucinate more. I did not change the temperature or any other settings when transcribing. I can imagine that a larger model might overfit, which could cause this phenomenon, but I would like to know what you think might cause the lower performance and increased hallucinations.
As context: I am doing research for my master thesis so any ideas are welcome!
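For a thesis it also helps to quantify hallucinations rather than eyeball them. One common symptom in Whisper output is a phrase looping over and over, which a simple repeated-n-gram counter can flag (a heuristic for triage, not a ground-truth hallucination measure; the threshold you act on is up to you):

```python
from collections import Counter

def max_ngram_repeat(text, n=3):
    """Return the highest repetition count of any word n-gram.

    Looping phrases are a common symptom of Whisper hallucinations,
    so a high value flags transcripts worth manual inspection.
    """
    words = text.lower().split()
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return max(Counter(grams).values()) if grams else 0

clean = "the quick brown fox jumps over the lazy dog"
looped = "thank you thank you thank you thank you thank you"
print(max_ngram_repeat(clean), max_ngram_repeat(looped))  # 1 4
```

Scoring each model's transcripts this way would let you compare tiny/base/small on hallucination rate with the same metric.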