-
I've successfully used `prompt` to give the recognizer extra context, but so far my attempts to use `prefix` have resulted in errors. What's the difference between the two, and what kind of data is …
-
Below shows where …
-
Will the provided prompt affect the transcription all the way through, or just in a time window from the beginning to some given time? I'm trying to transcribe audio that is about six minutes long and contains quite a few non-English names. I've had some success providing the names in variously formatted prompts, but it only seems to help in the first 90 seconds or so of the six-minute video, and any names introduced in the video after that point tend to be wrong. As soon as I get past the 90-second mark, I can't find a prompt that lets Whisper get the names right. Am I misunderstanding, or is there any way to provide context that will persist all the way through the transcription?
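One plausible explanation for the ~90-second cutoff (a sketch, not Whisper's actual code): in long-form transcription the model conditions each ~30-second window on at most a fixed number of the most recent context tokens, so the initial prompt is gradually pushed out of the context by the decoded text itself. The constants below (`224` context tokens, `80` tokens decoded per window, a 40-token prompt) are illustrative assumptions, and `remaining_hint_tokens` is a hypothetical helper:

```python
def remaining_hint_tokens(n_windows, n_hint=40, per_window=80, max_ctx=224):
    """Count how many initial-prompt tokens are still inside the
    conditioning context when each ~30 s window is decoded."""
    all_tokens = ["<hint>"] * n_hint          # stand-in for the initial prompt
    counts = []
    for _ in range(n_windows):
        context = all_tokens[-max_ctx:]       # only the most recent tokens survive
        counts.append(context.count("<hint>"))
        all_tokens += ["<tok>"] * per_window  # decoded text appended each window
    return counts

print(remaining_hint_tokens(6))  # → [40, 40, 40, 0, 0, 0]
```

Under these assumed numbers the prompt tokens fall out of the window after three chunks, i.e. roughly 90 seconds in, which would match the behavior described above.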
-
It seems that … Even if I pass in … I would like to propose a change to the way …
-
@jongwook is this issue related to an issue I'm seeing related to prompting: I'm passing prompts that look like this in my whisper calls: … 80% of the time I use the prompt, I get fully hallucinated output. It ends up in a loop, repeating the same thing (e.g. a competitor's name, or a URL made up from one of the competitor's names). An example output using the prompt above on an audio file^: …
I'm calling whisper from a Node backend -- more details on how exactly here.
`prompt` conditions the model on the text that appeared in the previous ~30 seconds of audio. In long-form transcription it helps the model continue the text in a consistent style, e.g. starting a sentence with a capital letter if the previous context ended with a period. You can also use it for "prompt engineering", to make the model more likely to output certain jargon (`" So we were just talking about DALL·E"`) or to do a crude form of speaker turn tracking (e.g. `" - Hey how are you doing? - I'm doing good. How are you?"`; note that the token for `" -"` is suppressed by default and will need to be enabled manually).

`prefix` accepts a partial transcription for the current audio input, al…
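The distinction can be sketched at the token level. Based on how Whisper lays out its decoder input, prompt tokens are placed *before* the start-of-transcript token as previous-window context, while prefix tokens are placed *after* it, so decoding is forced to continue from that exact text. The special-token names below follow Whisper's tokenizer; the helper function itself is a hypothetical illustration, not Whisper's actual code:

```python
# Illustrative sketch of the decoder input layout for prompt vs. prefix.
# Special-token names follow Whisper's tokenizer; decoder_input is hypothetical.

def decoder_input(prompt_tokens=None, prefix_tokens=None):
    seq = []
    if prompt_tokens:
        # prompt: conditioning context from the *previous* window,
        # placed before <|startoftranscript|>
        seq += ["<|startofprev|>"] + list(prompt_tokens)
    seq += ["<|startoftranscript|>", "<|en|>", "<|transcribe|>"]
    if prefix_tokens:
        # prefix: part of the *current* transcription;
        # the model continues decoding from here
        seq += list(prefix_tokens)
    return seq

print(decoder_input(prompt_tokens=["DALL·E"], prefix_tokens=["Hello", "world"]))
```

In this layout a prompt can only bias the output (it sits in the context), whereas a prefix becomes the literal beginning of the transcription for that window, which is why malformed prefix text tends to cause errors rather than just being ignored.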