Shouldn't fp16=True give a faster transcription time? #622
Replies: 3 comments
-
I also observe no real improvement when using fp16 on a V100, so I suspect something is missing for "real" float16 support. In detail, what I saw is that only beam search decoding with the large model showed an improvement with fp16=True, and even that improvement is maybe not significant.
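For reference, here is a minimal sketch of the kind of timing grid described above (greedy vs. beam search, fp32 vs. fp16), assuming the standard openai-whisper API; the model size and audio path are placeholders to substitute with your own:
```python
import time

import torch
import whisper

model = whisper.load_model("large", device="cuda")
audio = "sample.wav"  # placeholder path

def timed_transcribe(**options):
    torch.cuda.synchronize()           # flush pending GPU work before timing
    start = time.perf_counter()
    model.transcribe(audio, **options)
    torch.cuda.synchronize()           # wait until all kernels have finished
    return time.perf_counter() - start

timed_transcribe(fp16=True)            # warm-up run (CUDA init, caching)

for fp16 in (False, True):
    greedy = timed_transcribe(fp16=fp16)             # default greedy decoding
    beam = timed_transcribe(fp16=fp16, beam_size=5)  # beam search decoding
    print(f"fp16={fp16}: greedy {greedy:.1f} s, beam {beam:.1f} s")
```
The explicit torch.cuda.synchronize() calls matter here: CUDA launches are asynchronous, so timing without them can misattribute where the time is spent.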
-
Oh, and I tried converting the model to fp16, but then decoding just fails when I call it.
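I am not sure which exact call failed here, but for anyone trying the same thing, a manual conversion along these lines is presumably what is meant; the comments mark where it can break (this is a sketch and an assumed diagnosis, not a confirmed one):
```python
import whisper

model = whisper.load_model("large", device="cuda")
model = model.half()  # cast all weights to float16 (standard PyTorch nn.Module method)

# Depending on the Whisper version, decoding after a manual .half() can fail
# with a dtype-mismatch error, because parts of the pipeline still build
# float32 tensors. Passing fp16=True keeps the inputs consistent with the
# half-precision weights, but is not guaranteed to fix it.
result = model.transcribe("sample.wav", fp16=True)  # placeholder file
print(result["text"])
```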
-
Exactly, I asked a question about this in #1175 but got no answer.
-
Hi,
I am surprised because, based on Nvidia's explanations of mixed precision, I would expect running transcribe with half precision (transcribe(..., fp16=True)) to be faster than with normal precision (the default parameter), but the contrary happens. I ran the experiment on an RTX 3090 with the large model and got around 45 s with the default parameter (fp16=False, so fp32), but got 135 s and 104 s respectively in half-precision mode.
Is this expected, or does it come from an old CPU architecture that cannot cope with the higher throughput?
Thanks in advance for any hint!
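For concreteness, the comparison described above amounts to timing something like the following minimal sketch (the file name is a placeholder; note that wall-clock timing of transcribe also includes CPU-side preprocessing, which half precision does not speed up):
```python
import time

import whisper

model = whisper.load_model("large")  # uses the GPU if one is available

for fp16 in (False, True):
    start = time.perf_counter()
    model.transcribe("audio.wav", fp16=fp16)  # placeholder file name
    elapsed = time.perf_counter() - start
    print(f"fp16={fp16}: {elapsed:.1f} s")
```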