
Dual GPU performance regression After #4606 #5324

Closed
@Ph0rk0z


A while ago, on 2x3090, I was getting 18.x tokens/s on 70b models. I didn't update for a while and was dismayed to see performance drop to 15 t/s. I had some hardware issues, so it took a while to figure out what was going on, but I narrowed it down to a commit between:
7082d24 and f679349

Reading through what changed that week, the most likely culprits look to be 5bf3953 and dc68f00.

I can't test the first one on its own because it produced errors in multi-GPU, which the second commit fixed. If I run versions from before these commits, my performance is back.

Linked pulls: #4606 #4620

When loading a model across 3 GPUs, like miqu 5km, the regression is even bigger: from 15.5 t/s down to 11 t/s. Memory use is improved, though. I had to re-arrange how I split the model.

Some proof:

Pre:

llama_print_timings:        load time =     528.83 ms
llama_print_timings:      sample time =     112.26 ms /   200 runs   (    0.56 ms per token,  1781.55 tokens per second)
llama_print_timings: prompt eval time =     528.67 ms /    22 tokens (   24.03 ms per token,    41.61 tokens per second)
llama_print_timings:        eval time =   10762.82 ms /   199 runs   (   54.08 ms per token,    18.49 tokens per second)
llama_print_timings:       total time =   11874.81 ms
Output generated in 12.77 seconds (15.66 tokens/s, 200 tokens, context 22, seed 1952269572

Post:

llama_print_timings:        load time =     495.04 ms
llama_print_timings:      sample time =     113.32 ms /   200 runs   (    0.57 ms per token,  1764.90 tokens per second)
llama_print_timings: prompt eval time =     494.91 ms /    22 tokens (   22.50 ms per token,    44.45 tokens per second)
llama_print_timings:        eval time =   12894.68 ms /   199 runs   (   64.80 ms per token,    15.43 tokens per second)
llama_print_timings:       total time =   14055.05 ms /   221 tokens
Output generated in 14.63 seconds (13.67 tokens/s, 200 tokens, context 22, seed 1842804206)
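
To put a number on the gap between the two runs above, here is a minimal Python sketch that pulls the `eval time` line out of each `llama_print_timings` dump and recomputes tokens per second from the total milliseconds and run count. The helper name `eval_tokens_per_second` and the regex are my own, written against the log format pasted above rather than anything shipped with llama.cpp, so treat it as an illustration.

```python
import re

# Matches the generation "eval time" line only; the "prompt eval time" line
# does not match because "prompt" sits between the prefix and "eval time".
EVAL_RE = re.compile(
    r"llama_print_timings:\s+eval time\s*=\s*([\d.]+) ms\s*/\s*(\d+) runs"
)


def eval_tokens_per_second(log: str) -> float:
    """Recompute generation speed from total eval milliseconds and run count."""
    m = EVAL_RE.search(log)
    if m is None:
        raise ValueError("no eval time line found")
    total_ms, runs = float(m.group(1)), int(m.group(2))
    return runs / (total_ms / 1000.0)


# Lines copied from the Pre/Post logs in this report.
PRE = """
llama_print_timings: prompt eval time =     528.67 ms /    22 tokens (   24.03 ms per token,    41.61 tokens per second)
llama_print_timings:        eval time =   10762.82 ms /   199 runs   (   54.08 ms per token,    18.49 tokens per second)
"""
POST = """
llama_print_timings: prompt eval time =     494.91 ms /    22 tokens (   22.50 ms per token,    44.45 tokens per second)
llama_print_timings:        eval time =   12894.68 ms /   199 runs   (   64.80 ms per token,    15.43 tokens per second)
"""

pre_tps = eval_tokens_per_second(PRE)
post_tps = eval_tokens_per_second(POST)
slowdown_pct = (pre_tps - post_tps) / pre_tps * 100.0

print(f"pre:  {pre_tps:.2f} t/s")
print(f"post: {post_tps:.2f} t/s")
print(f"slowdown: {slowdown_pct:.1f}%")
```

On these two logs the generation speed drops by roughly 16.5%, while prompt eval is slightly faster in the newer build, so the regression looks specific to token generation.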
