Commit 8fc17e6

Update README.md
1 parent 3b91d75 commit 8fc17e6

1 file changed, +3 -1 lines changed

README.md

Lines changed: 3 additions & 1 deletion
@@ -5,7 +5,9 @@ It has been tested on Radeon VII (aka gfx906), MI250X (aka gfx90a), and 7900 XTX
 
 ## Performance
 
-Updating soon..
+~200000 tok/s for the smallest GPT2 model on a 4x7900 XTX.
+
+This is approximately on par with PyTorch 2.4.0 *without* flash attention but using all other go fast options like compile (I'm not aware of any publicly available implementation of flash attention for RDNA3), however PyTorch 2.4.0 with all options to go fast (bf16, flash attention, compile etc) is running at about ~245,000 tok/s.
 
 ## Quick Start (AMD targets)
 