itzmeanjan
Owner

Use Vulkan compute shaders to offload large matrix multiplication and matrix transposition to the GPU
(gated behind the non-default gpu feature), speeding up the server-setup phase of ChalametPIR.
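
For readers unfamiliar with Cargo feature gating, here is a minimal sketch of how a matrix routine might be switched between a CPU path and a GPU path at compile time. The `Matrix` type, its method names, and the use of wrapping u32 arithmetic are assumptions made for illustration; they are not taken from this PR. The GPU branch is left as a placeholder that reuses the CPU routine, since the actual Vulkan dispatch code is not shown in this conversation.

```rust
/// Row-major matrix of u32 elements (hypothetical type, for illustration only).
#[derive(Clone, Debug, PartialEq)]
pub struct Matrix {
    pub rows: usize,
    pub cols: usize,
    pub elems: Vec<u32>,
}

impl Matrix {
    pub fn new(rows: usize, cols: usize, elems: Vec<u32>) -> Self {
        assert_eq!(rows * cols, elems.len(), "dimension mismatch");
        Matrix { rows, cols, elems }
    }

    /// CPU reference transposition: element (r, c) moves to (c, r).
    pub fn transpose_cpu(&self) -> Matrix {
        let mut out = vec![0u32; self.elems.len()];
        for r in 0..self.rows {
            for c in 0..self.cols {
                out[c * self.rows + r] = self.elems[r * self.cols + c];
            }
        }
        Matrix::new(self.cols, self.rows, out)
    }

    /// CPU reference multiplication. Wrapping arithmetic is an assumption
    /// chosen here so that overflow never panics.
    pub fn mul_cpu(&self, rhs: &Matrix) -> Matrix {
        assert_eq!(self.cols, rhs.rows, "incompatible dimensions");
        let mut out = vec![0u32; self.rows * rhs.cols];
        for i in 0..self.rows {
            for k in 0..self.cols {
                let a = self.elems[i * self.cols + k];
                for j in 0..rhs.cols {
                    let idx = i * rhs.cols + j;
                    let prod = a.wrapping_mul(rhs.elems[k * rhs.cols + j]);
                    out[idx] = out[idx].wrapping_add(prod);
                }
            }
        }
        Matrix::new(self.rows, rhs.cols, out)
    }

    /// Compiled only when the crate is built with `--features gpu`.
    /// Placeholder: a real implementation would upload both operands,
    /// dispatch a compute shader, and read the result back.
    #[cfg(feature = "gpu")]
    pub fn mul(&self, rhs: &Matrix) -> Matrix {
        self.mul_cpu(rhs)
    }

    /// Default build (no `gpu` feature): always use the CPU path.
    #[cfg(not(feature = "gpu"))]
    pub fn mul(&self, rhs: &Matrix) -> Matrix {
        self.mul_cpu(rhs)
    }
}
```

Because the selection happens at compile time, a default build carries no GPU code or GPU dependencies at all, which is the usual reason for keeping such a feature non-default.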

@itzmeanjan
Owner Author

Without the gpu feature, server-setup cost on an Intel i7-1260P CPU:

$ cargo bench --features mutate_internal_client_state --profile optimized --bench offline_phase -q server_setup
Timer precision: 10 ns
offline_phase                                                                        fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ server_setup                                                                                    │               │               │               │         │
   ├─ 3                                                                                            │               │               │               │         │
   │  ╰─ DBConfig { db_entry_count: 65536, key_byte_len: 32, value_byte_len: 1024 }  2.522 m       │ 2.648 m       │ 2.585 m       │ 2.585 m       │ 2       │ 2
   ╰─ 4                                                                                            │               │               │               │         │
      ╰─ DBConfig { db_entry_count: 65536, key_byte_len: 32, value_byte_len: 1024 }  2.535 m       │ 2.552 m       │ 2.543 m       │ 2.543 m       │ 2       │ 2

With the gpu feature enabled, server-setup is ~12.45x faster 🚀

$ cargo bench --features mutate_internal_client_state,gpu --profile optimized --bench offline_phase -q server_setup
Timer precision: 10 ns
offline_phase                                                                        fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ server_setup                                                                                    │               │               │               │         │
   ├─ 3                                                                                            │               │               │               │         │
   │  ╰─ DBConfig { db_entry_count: 65536, key_byte_len: 32, value_byte_len: 1024 }  12.18 s       │ 12.69 s       │ 12.45 s       │ 12.46 s       │ 25      │ 25
   ╰─ 4                                                                                            │               │               │               │         │
      ╰─ DBConfig { db_entry_count: 65536, key_byte_len: 32, value_byte_len: 1024 }  11.73 s       │ 12.24 s       │ 11.86 s       │ 11.87 s       │ 26      │ 26
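
The speedup can be checked from the median columns of the two benchmark runs above. The sketch below is only a worked calculation; the helper names are hypothetical, and the inputs are the medians copied from the output (2.585 min and 2.543 min without the gpu feature, 12.45 s and 11.86 s with it). It gives roughly 12.46x for the row labelled 3 and roughly 12.87x for the row labelled 4.

```rust
/// Convert a duration in minutes to seconds.
pub fn minutes_to_secs(minutes: f64) -> f64 {
    minutes * 60.0
}

/// Speedup of the accelerated run relative to the baseline run,
/// with both durations given in seconds.
pub fn speedup(baseline_secs: f64, accelerated_secs: f64) -> f64 {
    baseline_secs / accelerated_secs
}

/// Median-based speedups for the rows labelled 3 and 4 in the output above.
pub fn median_speedups() -> (f64, f64) {
    let row_3 = speedup(minutes_to_secs(2.585), 12.45);
    let row_4 = speedup(minutes_to_secs(2.543), 11.86);
    (row_3, row_4)
}
```

Note that the CPU runs collected only 2 samples each, so the ratio is an estimate rather than a precise measurement.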

@itzmeanjan
Owner Author

I benchmarked server-setup on an AWS EC2 g6e.8xlarge instance, which features an NVIDIA L40S Tensor Core GPU.

Server-setup on CPU

[Image: server-setup-on-cpu]

Server-setup, partially offloaded to GPU

[Image: server-setup-on-gpu]

Note

Server-setup can be offloaded to the GPU by enabling the gpu feature. This feature requires Vulkan drivers and the Vulkan library to be installed.

@itzmeanjan itzmeanjan merged commit 0646d4e into main Apr 6, 2025
5 checks passed
@itzmeanjan itzmeanjan deleted the integrate-mat-mul-on-gpu branch April 6, 2025 06:55