@TroyGarden TroyGarden commented Dec 22, 2025

Summary:

  • Add a script that runs the train pipeline benchmark.
  • Add a GitHub workflow that runs the benchmark script nightly.
  • The workflow can also be triggered manually.
  • Traces and memory snapshots are uploaded as GitHub artifacts.
  • Fix some GitHub workflow naming conventions and typos.

NOTE: the GitHub runner linux.g5.12xlarge.nvidia.gpu has only 4 A10G GPUs, each reporting 23028 MiB (about 22.5 GiB) of memory in nvidia-smi, so it can only run the *-light.yml benchmarks.
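
As a rough illustration of restricting a run to the light configs, the sketch below filters a list of config file names with the `*-light.yml` pattern from the note. The file names and the `select_light_configs` helper are hypothetical and are not taken from this PR; the actual script may select configs differently.

```python
from fnmatch import fnmatch

# Hypothetical config file names; the real files in the repository may differ.
CONFIG_FILES = [
    "base-pipeline.yml",
    "base-pipeline-light.yml",
    "sparse-data-dist.yml",
    "sparse-data-dist-light.yml",
]


def select_light_configs(names, pattern="*-light.yml"):
    """Return, in sorted order, the config names matching the light-benchmark pattern."""
    return sorted(name for name in names if fnmatch(name, pattern))


light_configs = select_light_configs(CONFIG_FILES)
print(light_configs)
```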

Python 3.13.11
torch 2.11.0.dev20251222+cu128
fbgemm_gpu 2025.12.22+cu128
torchrec 1.5.0a0+623f0dc
Tue Dec 23 05:41:12 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.65.06              Driver Version: 580.65.06      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A10G                    On  |   00000000:00:1B.0 Off |                    0 |
|  0%   19C    P8             16W /  300W |       0MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A10G                    On  |   00000000:00:1C.0 Off |                    0 |
|  0%   20C    P8             20W /  300W |       0MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A10G                    On  |   00000000:00:1D.0 Off |                    0 |
|  0%   19C    P8             19W /  300W |       0MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A10G                    On  |   00000000:00:1E.0 Off |                    0 |
|  0%   19C    P8             19W /  300W |       0MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
torchrec directory: /meta-pytorch/torchrec/torchrec
working directory: /meta-pytorch/torchrec
| short name | GPU Runtime (P90) | CPU Runtime (P90) | GPU Peak Mem alloc (P90) | GPU Peak Mem reserved (P90) | GPU Mem used (P90) | Malloc retries (P50/P90/P100) | CPU Peak RSS (P90) |
|---|---|---|---|---|---|---|---|
| base_pipeline_light | 8213.92 ms | 7705.65 ms | 11.85 GB | 16.71 GB | 17.23 GB | 0.0 / 0.0 / 0.0 | 3.45 GB |
| sparse_data_dist_light | 7909.53 ms | 7221.77 ms | 12.48 GB | 18.33 GB | 18.86 GB | 0.0 / 0.0 / 0.0 | 3.48 GB |
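
To make the two rows easier to compare, here is a minimal sketch that computes the relative change of sparse_data_dist_light against base_pipeline_light for GPU runtime and GPU memory used. The `BenchmarkRow` class and `relative_change` helper are illustrative names, not part of the benchmark script; the numbers are copied from the results above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRow:
    """Subset of one benchmark result row (P90 values)."""
    name: str
    gpu_runtime_ms: float
    gpu_mem_used_gb: float


# Values copied from the benchmark results in this PR description.
base = BenchmarkRow("base_pipeline_light", 8213.92, 17.23)
sdd = BenchmarkRow("sparse_data_dist_light", 7909.53, 18.86)


def relative_change(baseline: float, candidate: float) -> float:
    """Percentage change of candidate relative to baseline."""
    return (candidate - baseline) / baseline * 100.0


runtime_change = relative_change(base.gpu_runtime_ms, sdd.gpu_runtime_ms)
memory_change = relative_change(base.gpu_mem_used_gb, sdd.gpu_mem_used_gb)

print(f"GPU runtime (P90):  {runtime_change:+.2f}%")
print(f"GPU mem used (P90): {memory_change:+.2f}%")
```

On this single run, sparse_data_dist_light shows about 3.7% lower GPU runtime and about 9.5% higher GPU memory use than base_pipeline_light. One run is not enough to separate this from noise.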

Differential Revision: D89629829

@meta-cla meta-cla bot added the CLA Signed label Dec 22, 2025
meta-codesync bot commented Dec 22, 2025

@TroyGarden has exported this pull request. If you are a Meta employee, you can view the originating Diff in D89629829.

TroyGarden added a commit to TroyGarden/torchrec that referenced this pull request Dec 23, 2025
Summary:
Pull Request resolved: meta-pytorch#3631

# context

Differential Revision: D89629829
@TroyGarden TroyGarden changed the title from "debug" to "create github benchmark workflow" Dec 23, 2025
TroyGarden added a commit to TroyGarden/torchrec that referenced this pull request Dec 23, 2025
Summary:
Pull Request resolved: meta-pytorch#3631

# context
* add a script to run train pipeline benchmark
* add a github workflow which can run benchmark script nightly
* the workflow can also be triggered manually

Differential Revision: D89629829
TroyGarden added a commit to TroyGarden/torchrec that referenced this pull request Dec 23, 2025
Summary:
Pull Request resolved: meta-pytorch#3631

# context
* add a script to run train pipeline benchmark
* add a github workflow which can run benchmark script nightly
* the workflow can also be triggered manually
* also fix some github workflow naming conventions and typos.

Differential Revision: D89629829
@TroyGarden TroyGarden force-pushed the export-D89629829 branch 2 times, most recently from a7586a2 to fab9707 Compare December 23, 2025 06:35
@TroyGarden TroyGarden force-pushed the export-D89629829 branch 2 times, most recently from b7719a0 to d691912 Compare December 23, 2025 07:47
@TroyGarden TroyGarden force-pushed the export-D89629829 branch 2 times, most recently from 60c3237 to 66b06e1 Compare December 23, 2025 16:17
TroyGarden added a commit to TroyGarden/torchrec that referenced this pull request Dec 23, 2025
Summary:
Pull Request resolved: meta-pytorch#3631

# context
* add a script to run train pipeline benchmark
* add a github workflow which can run benchmark script nightly
* the workflow can also be triggered manually
* trace and memory snapshot will be uploaded to github artifacts
* also fix some github workflow naming conventions and typos.

NOTE: github runner `linux.g5.12xlarge.nvidia.gpu` only has 4 gpus with 20GB HBM, so can only support the *-light.yml benchmarks

Differential Revision: D89629829
@TroyGarden TroyGarden force-pushed the export-D89629829 branch 2 times, most recently from 864ff2c to a9d0b6f Compare December 24, 2025 00:56
TroyGarden added a commit to TroyGarden/torchrec that referenced this pull request Dec 24, 2025
Labels

CLA Signed, fb-exported, meta-exported
