
Commit 7963c7e

Merge branch 'master' into arjunsuresh-patch-2
2 parents 9b2d0cc + e064826 commit 7963c7e

3 files changed: +86 -75 lines changed


language/deepseek-r1/README.md

Lines changed: 37 additions & 30 deletions
@@ -1,6 +1,6 @@
-# Mlperf Inference DeepSeek Reference Implementation
+# MLPerf Inference DeepSeek Reference Implementation
 
-## Automated command to run the benchmark via MLFlow
+## Automated command to run the benchmark via MLCFlow
 
 
 Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/deepseek-r1/) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker.

@@ -13,6 +13,22 @@ You can also do pip install mlc-scripts and then use `mlcr` commands for downloa
 - DeepSeek-R1 model is automatically downloaded as part of setup
 - Checkpoint conversion is done transparently when needed.
 
+**Using the MLC R2 Downloader**
+
+Download the model using the MLCommons R2 Downloader:
+
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+  https://inference.mlcommons-storage.org/metadata/deepseek-r1-0528.uri
+```
+
+To specify a custom download directory, use the `-d` flag:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+  -d /path/to/download/directory \
+  https://inference.mlcommons-storage.org/metadata/deepseek-r1-0528.uri
+```
+
 ## Dataset Download
 
 The dataset is an ensemble of the datasets: AIME, MATH500, gpqa, MMLU-Pro, livecodebench(code_generation_lite). They are covered by the following licenses:
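Aside: every downloader invocation in this commit has the same shape — the downloader script URL, an optional `-d` target directory, and a metadata URI. A small wrapper makes that pattern explicit. This is a sketch for illustration only: the `r2_fetch` helper and `METADATA_BASE` are assumptions, not part of the reference implementation, and the real download call is left commented out so the sketch has no side effects.

```shell
# Illustrative helper only; `r2_fetch` and METADATA_BASE are assumptions,
# not part of the MLPerf reference scripts.
METADATA_BASE="https://inference.mlcommons-storage.org/metadata"
DOWNLOADER="https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh"

r2_fetch() {
  # $1: metadata file name; $2: optional target directory (defaults to .)
  local uri="${METADATA_BASE}/$1"
  local dest="${2:-.}"
  echo "fetching ${uri} into ${dest}"
  # Real invocation (commented out so this sketch stays side-effect free):
  # bash <(curl -s "${DOWNLOADER}") -d "${dest}" "${uri}"
}

r2_fetch deepseek-r1-0528.uri /path/to/download/directory
```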
@@ -23,49 +39,40 @@ The dataset is an ensemble of the datasets: AIME, MATH500, gpqa, MMLU-Pro, livec
 - MMLU-Pro: [MIT](https://opensource.org/license/mit)
 - livecodebench(code_generation_lite): [CC](https://creativecommons.org/share-your-work/cclicenses/)
 
-### Preprocessed
-
-**Using MLCFlow Automation**
-
-```
-mlcr get,dataset,whisper,_preprocessed,_mlc,_rclone --outdirname=<path to download> -j
-```
+### Preprocessed & Calibration
 
-**Using Native method**
+**Using the MLC R2 Downloader**
 
-You can use Rclone to download the preprocessed dataset from a Cloudflare R2 bucket.
+Download the full preprocessed dataset and calibration dataset using the MLCommons R2 Downloader:
 
-To run Rclone on Windows, you can download the executable [here](https://rclone.org/install/#windows).
-To install Rclone on Linux/macOS/BSD systems, run:
-```
-sudo -v ; curl https://rclone.org/install.sh | sudo bash
-```
-Once Rclone is installed, run the following command to authenticate with the bucket:
-```
-rclone config create mlc-inference s3 provider=Cloudflare access_key_id=f65ba5eef400db161ea49967de89f47b secret_access_key=fbea333914c292b854f14d3fe232bad6c5407bf0ab1bebf78833c2b359bdfd2b endpoint=https://c2686074cb2caf5cbaf6d134bdba8b47.r2.cloudflarestorage.com
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+  -d ./ https://inference.mlcommons-storage.org/metadata/deepseek-r1-datasets-fp8-eval.uri
 ```
-You can then navigate in the terminal to your desired download directory and run the following command to download the dataset:
 
-```
-rclone copy mlc-inference:mlcommons-inference-wg-public/deepseek_r1/datasets/mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl ./ -P
+This will download the full preprocessed dataset file (`mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl`) and the calibration dataset file (`mlperf_deepseek_r1_calibration_dataset_500_fp8_eval.pkl`).
+
+To specify a custom download directory, use the `-d` flag:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+  -d /path/to/download/directory \
+  https://inference.mlcommons-storage.org/metadata/deepseek-r1-datasets-fp8-eval.uri
 ```
 
-### Calibration
+### Preprocessed
 
 **Using MLCFlow Automation**
 
 ```
-mlcr get,preprocessed,dataset,deepseek-r1,_calibration,_mlc,_rclone --outdirname=<path to download> -j
+mlcr get,preprocessed,dataset,deepseek-r1,_validation,_mlc,_r2-downloader --outdirname=<path to download> -j
 ```
 
-**Using Native method**
-
-Download and install Rclone as described in the previous section.
+### Calibration
 
-Then navigate in the terminal to your desired download directory and run the following command to download the dataset:
+**Using MLCFlow Automation**
 
 ```
-rclone copy mlc-inference:mlcommons-inference-wg-public/deepseek_r1/datasets/mlperf_deepseek_r1_calibration_dataset_500_fp8_eval.pkl ./ -P
+mlcr get,preprocessed,dataset,deepseek-r1,_calibration,_mlc,_r2-downloader --outdirname=<path to download> -j
 ```
 
 ## Docker
@@ -204,7 +211,7 @@ The following table shows which backends support different evaluation and MLPerf
 **Using MLCFlow Automation**
 
 ```
-TBD
+mlcr run,accuracy,mlperf,_dataset_deepseek-r1 --result_dir=<Path to directory where files are generated after the benchmark run>
 ```
 
 **Using Native method**

language/llama3.1-8b/README.md

Lines changed: 24 additions & 26 deletions
@@ -104,7 +104,7 @@ You need to request for access to [MLCommons](http://llama3-1.mlcommons.org/) an
 **Official Model download using MLCFlow Automation**
 You can download the model automatically via the below command
 ```
-TBD
+mlcr get,ml-model,llama3,_mlc,_8b,_r2-downloader --outdirname=<path to download> -j
 ```
 
 
@@ -137,59 +137,57 @@ Downloading llama3.1-8b model from Hugging Face will require an [**access token*
 
 ### Preprocessed
 
-You can use Rclone to download the preprocessed dataset from a Cloudflare R2 bucket.
-
-To run Rclone on Windows, you can download the executable [here](https://rclone.org/install/#windows).
-To install Rclone on Linux/macOS/BSD systems, run:
-```
-sudo -v ; curl https://rclone.org/install.sh | sudo bash
-```
-Once Rclone is installed, run the following command to authenticate with the bucket:
-```
-rclone config create mlc-inference s3 provider=Cloudflare access_key_id=f65ba5eef400db161ea49967de89f47b secret_access_key=fbea333914c292b854f14d3fe232bad6c5407bf0ab1bebf78833c2b359bdfd2b endpoint=https://c2686074cb2caf5cbaf6d134bdba8b47.r2.cloudflarestorage.com
-```
-You can then navigate in the terminal to your desired download directory and run the following command to download the dataset:
+Download the preprocessed datasets using the MLCommons downloader:
 
 #### Full dataset (datacenter)
 
 **Using MLCFlow Automation**
 ```
-mlcr get,dataset,cnndm,_validation,_datacenter,_llama3,_mlc,_rclone --outdirname=<path to download> -j
+mlcr get,dataset,cnndm,_validation,_datacenter,_llama3,_mlc,_r2-downloader --outdirname=<path to download> -j
 ```
 
 **Native method**
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+  https://inference.mlcommons-storage.org/metadata/llama3-1-8b-cnn-eval.uri
 ```
-rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/datasets/cnn_eval.json ./ -P
-```
+This will download `cnn_eval.json`.
 
 #### 5000 samples (edge)
 
 **Using MLCFlow Automation**
 ```
-mlcr get,dataset,cnndm,_validation,_edge,_llama3,_mlc,_rclone --outdirname=<path to download> -j
+mlcr get,dataset,cnndm,_validation,_edge,_llama3,_mlc,_r2-downloader --outdirname=<path to download> -j
 ```
 
 **Native method**
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+  https://inference.mlcommons-storage.org/metadata/llama3-1-8b-sample-cnn-eval-5000.uri
 ```
-rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/datasets/cnn_eval_5000.json ./ -P
-```
+
+This will download `sample_cnn_eval_5000.json`.
+
 
 #### Calibration
 
 **Using MLCFlow Automation**
 ```
-mlcr get,dataset,cnndm,_calibration,_llama3,_mlc,_rclone --outdirname=<path to download> -j
+mlcr get,dataset,cnndm,_calibration,_llama3,_mlc,_r2-downloader --outdirname=<path to download> -j
 ```
 
 **Native method**
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+  https://inference.mlcommons-storage.org/metadata/llama3-1-8b-cnn-dailymail-calibration.uri
 ```
-rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/datasets/cnn_dailymail_calibration.json ./ -P
-```
-
-You can also download the calibration dataset from the Cloudflare R2 bucket by running the following command:
+This will download `cnn_dailymail_calibration.json`.
 
-```
-rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/cnn_eval.json ./ -P
+To specify a custom download directory for any of these, use the `-d` flag:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+  -d /path/to/download/directory \
+  <URI>
 ```
 

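The three llama3.1-8b dataset downloads in the hunk above differ only in the metadata file name. A sketch that prints the full command for each (the loop and variable names are illustrative assumptions; the URIs are the ones from this commit; printing instead of executing keeps the sketch side-effect free):

```shell
# Print the three llama3.1-8b dataset download commands from this commit.
DOWNLOADER="https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh"
METADATA_BASE="https://inference.mlcommons-storage.org/metadata"

for metadata in \
  llama3-1-8b-cnn-eval.uri \
  llama3-1-8b-sample-cnn-eval-5000.uri \
  llama3-1-8b-cnn-dailymail-calibration.uri
do
  # Echo rather than run, so nothing is downloaded by the sketch itself.
  echo "bash <(curl -s ${DOWNLOADER}) ${METADATA_BASE}/${metadata}"
done
```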
speech2text/README.md

Lines changed: 25 additions & 19 deletions
@@ -102,26 +102,24 @@ VLLM_TARGET_DEVICE=cpu pip install --break-system-packages . --no-build-isolatio
 
 You can download the model automatically via the below command
 ```
-mlcr get,ml-model,whisper,_rclone,_mlc --outdirname=<path_to_download> -j
+mlcr get,ml-model,whisper,_r2-downloader,_mlc --outdirname=<path_to_download> -j
 ```
 
-**Official Model download using native method**
+**Official Model download using MLC R2 Downloader**
 
-You can use Rclone to download the preprocessed dataset from a Cloudflare R2 bucket.
+Download the Whisper model using the MLCommons downloader:
 
-To run Rclone on Windows, you can download the executable [here](https://rclone.org/install/#windows).
-To install Rclone on Linux/macOS/BSD systems, run:
-```
-sudo -v ; curl https://rclone.org/install.sh | sudo bash
-```
-Once Rclone is installed, run the following command to authenticate with the bucket:
-```
-rclone config create mlc-inference s3 provider=Cloudflare access_key_id=f65ba5eef400db161ea49967de89f47b secret_access_key=fbea333914c292b854f14d3fe232bad6c5407bf0ab1bebf78833c2b359bdfd2b endpoint=https://c2686074cb2caf5cbaf6d134bdba8b47.r2.cloudflarestorage.com
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) -d whisper/model https://inference.mlcommons-storage.org/metadata/whisper-model.uri
 ```
-You can then navigate in the terminal to your desired download directory and run the following command to download the model:
 
-```
-rclone copy mlc-inference:mlcommons-inference-wg-public/Whisper/model/ ./ -P
+This will download the Whisper model files.
+
+To specify a custom download directory, use the `-d` flag:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+  -d /path/to/download/directory \
+  https://inference.mlcommons-storage.org/metadata/whisper-model.uri
 ```
 
 ### External Download (Not recommended for official submission)
@@ -153,16 +151,24 @@ We use dev-clean and dev-other splits, which are approximately 10 hours.
 
 **Using MLCFlow Automation**
 ```
-mlcr get,dataset,whisper,_preprocessed,_mlc,_rclone --outdirname=<path to download> -j
+mlcr get,dataset,whisper,_preprocessed,_mlc,_r2-downloader --outdirname=<path to download> -j
 ```
 
-**Native method**
+**Using MLC R2 Downloader**
 
-Download and install rclone as decribed in the [MLCommons Download section](#mlcommons-download)
+Download the preprocessed dataset using the MLCommons R2 Downloader:
 
-You can then navigate in the terminal to your desired download directory and run the following command to download the dataset:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) -d whisper/dataset https://inference.mlcommons-storage.org/metadata/whisper-dataset.uri
 ```
-rclone copy mlc-inference:mlcommons-inference-wg-public/Whisper/dataset/ ./ -P
+
+This will download the LibriSpeech dataset files.
+
+To specify a custom download directory, use the `-d` flag:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+  -d /path/to/download/directory \
+  https://inference.mlcommons-storage.org/metadata/whisper-dataset.uri
 ```
 
 ### Unprocessed
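A download that silently failed or landed in the wrong place only surfaces once a benchmark run aborts; checking the target directories from the commands above first is cheap. A sanity-check sketch — the `check_download` function and its messages are illustrative assumptions, not part of the reference scripts:

```shell
# Sketch: verify a download directory exists and is non-empty before
# launching a benchmark run. Function name and messages are illustrative.
check_download() {
  local dir="$1"
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "ok: $dir"
  else
    echo "missing or empty: $dir (re-run the downloader)"
    return 1
  fi
}

# Directories used by the whisper commands in this commit:
check_download whisper/model || true
check_download whisper/dataset || true
```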
