content/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/5-run-benchmark-on-android.md
You should now have `llama_main` available for Android.

{{% notice Note %}}
If Gradle cannot find the Android SDK, add the `sdk.dir` path to `executorch/extension/android/local.properties`.
{{% /notice %}}

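For reference, `local.properties` is a plain key/value file; a minimal sketch is shown below (the SDK path is an assumed example, so substitute the path of your own Android SDK installation):

```text
# executorch/extension/android/local.properties
sdk.dir=/home/user/Android/Sdk
```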
## Run on Android via adb shell
You will need an Arm-powered smartphone running Android that supports the i8mm feature and has at least 16GB of RAM. The following steps were tested on a Google Pixel 8 Pro.

You should see your device listed to confirm it is connected.

Use the Llama runner to execute the model on the phone with the `adb` command:

```bash
adb shell "cd /data/local/tmp/llama && ./llama_main --model_path llama3_1B_kv_sdpa_xnn_qe_4_64_1024_embedding_4bit.pte --tokenizer_path tokenizer.model --prompt "<|start_header_id|>system<|end_header_id|>\nYour name is Cookie. you are helpful, polite, precise, concise, honest, good at writing. You always give precise and brief answers up to 32 words<|eot_id|><|start_header_id|>user<|end_header_id|>\nHey Cookie! how are you today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --warmup=1 --cpu_threads=5"
```

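The `--prompt` argument above follows the Llama 3 chat template: each turn is wrapped in `<|start_header_id|>…<|end_header_id|>` tags and terminated with `<|eot_id|>`, ending with an open assistant header so the model generates the reply. As a sketch, such a prompt string can be assembled like this (the helper name is illustrative, not part of the Llama runner):

```python
def llama3_chat_prompt(system: str, user: str) -> str:
    # Llama 3 instruct template: each turn is wrapped in header tags
    # and terminated with <|eot_id|>; the trailing assistant header
    # is left open so the model continues from there.
    return (
        "<|start_header_id|>system<|end_header_id|>\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>"
    )

prompt = llama3_chat_prompt(
    "Your name is Cookie. you are helpful, polite, precise, concise, honest, "
    "good at writing. You always give precise and brief answers up to 32 words",
    "Hey Cookie! how are you today?",
)
print(prompt)
```

Changing the system and user strings lets you reuse the same template for different benchmark prompts.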
The output should look something like this:

```
I tokenizers:regex.cpp:27] Registering override fallback regex
I 00:00:00.003288 executorch:main.cpp:87] Resetting threadpool with num threads = 5
I 00:00:00.006393 executorch:runner.cpp:44] Creating LLaMa runner: model_path=llama3_1B_kv_sdpa_xnn_qe_4_64_1024_embedding_4bit.pte, tokenizer_path=tokenizer.model
E tokenizers:hf_tokenizer.cpp:60] Error parsing json file: [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - invalid literal; last read: 'I'
I 00:00:00.131486 executorch:llm_runner_helper.cpp:57] Loaded TikToken tokenizer
I 00:00:00.131525 executorch:llm_runner_helper.cpp:167] Reading metadata from model
I 00:00:00.186538 executorch:llm_runner_helper.cpp:110] Metadata: use_sdpa_with_kv_cache = 1
I 00:00:00.186574 executorch:llm_runner_helper.cpp:110] Metadata: use_kv_cache = 1
I 00:00:00.186578 executorch:llm_runner_helper.cpp:110] Metadata: get_max_context_len = 1024
I 00:00:00.186584 executorch:llm_runner_helper.cpp:110] Metadata: get_max_seq_len = 1024
I 00:00:00.186588 executorch:llm_runner_helper.cpp:110] Metadata: enable_dynamic_shape = 1
I 00:00:00.186596 executorch:llm_runner_helper.cpp:140] eos_id = 128009
I 00:00:00.186597 executorch:llm_runner_helper.cpp:140] eos_id = 128001
I 00:00:00.186599 executorch:llm_runner_helper.cpp:140] eos_id = 128006
I 00:00:00.186600 executorch:llm_runner_helper.cpp:140] eos_id = 128007
I 00:00:01.086570 executorch:text_llm_runner.cpp:89] Doing a warmup run...
I 00:00:01.087836 executorch:text_llm_runner.cpp:152] Max new tokens resolved: 128, given start_pos 0, num_prompt_tokens 54, max_context_len 1024
I 00:00:01.292740 executorch:text_prefiller.cpp:93] Prefill token result numel(): 128256

I 00:00:02.264371 executorch:text_token_generator.h:123]
Reached to the end of generation
I 00:00:02.264379 executorch:text_llm_runner.cpp:209] Warmup run finished!
I 00:00:02.264384 executorch:text_llm_runner.cpp:95] RSS after loading model: 1122.187500 MiB (0 if unsupported)
I 00:00:02.264624 executorch:text_llm_runner.cpp:152] Max new tokens resolved: 74, given start_pos 0, num_prompt_tokens 54, max_context_len 1024
<|start_header_id|>system<|end_header_id|>\nYour name is Cookie. you are helpful, polite, precise, concise, honest, good at writing. You always give precise and brief answers up to 32 words<|eot_id|><|start_header_id|>user<|end_header_id|>\nHey Cookie! how are you today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>I 00:00:02.394162 executorch:text_prefiller.cpp:93] Prefill token result numel(): 128256


I 00:00:02.394373 executorch:text_llm_runner.cpp:179] RSS after prompt prefill: 1122.187500 MiB (0 if unsupported)
I'm doing great, thanks for asking! I'm always ready to help, whether it's answering a question or providing a solution. What can I help you with today?<|eot_id|>
I 00:00:03.072966 executorch:text_token_generator.h:123]
Reached to the end of generation
```
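As a rough sanity check, you can estimate decode throughput from the log timestamps: the final response (74 new tokens, per the "Max new tokens resolved" line) is generated between the prefill at 00:00:02.394373 and the end of generation at 00:00:03.072966. A minimal sketch, using the timestamps and token count from the sample log above (your numbers will differ, and this is only a coarse estimate since it ignores log-flush latency):

```python
# Estimate decode throughput from the ExecuTorch log timestamps above.
prefill_done_s = 2.394373    # "RSS after prompt prefill" timestamp
generation_end_s = 3.072966  # final text_token_generator timestamp
new_tokens = 74              # "Max new tokens resolved: 74"

decode_time_s = generation_end_s - prefill_done_s
tokens_per_second = new_tokens / decode_time_s
print(f"~{tokens_per_second:.1f} tokens/s decode")
```

For the sample log this works out to roughly 109 tokens/s on the 5 CPU threads requested with `--cpu_threads=5`.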