Description
I am following the End-to-end Test section of the README and running the project inside the Docker environment it describes.
During execution, the script crashes with a FileNotFoundError when attempting to open ../auto_search/8B_search_result_large_btz.json. The pipeline apparently expects a profiling result file that is not present by default, and the README does not mention a step that generates it.
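For reference, the failing call reduces to a bare open() on a relative path, so any mismatch between the working directory and the expected auto_search output crashes the whole run. A minimal fail-fast guard one could add before pipeline.update() (the path is taken from the traceback; the helper name check_profile_result is hypothetical, not part of the repo) would at least surface an actionable message:

```python
from pathlib import Path

def check_profile_result(path: str) -> Path:
    """Fail fast with an actionable message if the profiling file is absent."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(
            f"Missing profiling result: {p}. Generate it with the auto_search "
            "step first, or point profile_result_path at an existing file."
        )
    return p

# Example (path from the traceback, resolved relative to entry/):
# check_profile_result("../auto_search/8B_search_result_large_btz.json")
```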
Environment
- Hardware: NVIDIA H100 80G
- OS: Docker (cuda:12.8.1-cudnn-devel-ubuntu22.04)
Run
python run_llama3.py --load_hf_weight
Error Log
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
W0911 02:35:50.046000 375 torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W0911 02:35:50.046000 375 torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
...
Traceback (most recent call last):
File "/root/Nanoflow/entry/run_llama3.py", line 231, in <module>
test_performance()
File "/root/Nanoflow/entry/run_llama3.py", line 82, in test_performance
pipeline.update(decode_inputs, decode_batch_size, profile_result_path="../auto_search/8B_search_result_large_btz.json", use_cuda_graph=True, use_nano_split=True)
File "/root/Nanoflow/entry/../models/llama3_FlashinferKVCache.py", line 475, in update
with open(profile_result_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '../auto_search/8B_search_result_large_btz.json'