Support for loading models from std::basic_streambuf rather than files #1314
RodDaSilvaWCO started this conversation in New features / APIs
Replies: 1 comment
-
Yes, something like this would be highly beneficial. I am currently using BoxedAppSDK to handle this scenario, and it has proven very effective. It is a commercial library, but in the interim it may serve as a viable solution for you as well.
-
Hi,
ORT is purpose-built to allow efficient execution of local LLMs. To enable a wider set of use cases, there needs to be more flexibility in how those LLMs are loaded into the runtime.
Currently, to load an LLM into ORT you typically call:
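For instance, with the C++ bindings the call looks roughly like this (a sketch based on my reading of the published examples; exact names and the placeholder path may differ per language binding):

#include "ort_genai.h"  // onnxruntime-genai C++ API

int main() {
  // Load the model from a folder on the local filesystem.
  // The path below is just a placeholder.
  auto model = OgaModel::Create("path/to/phi-3.5-mini-instruct-onnx");

  // Everything downstream (tokenizer, generator, ...) is created from the model.
  auto tokenizer = OgaTokenizer::Create(*model);

  // ... set up generator params and run generation as usual ...
  return 0;
}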
This API takes the path to the folder containing the ONNX model files. For example, in the case of Phi-3.5-mini-instruct:
2025-01-29 10:09 PM 3,589 config.json
2025-01-29 10:09 PM 11,153 configuration_phi3.py
2025-01-29 10:09 PM 1,632 genai_config.json
2025-01-29 10:09 PM 52,176,615 phi-3.5-mini-instruct-cpu-int4-awq-block-128-acc-level-4.onnx
2025-01-29 10:10 PM 2,728,144,896 phi-3.5-mini-instruct-cpu-int4-awq-block-128-acc-level-4.onnx.data
2025-01-29 10:09 PM 599 special_tokens_map.json
2025-01-29 10:09 PM 1,937,898 tokenizer.json
2025-01-29 10:09 PM 3,495 tokenizer_config.json
This works great if you want to load the LLM from the local filesystem, but it leaves out a number of other desirable use cases, such as loading the LLM from a database, a RAM disk, or over a network socket, to name just three alternatives to the local file system.
It would be great if you guys could provide additional lower-level API(s) that would allow the LLM to be loaded from a "stream abstraction" instead of assuming the local file system:
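Something along these lines, as a purely hypothetical sketch (CreateModelFromStream does not exist today; it just illustrates the kind of lower-level entry point being asked for):

#include <memory>
#include <streambuf>

struct OgaModel;  // as exposed by the existing C++ API

namespace Oga {

// HYPOTHETICAL: accept a stream to read the model from, instead of a folder path.
// The runtime would pull the bytes through the std::basic_streambuf interface
// rather than opening files on disk itself.
std::unique_ptr<OgaModel> CreateModelFromStream(std::basic_streambuf<char>& model_stream);

}  // namespace Oga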
Obviously, the above is an oversimplification, since you would need to provide streams for all of the input files you currently expect (e.g., genai_config.json, *.onnx, *.onnx.data, tokenizer.json, tokenizer_config.json, config.json, etc.), but you get the idea.
Giving developers control over where the requisite LLM model streams come from would expand the range of ways the models can be delivered and consumed in client applications.
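To make the idea concrete, here is a hypothetical usage sketch building on the declaration above (FetchModelBytesFromDatabase is an invented placeholder; the bytes could just as well come from a RAM disk or a socket read):

#include <sstream>
#include <string>
#include <utility>

#include "ort_genai.h"  // existing C++ header, for OgaModel

namespace Oga {
// The hypothetical entry point sketched above.
std::unique_ptr<OgaModel> CreateModelFromStream(std::basic_streambuf<char>& model_stream);
}

// Hypothetical helper: returns the raw model bytes from, say, a database blob.
std::string FetchModelBytesFromDatabase();

void LoadModelFromNonFileSource() {
  std::string bytes = FetchModelBytesFromDatabase();

  // Wrap the in-memory bytes in a std::basic_streambuf via istringstream.
  std::istringstream stream(std::move(bytes));

  // Hand the stream to the runtime instead of a folder path.
  auto model = Oga::CreateModelFromStream(*stream.rdbuf());

  // ... create the tokenizer / generator from `model` exactly as with the
  // path-based API ...
}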
Thanks for your consideration.