Support for loading models from std::basic_streambuf rather than files #1314
RodDaSilvaWCO started this conversation in New features / APIs
Replies: 1 comment
-
Yes, something like this would be highly beneficial. I am currently using BoxedAppSDK to handle this scenario, and it has proven very effective. It is a commercial library, but in the interim it may serve as a viable solution for you as well.
-
Hi,
ORT is purpose-built to allow efficient execution of local LLMs. To enable a wider set of use cases, there needs to be more flexibility in how those LLMs are loaded into the runtime.
Currently, to load an LLM into ORT you typically call:
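For instance, with the C++ bindings the call looks roughly like this (a sketch based on my reading of the published examples; exact names and the placeholder path may differ per language binding):

#include "ort_genai.h"  // onnxruntime-genai C++ API

int main() {
  // Load the model from a folder on the local filesystem.
  // The path below is just a placeholder.
  auto model = OgaModel::Create("path/to/phi-3.5-mini-instruct-onnx");

  // Everything downstream (tokenizer, generator, ...) is created from the model.
  auto tokenizer = OgaTokenizer::Create(*model);

  // ... set up generator params and run generation as usual ...
  return 0;
}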
This API takes the path to the folder containing the ONNX model files. For example, in the case of Phi-3.5-mini-instruct:
2025-01-29 10:09 PM 3,589 config.json
2025-01-29 10:09 PM 11,153 configuration_phi3.py
2025-01-29 10:09 PM 1,632 genai_config.json
2025-01-29 10:09 PM 52,176,615 phi-3.5-mini-instruct-cpu-int4-awq-block-128-acc-level-4.onnx
2025-01-29 10:10 PM 2,728,144,896 phi-3.5-mini-instruct-cpu-int4-awq-block-128-acc-level-4.onnx.data
2025-01-29 10:09 PM 599 special_tokens_map.json
2025-01-29 10:09 PM 1,937,898 tokenizer.json
2025-01-29 10:09 PM 3,495 tokenizer_config.json
This works great if you want to load the LLM from the local filesystem, but it leaves out a number of other desirable use cases, such as loading the LLM from a database, a RAM disk, or over a network socket, to name just three alternatives to the local file system.
It would be great if you guys could provide additional lower-level API(s) that would allow the LLM to be loaded from a "stream abstraction" instead of assuming the local file system:
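Something along these lines, as a purely hypothetical sketch (CreateModelFromStream does not exist today; it just illustrates the kind of lower-level entry point being asked for):

#include <memory>
#include <streambuf>

struct OgaModel;  // as exposed by the existing C++ API

namespace Oga {

// HYPOTHETICAL: accept a stream to read the model from, instead of a folder path.
// The runtime would pull the bytes through the std::basic_streambuf interface
// rather than opening files on disk itself.
std::unique_ptr<OgaModel> CreateModelFromStream(std::basic_streambuf<char>& model_stream);

}  // namespace Oga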
Obviously, the above is an oversimplification, since you would need to provide streams for all of the input files you currently expect (e.g., genai_config.json, *.onnx, *.onnx.data, tokenizer.json, tokenizer_config.json, config.json, etc.), but you get the idea.
Giving developers control over where the requisite LLM model streams come from would expand the range of ways the models can be delivered and consumed in client applications.
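To make the idea concrete, here is a hypothetical usage sketch building on the declaration above (FetchModelBytesFromDatabase is an invented placeholder; the bytes could just as well come from a RAM disk or a socket read):

#include <sstream>
#include <string>
#include <utility>

#include "ort_genai.h"  // existing C++ header, for OgaModel

namespace Oga {
// The hypothetical entry point sketched above.
std::unique_ptr<OgaModel> CreateModelFromStream(std::basic_streambuf<char>& model_stream);
}

// Hypothetical helper: returns the raw model bytes from, say, a database blob.
std::string FetchModelBytesFromDatabase();

void LoadModelFromNonFileSource() {
  std::string bytes = FetchModelBytesFromDatabase();

  // Wrap the in-memory bytes in a std::basic_streambuf via istringstream.
  std::istringstream stream(std::move(bytes));

  // Hand the stream to the runtime instead of a folder path.
  auto model = Oga::CreateModelFromStream(*stream.rdbuf());

  // ... create the tokenizer / generator from `model` exactly as with the
  // path-based API ...
}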
Thanks for your consideration.