An inference runtime with support for multiple backends.
- Easy integration into different platforms via Flutter or native C++, including mobile devices.
- Supports inference on different hardware, such as the Qualcomm Hexagon NPU or general-purpose CPUs/GPUs.
- Provides an easy-to-use C API (see the sketch after this list).
- Provides an API server compatible with AI00_server (OpenAI-style API).
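A minimal sketch of what embedding the runtime through the C API could look like. Everything project-specific here is assumed: the header name `rwkv_mobile.h` and every `rwkvmobile_*` function are hypothetical placeholders, not the actual API of this repository; check the headers shipped in the repository for the real names and signatures.

```cpp
// Illustrative only: the header name and all rwkvmobile_* symbols below are
// hypothetical placeholders, not the actual API of this repository.
#include <cstdio>
#include <cstdlib>

#include "rwkv_mobile.h"  // hypothetical C API header

int main() {
    // Select a backend by name ("web-rwkv", "ncnn", "llama.cpp", "qnn", ...);
    // this naming scheme is assumed for the sketch.
    void *runtime = rwkvmobile_runtime_init_with_name("web-rwkv");
    if (!runtime) {
        std::fprintf(stderr, "failed to initialize runtime\n");
        return EXIT_FAILURE;
    }

    if (rwkvmobile_runtime_load_model(runtime, "path/to/rwkv-model") != 0) {
        std::fprintf(stderr, "failed to load model\n");
        return EXIT_FAILURE;
    }

    // Generate a short completion into a caller-provided buffer and print it.
    char output[4096] = {0};
    rwkvmobile_runtime_gen_completion(runtime, "Hello", /*max_tokens=*/64,
                                      output, sizeof(output));
    std::printf("%s\n", output);

    rwkvmobile_runtime_release(runtime);
    return EXIT_SUCCESS;
}
```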
- WebRWKV (WebGPU): Compatible with most PC graphics cards as well as macOS Metal; however, it does not work with Qualcomm's proprietary Adreno GPU driver.
- llama.cpp: Runs on Android devices with CPU inference.
- ncnn: Initial support for unquantized RWKV v6/v7 models (suitable for running tiny models everywhere).
- Qualcomm Hexagon NPU: Based on Qualcomm's QNN SDK.
- CoreML (WIP): Runs RWKV on the Apple Neural Engine, based on Apple's CoreML framework.
- To be continued...
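As a rough illustration of how the backends above map onto target platforms, here is a sketch of picking a backend name at runtime. The backend identifier strings and the selection logic are assumptions made for this example, not taken from this repository.

```cpp
// Illustrative platform-to-backend mapping; the identifier strings below
// ("qnn", "web-rwkv", "llama.cpp") are assumed for the sketch.
#include <string>

std::string pick_backend(bool has_hexagon_npu, bool has_webgpu_capable_gpu) {
    if (has_hexagon_npu)
        return "qnn";       // Qualcomm Hexagon NPU via the QNN SDK
    if (has_webgpu_capable_gpu)
        return "web-rwkv";  // WebGPU on PC GPUs, or Metal on macOS
    return "llama.cpp";     // CPU fallback, e.g. on Android devices
}
```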
- Install Rust and Cargo (required for building the web-rwkv backend)
```sh
git clone --recursive https://github.com/MollySophia/rwkv-mobile
cd rwkv-mobile && mkdir build && cd build
cmake ..
cmake --build . -j $(nproc)
```
- Better tensor abstraction for different backends
- Batch inference for all backends