Simple Offline LLM

SimpleOfflineLLM is a package that lets you run a Large Language Model (LLM) locally within Unity using the official Unity Inference package. It runs completely offline, with no networking or external services required.


It does not contain any platform-specific code or libraries, so it can be used on any platform supported by the Inference package (tested on Windows, Mac, WebGL, and Android). Inference can be performed on the CPU or GPU.


LLM model files can be extremely large (10GB+), so download links are provided to acquire them separately. After download, they can be quantized to reduce the size to approximately half (fp16) or one quarter (uint8). The package has been tested with several popular LLMs: Phi 1.5, Phi 3.5, SmolLM2, and gpt2. Support for other LLMs can be added, either by extending the code yourself or through future releases of the asset.
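To see where the "half" and "one quarter" figures come from, a rough on-disk size estimate is simply parameter count times bytes per weight: fp32 uses 4 bytes, fp16 uses 2, and uint8 uses 1. The sketch below illustrates this with a ~3.8B-parameter model (roughly the scale of Phi 3.5 mini); the exact parameter count is an illustrative assumption, and real file sizes also include tokenizer data and some unquantized tensors.

```python
def estimated_size_gb(num_params: float, bytes_per_weight: float) -> float:
    """Rough on-disk size estimate: parameters * bytes per weight, in GB."""
    return num_params * bytes_per_weight / 1e9

# Illustrative ~3.8B-parameter model (parameter count is an assumption):
params = 3.8e9
fp32 = estimated_size_gb(params, 4)   # full precision baseline
fp16 = estimated_size_gb(params, 2)   # ~half the fp32 size
uint8 = estimated_size_gb(params, 1)  # ~one quarter of the fp32 size

print(f"fp32: {fp32:.1f} GB, fp16: {fp16:.1f} GB, uint8: {uint8:.1f} GB")
# → fp32: 15.2 GB, fp16: 7.6 GB, uint8: 3.8 GB
```

This is why quantizing to uint8 can bring a model that is unusably large at full precision down to a size that fits comfortably on a desktop or mobile device.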


The goal was to build a relatively thin interface between Unity and freely available ONNX models, allowing you to customise how you use the LLMs. You can take the included sample code, which provides a GUI dialogue with the model, or build your own API around the LLM classes.