Floneum

Floneum makes it easy to develop applications that use local pre-trained AI models. There are two projects in this repository:

- Kalosm: A simple interface for pre-trained models in Rust
- Floneum Editor (preview): A graphical editor for local AI workflows. See the user documentation or plugin documentation for more information.

Kalosm

Kalosm is a simple interface for pre-trained models in Rust that backs Floneum. It makes it easy to interact with pre-trained language, audio, and image models.

Model Support

Kalosm supports a variety of models. The following models are currently supported:

| Model | Modality | Size | Description | Quantized | CUDA + Metal Accelerated | Example |
| Llama | Text | 1b-70b | General purpose language model | ? | ? | llama 3 chat |
| Mistral | Text | 7-13b | General purpose language model | ? | ? | mistral chat |
| Phi | Text | 2b-4b | Small reasoning focused language model | ? | ? | phi 3 chat |
| Whisper | Audio | 20MB-1GB | Audio transcription model | ? | ? | live whisper transcription |
| RWuerstchen | Image | 5GB | Image generation model | ? | ? | rwuerstchen image generation |
| TrOcr | Image | 3GB | Optical character recognition model | ? | ? | Text Recognition |
| Segment Anything | Image | 50MB-400MB | Image segmentation model | ? | ? | Image Segmentation |
| Bert | Text | 100MB-1GB | Text embedding model | ? | ? | Semantic Search |

Utilities

Kalosm also supports a variety of utilities around pre-trained models. These include:

- Extracting, formatting, and retrieving context for LLMs: extract context from txt/html/docx/md/pdf files, chunk that context, then search for relevant context with vector database integrations
- Transcribing audio from your microphone or a file
- Crawling and scraping content from web pages

Performance

Kalosm uses the candle machine learning library to run models in pure Rust. It supports quantized and accelerated models with performance on par with llama.cpp:

Mistral 7b

| Accelerator | Kalosm | llama.cpp |
| Metal (M2) | 26 t/s | 27 t/s |

Structured Generation

Kalosm supports structured generation with a regex grammar.
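To make the idea of grammar-constrained generation concrete, here is a minimal, self-contained sketch in plain Rust. It does not use Kalosm and is not how Kalosm implements grammars: the pattern `[0-9]{2}:[0-9]{2}` is hard-coded, and the names `allowed`, `toy_model_ranking`, and `generate_constrained` are hypothetical helpers invented for this illustration. At each step, the stand-in "model" ranks candidate characters and the grammar rejects any candidate that would break the pattern.

```rust
/// Toy grammar for the pattern `[0-9]{2}:[0-9]{2}`, tracked as a position in the pattern.
/// Returns true if `c` may appear at position `pos`.
fn allowed(pos: usize, c: char) -> bool {
    match pos {
        0 | 1 | 3 | 4 => c.is_ascii_digit(),
        2 => c == ':',
        _ => false, // the pattern is complete after five characters
    }
}

/// Stand-in for a language model (an assumption for this sketch): returns
/// candidate characters for a step, most preferred first. A real model would
/// derive this ranking from its output logits.
fn toy_model_ranking(step: usize) -> Vec<char> {
    let rankings = ["a1:", "x2", "-:9", "o3", "!0"];
    rankings[step % rankings.len()].chars().collect()
}

/// Generate text by taking, at each step, the most preferred candidate that
/// the grammar accepts. Candidates the grammar rejects are skipped.
fn generate_constrained() -> String {
    let mut out = String::new();
    for pos in 0..5 {
        let choice = toy_model_ranking(pos)
            .into_iter()
            .find(|&c| allowed(pos, c))
            .expect("the toy model always offers at least one valid character");
        out.push(choice);
    }
    out
}

fn main() {
    // The unconstrained favourites ('a', 'x', '-', 'o', '!') are all rejected.
    println!("{}", generate_constrained());
}
```

The output always matches the pattern, even though the toy model's first choice at every step is invalid; filtering candidates against the grammar is what guarantees a well-formed result.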
Because the grammar runs in Rust code, it doesn't add any overhead to text generation. In fact, using a grammar can be even faster than uncontrolled text generation because Kalosm supports grammar acceleration!

In addition to regex, you can provide your own grammar to generate structured data. This lets you constrain the response to any structure you want, including complex data structures like JSON, HTML, and XML.

Kalosm Quickstart!

This quickstart will get you up and running with a simple chatbot. Let's get started!

A more complete guide for Kalosm is available on the Kalosm website, and examples are available in the examples folder.

1. Install Rust

2. Create a new project:

       cargo new kalosm-hello-world
       cd ./kalosm-hello-world

3. Add Kalosm as a dependency:

       cargo add kalosm --git https://github.com/floneum/floneum --features language
       # You can use `--features language,metal`, `--features language,cublas`, or `--features language,mkl` if your machine supports an accelerator
       cargo add tokio --features full

4. Add this code to your main.rs file:

       use kalosm::language::*;

       #[tokio::main]
       async fn main() {
           // Load the Phi-3 model
           let model = Llama::phi_3().await.unwrap();
           // Create a chat session with a system prompt
           let mut chat = Chat::builder(model)
               .with_system_prompt("You are a pirate called Blackbeard")
               .build();

           // Read a prompt from the user, then stream the reply to stdout
           loop {
               chat.add_message(prompt_input("\n> ").unwrap())
                   .await
                   .unwrap()
                   .to_std_out()
                   .await
                   .unwrap();
           }
       }

5. Run your application with:

       cargo run --release

Community

If you are interested in either project, you can join the Discord to discuss the project and get help.

Contributing

- Report issues on our issue tracker.
- Help other users in the Discord.
- If you are interested in contributing, feel free to reach out on Discord.