anansi is a fully featured content vectorization system that bundles recent advances in embedding generation, in-domain tuning, and vector storage into an easy-to-use package.
- Rust implementation of FreshDiskANN with support for scalar quantization
- Configurable RocksDB based storage engine
- ONNX runtime support for CUDA accelerated embedding models
- Build indices on unstructured data, whether it is text, images, or video
- Support for gRPC and HTTP clients
- Single installation binary that can be cross-compiled for non-Linux targets
- Utilize cutting-edge embedding models listed on the MTEB Leaderboard
- Bin-pack model inference on the CPU or GPU, supporting request batching with little effort
- Fine-tune embedding generation with in-domain samples
docker pull infrawhispers/anansi:latest
docker run --name anansi -it -p 50051:50051 -p 50052:50052 -v /.cache:/app/.cache infrawhispers/anansi:latest
[1] standalone embedding generation using INSTRUCTOR
curl \
-X POST http://172.17.0.1:50052/encode \
-H 'Content-Type: application/json' \
-d '{
"batches":[{
"model_name":"INSTRUCTOR_LARGE",
"model_class":"ModelClass_INSTRUCTOR",
"text":{
"data": [
{
"instruction": "Represent the Science title:",
"value": "3D ActionSLAM: wearable person tracking ..."
},
{
"instruction": "Represent the Nature title:",
"value": "Inside Gohar World and the Fine, Fantastical Art"
}
]
}
}]}
'
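The same request can be issued from a script. The following is a minimal Python sketch using only the standard library; it mirrors the JSON body of the curl example above. The helper names `build_encode_payload` and `encode` are hypothetical (they are not part of anansi), the URL is copied from the curl example and may differ in your setup, and the response schema is not documented here, so the response is returned as decoded JSON without further interpretation.

```python
import json
import urllib.request

# Assumption: same host and port as the curl example; adjust to your environment.
ENCODE_URL = "http://172.17.0.1:50052/encode"


def build_encode_payload(pairs, model_name="INSTRUCTOR_LARGE",
                         model_class="ModelClass_INSTRUCTOR"):
    """Build the request body from (instruction, value) pairs.

    Hypothetical helper: it reproduces the JSON shape of the curl example.
    """
    return {
        "batches": [{
            "model_name": model_name,
            "model_class": model_class,
            "text": {
                "data": [
                    {"instruction": instruction, "value": value}
                    for instruction, value in pairs
                ]
            },
        }]
    }


def encode(pairs, url=ENCODE_URL, timeout=30):
    """POST the payload to the /encode endpoint and return the decoded JSON.

    Hypothetical helper: the response structure is not assumed here.
    """
    body = json.dumps(build_encode_payload(pairs)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container from the `docker run` command above running, calling `encode([("Represent the Science title:", "3D ActionSLAM: wearable person tracking ...")])` should send the equivalent of the curl request. Keeping the payload construction separate from the network call makes it possible to inspect or log the body before sending it.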
We use Docusaurus to generate our documentation. Please refer to the READMEs here or check out the documentation website.
anansi (/əˈnɑːnsi/ ə-NAHN-see; literally translates to spider) is an Akan folktale character and god of stories, wisdom and knowledge. We thought it was an apt name as we aim to provide ML applications with turn-key memory and persistence.
Hop onto Discord via this invite link or shoot an email to infrawhispers@proton.me
We welcome contributions of all sizes and contributors at all levels! Please take a look at open issues or look at #contributions in the Discord.