
Spin-O-Llama

Ollama API implementation for Spin

⚠️ Proof of concept: This project is not production ready

Quick Start

  • Install Spin
  • Log in to Fermyon Cloud
    spin login

  • Clone this repository
    git clone https://github.com/BLaZeKiLL/Spin-O-Llama.git
    cd Spin-O-Llama

  • Build
    spin build

  • Deploy
    spin deploy
    

Routes implemented

  • POST /api/generate

    supported request body (see the client sketch after this list for a complete example)

    {
        "model": "<supported-model>",
        "prompt": "<input prompt>",
        "system": "<system prompt>", // optional, system prompt
        "stream": false, // streaming not supported, has no impact
        "options": { // optional, llm options
            "num_predict": 128,
            "temperature": 0.8,
            "top_p": 0.9,
            "repeat_penalty": 1.1
        } // default values provided above
    }

    response body

    {
        "model": "<model-id>",
        "response": "<output>",
        "done": true
    }
  • POST /api/embeddings

    supported request body

    {
        "model": "<model-id>", // doesn't matter for now will always use all-minilm-l6-v2
        "prompt": "<input>"
    }

    response body

    {
        "embedding": [<float array>]
    }
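
Below is a minimal client sketch in Rust that exercises both routes against a deployed app. The deployment URL is a placeholder, and the reqwest (with blocking and json features) and serde_json crates are assumed as client-side dependencies; none of this is part of this repository.

    // Standalone client sketch, not part of this repo.
    // Assumes reqwest (with "blocking" and "json" features) and serde_json in Cargo.toml.
    use serde_json::json;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Placeholder URL; use the one printed by `spin deploy`
        let app_url = "https://spin-o-llama.example.fermyon.app";
        let client = reqwest::blocking::Client::new();

        // POST /api/generate with a llama2-chat prompt
        let generate: serde_json::Value = client
            .post(format!("{app_url}/api/generate"))
            .json(&json!({
                "model": "llama2-chat",
                "prompt": "Why is the sky blue?",
                "options": { "num_predict": 64 }
            }))
            .send()?
            .json()?;
        println!("{}", generate["response"]);

        // POST /api/embeddings (the model field is currently ignored; all-minilm-l6-v2 is always used)
        let embeddings: serde_json::Value = client
            .post(format!("{app_url}/api/embeddings"))
            .json(&json!({
                "model": "all-minilm-l6-v2",
                "prompt": "The sky is blue because of Rayleigh scattering."
            }))
            .send()?
            .json()?;
        println!("embedding length: {}", embeddings["embedding"].as_array().map_or(0, |a| a.len()));

        Ok(())
    }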

Model compatibility

  • generate - llama2-chat, codellama-instruct
  • embeddings - all-minilm-l6-v2

Contributing

Contributions are welcome to implement more of the Ollama API, as far as it can be supported on the Spin runtime.