
Using Semantic Kernel with local LLMs

Running a local web server with LM Studio to expose the Phi-2 model

License: MIT Twitter: elbruno GitHub: elbruno

✨ This is a quickstart sample that shows how to run an SLM (small language model: Phi-2) locally with LM Studio, and how to interact with the model using Semantic Kernel.

Getting started - Quick guide

  1. 🌐 Start Local Inference Server: Open LM Studio and start the webserver with your favourite LLM.

  2. 📤 One-click setup: Open a new Codespace, giving you a fully configured cloud developer environment.

  3. 💬 Change your chat questions: Update the chat code in src/sk-phi2-localserver-lmstudio/Program.cs.

  4. ▶️ Run, one-click again: Use VS Code's built-in Run command. Check LM Studio logs and app logs to see the model running.

  5. 🔄 Iterate quickly: Codespaces updates the server on each save, and VS Code's debugger lets you dig into the code execution.

Configure your environment

Before you get started, make sure you have the following requirements in place:

Getting Started with LM Studio

LM Studio is a desktop application that allows you to run open-source models locally on your computer. You can use LM Studio to discover, download, and chat with models from Hugging Face, or create your own custom models. LM Studio also lets you run a local inference server that mimics the OpenAI API, so you can use any model with your favorite tools and frameworks. LM Studio is available for Mac, Windows, and Linux, and you can download it from their website.
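
Because the local server mimics the OpenAI REST API, you can exercise it with a plain HTTP request before bringing Semantic Kernel into the picture. The snippet below is only a minimal sketch under a few assumptions: http://localhost:1234 is LM Studio's default port (adjust it if you changed the server settings), and the "phi-2" model id is a placeholder, since LM Studio answers with whichever model is currently loaded.

using System.Net.Http;
using System.Text;

// quick check against the LM Studio OpenAI-compatible endpoint
using var client = new HttpClient { BaseAddress = new Uri("http://localhost:1234") };

// OpenAI-style chat completion request body
var payload = """
{
  "model": "phi-2",
  "messages": [ { "role": "user", "content": "Say hello in one short sentence." } ],
  "temperature": 0.7
}
""";

var response = await client.PostAsync(
    "/v1/chat/completions",
    new StringContent(payload, Encoding.UTF8, "application/json"));

Console.WriteLine(await response.Content.ReadAsStringAsync());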

Search for models in LM Studio

Download models locally and run a local inference server with LM Studio

Here are the steps to run a local server with LM Studio

  1. Launch LM Studio and search for an LLM from Hugging Face using the search bar. You can filter the models by compatibility, popularity, or quantization level. For this demo we will use Phi-2.

  2. Select a model and click Download. You can also view the model card for more information about the model.

  3. Once the model is downloaded, go to the Local Server section and select the model from the drop-down menu. You can also adjust the server settings and parameters as you wish.

  4. Click Start Server to run the model on your local machine. You will see a URL that you can use to access the server from your browser or other applications.

    Important: The server is compatible with the OpenAI API, so you can use the same code and format for your requests and responses (see the Semantic Kernel wiring sketch after these steps).

  5. To stop the server, click Stop Server. You can also delete the model from your machine if you don’t need it anymore.

Local Inference Server running in LM Studio
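
On the client side, Semantic Kernel only needs to know where to send its OpenAI-style requests. The sketch below shows one possible wiring, not necessarily the exact code in this repository: it uses the standard OpenAI connector plus a custom HttpClientHandler (the LocalServerHandler name is made up for this example) to redirect every call to the local server. The "phi-2" model id and "lm-studio" API key are placeholder values, since LM Studio does not validate either, and 1234 is the default port.

using System.Net.Http;
using Microsoft.SemanticKernel;

// build the kernel with the standard OpenAI connector, but hand it an HttpClient
// that redirects every request to the local LM Studio server
var client = new HttpClient(new LocalServerHandler());
var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(modelId: "phi-2", apiKey: "lm-studio", httpClient: client)
    .Build();

// handler that rewrites the request host so OpenAI-style calls land on http://localhost:1234
public class LocalServerHandler : HttpClientHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // keep the original path (e.g. /v1/chat/completions) and point it at LM Studio
        request.RequestUri = new Uri($"http://localhost:1234{request.RequestUri?.PathAndQuery}");
        return base.SendAsync(request, cancellationToken);
    }
}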

Phi-2

Phi-2 is a small language model (SLM) developed by Microsoft Research that has 2.7 billion parameters and demonstrates outstanding reasoning and language understanding capabilities. It was trained on a mix of synthetic and web datasets for natural language processing and coding. It achieves state-of-the-art performance among base language models with fewer than 13 billion parameters, and matches or outperforms models up to 25x larger on complex benchmarks. We can use Phi-2 to generate text or code, or chat with it, using Azure AI Studio or the Hugging Face platform. 😊

Here are some additional resources related to Phi-2:

Run Local

  1. Start the LM Studio Local Inference Server running with Phi-2.

  2. Open src/sk-phi2-localserver-lmstudio/Program.cs.

    Press [F5] to start debugging and choose your preferred debugger.

  3. Once the project is compiled, the app should be running.

    Check the logs to see the chat interaction. You can also check the LM Studio logs to validate the model outputs; a minimal sketch of this simple demo is shown below.

    Run simple demo
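
For reference, once the kernel is pointed at the local server, the simple demo boils down to a single prompt call. This is only a minimal sketch (the prompt text is just an example):

// one-shot prompt against the locally served Phi-2 model
var response = await kernel.InvokePromptAsync("Write a short joke about kittens.");
Console.WriteLine(response.GetValue<string>());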

Run in Codespaces

  1. Click here to open in GitHub Codespaces

    Open in GitHub Codespaces

  2. This action may take a couple of minutes. Once the Codespace is initialized, open the Extensions tab and check that all the extensions are installed.

  3. The file src/sk-phi2-localserver-lmstudio/Program.cs should be open. If not, open it using the Explorer view in the sidebar.

  4. Using the Run and Debug option, run the program and select "C#" as the run option.

  5. Run the app and check the CodeSpaces terminal and the LM Studio logs.

Advanced chat demo

Update the file src/sk-phi2-localserver-lmstudio/Program.cs with the following code. This will run a small interactive chat using Phi-2 as the backend model.

// requires: using Microsoft.SemanticKernel.ChatCompletion;
// init chat (the kernel is built earlier in Program.cs and points at the local LM Studio endpoint)
var chat = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();
history.AddSystemMessage("You are a useful assistant that replies using a funny style. You answer with short messages. Your name is Goku.");
Console.WriteLine("Hint: type your question or type 'exit' to leave the conversation");

// chat loop
while (true)
{
    Console.Write("You: ");
    var input = Console.ReadLine();
    if (string.IsNullOrEmpty(input) || input.Equals("exit", StringComparison.OrdinalIgnoreCase))
        break;
    history.AddUserMessage(input);

    // ask the model for the next assistant message and keep it in the history
    var reply = await chat.GetChatMessageContentAsync(history);
    Console.WriteLine(reply.Content);
    history.AddAssistantMessage(reply.Content ?? string.Empty);
    Console.WriteLine("---");
}

Console.WriteLine("Goodbye!");

The running app should be similar to this:

Chat complete demo

Troubleshooting

  1. Important: If your Codespace can't access the localhost endpoint, you may get an error similar to this one.

    Codespaces can't access localhost error

    To solve this problem, you can use the Codespaces Network Bridge.

    The following command will connect the Codespace to your local machine's ports:

    gh net start --your codespace--

Author

👤 Bruno Capuano

🤝 Contributing

Contributions, issues and feature requests are welcome!

Feel free to check the issues page.

Show your support

Give a ⭐️ if this project helped you!

📝 License

Copyright © 2024 Bruno Capuano.

This project is MIT licensed.