This project provides a lightweight setup for running a local agent on the Llama 3.2 3B model via Ollama, integrated with custom MCP servers and function tools, and an interactive front end built with Streamlit.
- Run LLM inference locally using Ollama
- Custom MCP server (e.g., Google Drive) and tool integration
- Interactive front-end with Streamlit
Follow the instructions on the Ollama website to install Ollama for your system.
Then download and run the model:
```bash
ollama pull llama3.2:3b
ollama run llama3.2:3b --keepalive 60m
```
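Once the model is running, you can sanity-check the local Ollama endpoint before wiring up the rest of the stack. The snippet below is an illustrative sketch (not part of this repo), assuming Ollama's default port 11434 and the model tag pulled above.

```python
# Quick sanity check against the local Ollama HTTP API (default port 11434).
# Illustrative only — not part of the project code.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2:3b",                      # same tag as pulled above
        "prompt": "Reply with the single word: ready",
        "stream": False,                             # one JSON object, no streaming
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["response"])
```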
Create a virtual environment and install uv:

```bash
python3 -m venv agent_venv
source agent_venv/bin/activate
pip install uv
pip install -r requirements.txt
```

📄 requirements.txt contains all the necessary Python dependencies.
To start the Llama Stack server (using the Ollama template with a venv image type):

```bash
INFERENCE_MODEL=llama3.2:3b uv run --with llama-stack llama stack build --template ollama --image-type venv --run
```

This runs a development LLM server backed by the specified model. For other setup methods, refer here.
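To confirm the stack is reachable from Python, a quick check with the llama-stack-client package looks roughly like the sketch below; the port (8321) and attribute names are assumptions based on recent defaults, so check the URL printed when the stack starts.

```python
# Rough sketch: verify the local Llama Stack server is up and the model is
# registered. Port 8321 and attribute names are assumptions — confirm against
# the output of the `llama stack` command and your client version.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

for model in client.models.list():
    print(model.identifier)
```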
Follow the Getting Started and Authentication instructions in the official GDrive MCP repository.
Once authentication is complete, start the custom MCP server:
```bash
python3 gdrive_tools.py
```

Create your own custom tool functions and register them with the agent. Refer to the example provided in api/summariser_custom_tool.py.
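For orientation, a hypothetical tool function could look like the sketch below; the name, signature, and placeholder logic are illustrative assumptions, and the actual registration pattern to follow is the one in api/summariser_custom_tool.py.

```python
# Hypothetical custom tool sketch — see api/summariser_custom_tool.py for the
# registration mechanics this project actually uses. Agent frameworks generally
# derive the tool schema from the signature and docstring, so keep both explicit.
def summarise_text(text: str, max_sentences: int = 3) -> str:
    """Summarise `text` into at most `max_sentences` sentences.

    :param text: Raw text to condense.
    :param max_sentences: Upper bound on sentences in the summary.
    :returns: A short plain-text summary.
    """
    # Naive placeholder so the sketch runs as-is; a real tool would call the
    # model or a dedicated summarisation routine instead.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."
```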
Use the following command to start the web UI:
```bash
streamlit run streamlit_app.py
```

This provides a user-friendly interface to interact with the agent locally.
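As a point of reference, a stripped-down chat UI might look like the sketch below. It is not the repo's streamlit_app.py: it forwards prompts straight to the local Ollama endpoint (default port 11434 assumed) instead of the full agent-plus-tools pipeline.

```python
# Minimal chat UI sketch (illustrative). Talks directly to Ollama rather than
# the agent pipeline that streamlit_app.py sets up.
import requests
import streamlit as st

st.title("Local Llama Agent")

if "history" not in st.session_state:
    st.session_state.history = []  # list of (role, text) tuples

# Replay earlier turns so the conversation survives Streamlit reruns.
for role, text in st.session_state.history:
    with st.chat_message(role):
        st.markdown(text)

if prompt := st.chat_input("Ask the agent something"):
    st.session_state.history.append(("user", prompt))
    with st.chat_message("user"):
        st.markdown(prompt)

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.2:3b", "prompt": prompt, "stream": False},
        timeout=120,
    )
    answer = resp.json().get("response", "(no response)")

    st.session_state.history.append(("assistant", answer))
    with st.chat_message("assistant"):
        st.markdown(answer)
```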
- Add additional tools for your use case.
- Try more complex use cases with larger models.
- Integrate memory and shields.
- Deploy with containerization or cloud runtimes.