Step-by-step instructions to install and run the Llama Stack on Linux and Mac
jeffxtang opened this issue · 14 comments
I managed to make the Llama Stack server and client work with Ollama on both EC2 (with 24GB GPU) and Mac (tested on 2021 M1 and 2019 2.4GHz i9 MBP, both with 32GB memory). Steps are below:
1. Open one Terminal, go to your work directory, then:
git clone https://github.com/meta-llama/llama-agentic-system
cd llama-agentic-system
conda create -n llama-stack python=3.10
conda activate llama-stack
pip install -r requirements.txt
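(Optional) To sanity-check that the new environment is active and the dependencies from requirements.txt were installed, you can run something like:
conda activate llama-stack
pip list | grep -i llama
which should list the llama-related packages pulled in by requirements.txt.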
2. If you're on Linux, run:
curl -fsSL https://ollama.com/install.sh | sh
Otherwise, download the Ollama zip for Mac from the Ollama website, unzip it, and double-click Ollama.app (it will offer to move itself to the Applications folder).
3. On the same Terminal, run:
ollama pull llama3.1:8b-instruct-fp16
to download the Llama 3.1 8B model and then run:
ollama run llama3.1:8b-instruct-fp16
to confirm it works: ask it a question and you should get an answer from Llama.
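(Optional) Before moving on, you can confirm that Ollama's local API is serving and that the model is available; Ollama listens on port 11434 by default, and its /api/tags endpoint lists the locally pulled models:
curl http://localhost:11434/api/tags
The JSON response should include llama3.1:8b-instruct-fp16.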
4. Now run the command below to install Llama Stack's Ollama distribution:
llama distribution install --spec local-ollama --name ollama
You should see the output below (hit Enter to accept the default settings when prompted, except answer n to the two questions about llama_guard_shield and prompt_guard_shield):
Successfully setup distribution environment. Configuring...
Configuring API surface: inference
Enter value for url (default: http://localhost:11434):
Configuring API surface: safety
Do you want to configure llama_guard_shield? (y/n): n
Do you want to configure prompt_guard_shield? (y/n): n
Configuring API surface: agentic_system
YAML configuration has been written to /Users/<your_name>/.llama/distributions/ollama/config.yaml
Distribution ollama (with spec local-ollama) has been installed successfully!
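(Optional) If you want to review or tweak these settings later, the generated file is plain YAML at the path printed above, e.g.:
cat ~/.llama/distributions/ollama/config.yaml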
5. Launch the ollama distribution by running:
llama distribution start --name ollama --port 5000
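(Optional) To check that the server is up before running the examples, you can hit it from another Terminal; any HTTP response (even a 404 for the root path) means Uvicorn is listening on port 5000:
curl -i http://localhost:5000/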
6. Finally, in another Terminal, go to the llama-agentic-system folder, then run:
conda activate ollama
and either (on Mac)
python examples/scripts/vacation.py localhost 5000 --disable_safety
or (on Linux)
python examples/scripts/vacation.py [::] 5000 --disable_safety
You should see output starting with the lines below. (Note: if you run the script right after Step 5, especially on a slower machine such as the 2019 Mac with the 2.4GHz i9, you may see "httpcore.ReadTimeout" because the Llama model is still being loaded; waiting a moment and retrying a few times should work.)
User> I am planning a trip to Switzerland, what are the top 3 places to visit?
StepType.inference> Switzerland is a beautiful country with a rich history, stunning landscapes, and vibrant culture. Here are three top places to visit in Switzerland:
- Jungfraujoch: Also known as the "Top of Europe," Jungfraujoch is the highest train station in Europe, located at an altitude of 3,454 meters (11,332 feet) above sea level. It offers breathtaking views of the surrounding mountains and glaciers, including the iconic Eiger, Mönch, and Jungfrau peaks.
and on the first Terminal that runs llama distribution start --name ollama --port 5000, you should see:
INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
Environment: ipython
Tools: brave_search, wolfram_alpha, photogen
Cutting Knowledge Date: December 2023
Today Date: 09 August 2024
INFO: ::1:50987 - "POST /agentic_system/create HTTP/1.1" 200 OK
INFO: ::1:50988 - "POST /agentic_system/session/create HTTP/1.1" 200 OK
INFO: ::1:50989 - "POST /agentic_system/turn/create HTTP/1.1" 200 OK
role='user' content='I am planning a trip to Switzerland, what are the top 3 places to visit?'
Pulling model: llama3.1:8b-instruct-fp16
Assistant: Switzerland is a beautiful country with a rich history, stunning landscapes, and vibrant culture. Here are three top places to visit in Switzerland:
- Jungfraujoch: Also known as the "Top of Europe," Jungfraujoch is a mountain peak located in the Bernese Alps. It's the highest train station in Europe, offering breathtaking views of the surrounding mountains, glaciers, and valleys. You can take a ride on the Jungfrau Railway, which takes you to the summit, where you can enjoy stunning vistas, visit the Ice Palace, and even ski or snowboard in the winter.
Bonus: To see tool calling in action (see here and here for more info), try the hello.py example, which sends Llama a "Hello" prompt followed by the question "Which players played in the winning team of the NBA western conference semifinals of 2024, please use tools", whose answer requires a web search tool. On Mac, run (replace localhost with [::] on Linux):
python examples/scripts/hello.py localhost 5000 --disable_safety
You should see output including "BuiltinTool.brave_search", as below (if you see "httpcore.ReadTimeout", retrying should work):
User> Hello
StepType.inference> Hello! How can I assist you today?
User> Which players played in the winning team of the NBA western conference semifinals of 2024, please use tools
StepType.inference> brave_search.call(query="NBA Western Conference Semifinals 2024 winning team players")
StepType.tool_execution> Tool:BuiltinTool.brave_search Args:{'query': 'NBA Western Conference Semifinals 2024 winning team players'}
StepType.tool_execution> Tool:BuiltinTool.brave_search Response:{"query": null, "top_k": []}
StepType.shield_call> No Violation
StepType.inference> I need to search for information about the 2024 NBA Western Conference Semifinals.
If you delete "please use tools" from the prompt in hello.py (not wanting to beg), you'll likely see this output instead:
I'm not able to provide real-time information. However, I can suggest some possible sources where you may be able to find the information you are looking for.
By setting an appropriate system prompt, or switching to a larger Llama 3.1 model (details coming soon), you'll find you don't have to be quite so polite just to get Llama to use the tool.
Your error message says "Conda environment 'ollama' exists". Did you run Step 4 more than once? What does "conda env list | grep ollama" show? Can you try "llama distribution install --spec local-ollama --name ollama2" (assuming "ollama2" doesn't exist), then use "ollama2" instead of "ollama" in Steps 5 and 6, as in the commands below?
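For example (the name "ollama2" is just a placeholder; any unused environment name works):
conda env list | grep ollama
llama distribution install --spec local-ollama --name ollama2
llama distribution start --name ollama2 --port 5000
conda activate ollama2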
I see "PS1: unbound variable" (install_distribution.sh sets -e), so I suspect there's an issue with the prompt when the script attempts to activate the environment. @amkoupaei, are you able to create/use other conda environments successfully? Also, any reason you need to run as root?
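One workaround worth trying (this assumes the failure comes from bash strict mode tripping on an unset PS1, which is my guess rather than a confirmed diagnosis): give PS1 a default value in the same shell before re-running the install, e.g.:
export PS1="${PS1:-}"
llama distribution install --spec local-ollama --name ollama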
Noted - thank you.
I can create other conda envs successfully. Also no need for root; I just tried that route for debugging this issue. Running as non-root has the same issue
@amkoupaei I don't have my hands on an Ubuntu machine to try this right now, but from some early debugging it seems that updating line 111 in install_distribution.sh to
python_interp=$(conda run --no-capture-output -n "$env_name" which python)
might fix the issue for you. Can you give this a try and see if it fixes it for you?
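To check whether that change would even help in your setup, you can run the same command by hand (using whatever environment name the installer created, ollama in the steps above):
conda run --no-capture-output -n ollama which python
and confirm it prints a Python path without erroring.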
Unfortunately, it did not work either.
I also tried this on a fresh Ubuntu EC2 instance; still the same issue
I just tried on a fresh EC2 too and it worked for me - the complete log of "llama distribution" is here. What does your log look like, and how does it differ? @amkoupaei
Really odd. Can you run conda run -n agentic_env which python in your shell and paste what it outputs? Does it succeed?
I simplified a bit: meta-llama/llama-stack@0d933ac
Can you see if this helps?
Yes, that succeeds, giving the location of the Python installation.
I might consider an alternative path and use models already deployed in the cloud.
Thank you all for your help/support.
@hardikjshah @dltn we need to host these instructions (these are great!) somewhere in our READMEs or instructions for Ollama. What would be the right place?
Running the command:
llama distribution install --spec local-ollama --name ollama
Getting this output:
usage: llama [-h] {download,model,stack} ...
llama: error: argument {download,model,stack}: invalid choice: 'distribution' (choose from 'download', 'model', 'stack')
I'm new here, trying to run llama on a Mac.
distribution doesn't seem to be an argument in llama.
Help would be appreciated.