How to delete podman-ollama
Closed this issue · 30 comments
I've just installed podman-ollama to test things and I don't know how to uninstall it. I'm new to podman and containers. Can you please help me? Thank you.
If you run podman images (or sudo podman images) and delete the docker.io/ollama/ollama container image using podman rmi -f, that's pretty much it.
Was there something in podman-ollama that could be improved upon that caused you to delete it? If there was, I'd be interested in that feedback.
That's one of the nice things about containers: you can cleanly delete everything with podman rmi.
podman-ollama script will still be around, but I'm not 100% sure you want to delete that, it's tiny.
But if you really want to do this:
sudo rm -f $(command -v podman-ollama)
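Putting the steps above together, a minimal cleanup sketch might look like the following. The image name comes from this thread; the volume name ollama is an assumption taken from the podman run command posted later in the thread, and run and cleanup_podman_ollama are hypothetical helper names. It only prints the commands unless DRY_RUN=0 is set, so compare them with what podman images and podman volume ls actually show first (the image may carry a tag such as :rocm, and rootful setups need sudo podman).

```shell
#!/usr/bin/env bash
# Cleanup sketch for podman-ollama. Prints commands unless DRY_RUN=0.
# Assumptions: image name as mentioned in this thread; the volume is
# assumed to be called "ollama". Adjust both to your own setup.

IMAGE="docker.io/ollama/ollama"
VOLUME="ollama"

# Hypothetical helper: echo the command in dry-run mode, run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

cleanup_podman_ollama() {
    # -f also removes containers that still use the image.
    run podman rmi -f "$IMAGE"
    # Removing the volume deletes every downloaded model.
    run podman volume rm "$VOLUME"
    # Optionally remove the script itself, if it is on PATH.
    script_path=$(command -v podman-ollama || true)
    if [ -n "$script_path" ]; then
        run sudo rm -f "$script_path"
    fi
}

# Usage: cleanup_podman_ollama            (dry run, prints the commands)
#        DRY_RUN=0 cleanup_podman_ollama  (really delete)
```

The dry-run default is deliberate: removing the volume throws away all downloaded models, so it is worth reading the printed commands before running them for real.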
I'm trying to run ollama on my GPU, an RX 6700 XT, with Fedora Kinoite.
With ollama in a toolbox, I couldn't get it working. Same with ollama on the host and these packages installed on the host: rocm-clinfo rocm-hip rocm-opencl rocminfo. I wanted to see if podman-ollama could run ollama on my GPU, but it doesn't seem to work either.
How can I add an environment variable to podman-ollama?
When I was on Ubuntu I had to run: HSA_OVERRIDE_GFX_VERSION="10.3.0" ollama serve because my GPU is not officially supported.
We will be able to fix this don't worry, I'll add a feature...
Fedora Kinoite is exactly the kind of OS I had in mind when creating this, it's exactly the OS I use.
Could you try again with this new option?
podman-ollama --hsa-override-gfx-version 10.3.0
I haven't tested it, but it should fix your use case. You would need to install the new version.
Assuming you installed these with rpm-ostree, and rebooted of course:
rocm-clinfo rocm-hip rocm-opencl rocminfo
I tested AMD GPU on Kinoite and it was fine, so it's probably just that env var
I get this error now :
maledict@fedora-1:/var/home/maledict$ podman-ollama --hsa-override-gfx-version 10.3.0
Error: llama runner process has terminated: signal: aborted (core dumped) error:Could not initialize Tensile host: No devices found
That's actually a good sign, it changed something... You could also try --privileged
I really have to get eyeballs on this too:
https://github.com/ollama/ollama/pull/3615/files
it allows one to install on Kinoite without using podman if they want.
If you could help prod the maintainers on that PR I'd appreciate it.
Sometimes that's a good debug step, see if it works outside the container to ensure it's not the container getting in the way somehow.
Same problem:
maledict@fedora-1:/var/home/maledict$ podman-ollama --hsa-override-gfx-version 10.3.0 --privileged
Error: llama runner process has terminated: signal: aborted (core dumped) error:Could not initialize Tensile host: No devices found
I will continue trying to fix it tomorrow or Thursday since I have a lot of work to do.
If you also add -g AMD, that might work
Isn't it possible to just get GPU acceleration in a toolbox container running the "normal" ollama and not a podman container ? I think it will be easier and I prefer using the official ollama.
Still the same error
@MaledictYtb this is using the official Ollama container image FWIW
You can try with toolbox, it could work, haven't tested.
Let me know regardless, I'd like to have a fix here for this also.
Another debug step you could try is:
rpm-ostree usroverlay
Then do a standard Ollama install outside of all toolbox and podman containers, and debug from there.
This type of install won't persist across a reboot, but it should rule out any container things getting in the way.
I do know AMD GPUs work with this in general though, it's what I have
It's interesting you knew toolbox but are new to podman and containers 😊, since toolbox is just another form of podman and containers.
It's not working without any toolbox container either, idk why. The only time I got it working was on Ubuntu with the drivers from AMD's website. Here's the log:
maledict@fedora-1:/var/home/maledict$ HSA_OVERRIDE_GFX_VERSION="10.3.0" ollama serve
2024/05/22 08:46:45 routes.go:1008: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-05-22T08:46:45.974+02:00 level=INFO source=images.go:704 msg="total blobs: 24"
time=2024-05-22T08:46:45.975+02:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0"
time=2024-05-22T08:46:45.975+02:00 level=INFO source=routes.go:1054 msg="Listening on 127.0.0.1:11434 (version 0.1.38)"
time=2024-05-22T08:46:45.975+02:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3788991662/runners
time=2024-05-22T08:46:47.907+02:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]"
time=2024-05-22T08:46:47.911+02:00 level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-05-22T08:46:47.913+02:00 level=WARN source=amd_linux.go:346 msg="amdgpu detected, but no compatible rocm library found. Either install rocm v6, or follow manual install instructions at https://github.com/ollama/ollama/blob/main/docs/linux.md#manual-install"
time=2024-05-22T08:46:47.913+02:00 level=WARN source=amd_linux.go:278 msg="unable to verify rocm library, will use cpu" error="no suitable rocm found, falling back to CPU"
time=2024-05-22T08:46:47.913+02:00 level=INFO source=types.go:71 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="31.2 GiB" available="10.1 GiB"
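The last line of that log is the quickest thing to check, since it reports which backend Ollama picked. A small filter like this pulls out just that value. It is only a sketch: it assumes the log format shown above (version 0.1.38), and inference_library is a made-up name.

```shell
# Read Ollama log lines on stdin and print the value of library= from
# the "inference compute" line (log format assumed from version 0.1.38).
inference_library() {
    sed -n 's/.*msg="inference compute".* library=\([^ ]*\).*/\1/p'
}

# Usage, assuming the log was saved to a file first:
#   inference_library < ollama.log
```

For the log above this prints cpu, which matches the "falling back to CPU" warning. On a systemd install, something like journalctl -u ollama --no-pager | inference_library should also work, assuming the service is named ollama as the install script output suggests.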
https://fedoraproject.org/wiki/SIGs/HC#Installation
Maybe try rocminfo and rocm-clinfo and paste the output here. It's weird: I have an almost identical setup to yours, except for a different GPU, and it's fine.
I've been assuming you are on Fedora Kinoite 40, FWIW.
Although it's not quite identical, as I use the containerized version, podman-ollama. Dunno about the bare-metal one.
There are other tools, like nvtop, that you can use to check that your OS has detected your AMD GPU in general.
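Another quick sanity check is whether the device nodes that ROCm uses exist and are accessible to your user. The sketch below checks /dev/kfd and the render nodes under /dev/dri, the same two devices that get passed with --device in the podman run command later in this thread. check_node and check_gpu_nodes are hypothetical helper names, not part of any tool.

```shell
# Report whether a device node exists and is readable and writable
# by the current user.
check_node() {
    if [ ! -e "$1" ]; then
        echo "$1: missing"
    elif [ -r "$1" ] && [ -w "$1" ]; then
        echo "$1: ok"
    else
        echo "$1: present but not accessible to $(id -un)"
    fi
}

# Check the compute node and every render node.
check_gpu_nodes() {
    check_node /dev/kfd
    for node in /dev/dri/renderD*; do
        # If nothing matches, the literal pattern is reported as missing.
        check_node "$node"
    done
}

# Usage: check_gpu_nodes
```

If a node shows up as present but not accessible, group membership is a likely suspect (the Ollama install script, for instance, adds its user to the render and video groups).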
maledict@fedora-1:/var/home/maledict$ rocminfo
ROCk module is loaded
HSA System Attributes
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
Agent 1
Name: AMD Ryzen 5 5600 6-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 5 5600 6-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3500
BDFID: 0
Internal Node ID: 0
Compute Unit: 12
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 32727904(0x1f36360) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32727904(0x1f36360) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32727904(0x1f36360) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
Agent 2
Name: gfx1031
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 6700 XT
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 3072(0xc00) KB
L3: 98304(0x18000) KB
Chip ID: 29663(0x73df)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2725
BDFID: 2304
Internal Node ID: 1
Compute Unit: 40
SIMDs per CU: 2
Shader Engines: 2
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 118
SDMA engine uCode:: 80
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 12566528(0xbfc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 12566528(0xbfc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1031
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
maledict@fedora-1:/var/home/maledict$ rocm-clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (3602.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: AMD Radeon RX 6700 XT
Device Topology: PCI[ B#9, D#0, F#0 ]
Max compute units: 20
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 2725Mhz
Address bits: 64
Max memory allocation: 10937905968
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 16384
Max image 3D height: 16384
Max image 3D depth: 8192
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 12868124672
Constant buffer size: 10937905968
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 2347971376
Max global variable size: 10937905968
Max global variable preferred total size: 12868124672
Max read/write image args: 64
Max on device events: 1024
Queue on device max size: 8388608
Max on device queues: 1
Queue on device preferred size: 262144
SVM capabilities:
Coarse grain buffer: Yes
Fine grain buffer: Yes
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 32
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0x7fb254c04808
Name: gfx1031
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 2.0
Driver version: 3602.0 (HSA1.1,LC)
Profile: FULL_PROFILE
Version: OpenCL 2.0
Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program
From what I can see it seems to detect my GPU.
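To pull just the GPU target out of that wall of rocminfo output, a one-line filter is enough. This is a sketch: gfx_targets is a hypothetical name, and it simply matches anything that looks like gfx followed by hex digits.

```shell
# Read rocminfo (or rocm-clinfo) output on stdin and print the unique
# gfx target names it mentions.
gfx_targets() {
    grep -oE 'gfx[0-9a-f]+' | sort -u
}

# Usage: rocminfo | gfx_targets
```

For the output above it prints gfx1031. As far as I know, HSA_OVERRIDE_GFX_VERSION=10.3.0 tells the ROCm runtime to treat the card as gfx1030, which is why that value is used for this GPU.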
I get this error when I install ollama on the system. Maybe that's why ollama can't use my GPU?
maledict@fedora-1:/var/home/maledict$ curl -fsSL https://ollama.com/install.sh | sh
Downloading ollama...
######################################################################## 100.0%#=#=#
Installing ollama to /usr/local/bin...
Adding ollama user to render group...
Adding ollama user to video group...
Adding current user to ollama group...
Creating ollama systemd service...
Enabling and starting ollama service...
Downloading AMD GPU dependencies...
chmod: cannot access '/usr/share/ollama': No such file or directory
I have the same problem with the official docker image. Should I report the bug to the ollama github ?
Error: llama runner process has terminated: signal: aborted (core dumped) error:Could not initialize Tensile host: No devices found
@MaledictYtb I think you should, and we need this PR also, so I wouldn't be afraid to poke on this PR getting reviewed either:
@MaledictYtb and if there are any further patches we can get in here to help you, like the --hsa-override-gfx-version one, let's do it :)
I added that option as an autocomplete FWIW:
it can also be added to the configuration file.
I GOT IT WORKING !!! I was having a lot of problems with basically everything, so I decided to install Podman Desktop to get a better view of what was happening. I hadn't deleted the ollama volume, and idk why, but it was causing a lot of problems. After deleting it and reinstalling the official docker image, I got ollama working with my GPU.
I've used this command to start the container and now it runs really fast:
podman run -d -e HSA_OVERRIDE_GFX_VERSION="10.3.0" --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm
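For anyone landing here with the same problem, the reset described above and the start command can be sketched as two small functions. Everything here is taken from this thread except the names reset_ollama and start_ollama and the PODMAN variable, which are made up for illustration, and the docker.io/ prefix, which is only added so podman does not have to resolve a short image name. Setting PODMAN=echo prints the commands instead of running them.

```shell
# Sketch of the "start fresh" fix from this thread. PODMAN can be set
# to "echo" for a dry run; function names are hypothetical.
PODMAN="${PODMAN:-podman}"
IMAGE="docker.io/ollama/ollama:rocm"

# Remove the old container and volume, then pull a fresh image.
# Warning: removing the volume deletes all downloaded models.
reset_ollama() {
    "$PODMAN" rm -f ollama 2>/dev/null || true
    "$PODMAN" volume rm ollama 2>/dev/null || true
    "$PODMAN" pull "$IMAGE"
}

# Start the container with the GPU devices passed through.
# $1: optional GFX version override (defaults to 10.3.0, as used here).
start_ollama() {
    "$PODMAN" run -d \
        -e HSA_OVERRIDE_GFX_VERSION="${1:-10.3.0}" \
        --device /dev/kfd \
        --device /dev/dri \
        -v ollama:/root/.ollama \
        -p 11434:11434 \
        --name ollama \
        "$IMAGE"
}

# Usage: reset_ollama && start_ollama
```

The override value is a parameter because 10.3.0 is specific to this GPU; other unsupported cards would presumably need a different value.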
Yeah the delete of the container image and volume makes sense, starting fresh again on updated images can make a difference, glad it's working.
Just to confirm, this means "podman-ollama" works too, right?
If you did:
./podman-ollama --hsa-override-gfx-version 10.3.0
Sorry but I will not try podman-ollama since I don't want to delete my currently working setup and download all the models again to see if it's working.
podman-ollama uses the exact same volumes and container images but ok :)
Thanks for your feedback.