How to delete podman-ollama

Question

How to delete podman-ollama

Closed this issue 5 months ago · 30 comments

I've just installed podman-ollama to test things and I don't know how to uninstall it. I'm new to podman and containers. Can you please help me ? Thank you

Answer 1 · 2024-05-21T19:33:32.000Z

List your images with podman images (or sudo podman images if you ran it as root), then delete the docker.io/ollama/ollama container image with podman rmi -f. That's pretty much it.

Was there something in podman-ollama that made you want to delete it and that could be improved? If so, I'd be interested in that feedback.

Answer 2 · 2024-05-21T19:36:40.000Z

That's one of the nice things about containers, you can cleanly delete everything with podman rmi.

The podman-ollama script will still be around, but I'm not 100% sure you want to delete that; it's tiny.

But if you really want to do this:

sudo rm -f $(command -v podman-ollama)
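The two removal steps from the answers above can be combined into one small sketch. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences, not part of podman-ollama; with the default `DRY_RUN=1` the commands are only printed, so they can be reviewed first. It only covers the rootless image store; images pulled with sudo would need the same `podman rmi` run under sudo.

```shell
#!/bin/sh
# Sketch: remove the Ollama container image and, optionally, the podman-ollama script.
# DRY_RUN=1 (the default here, a hypothetical convention) only prints the commands.

run() {
    # Print the command in dry-run mode; execute it otherwise.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

uninstall_podman_ollama() {
    # Remove the image named in the thread; -f also removes containers that use it.
    run podman rmi -f docker.io/ollama/ollama

    # Optionally remove the script itself, if it is found on PATH.
    script_path="$(command -v podman-ollama || true)"
    if [ -n "$script_path" ]; then
        run sudo rm -f "$script_path"
    fi
}
```

Calling `uninstall_podman_ollama` prints the commands; `DRY_RUN=0 uninstall_podman_ollama` would actually run them.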

Answer 3 · 2024-05-21T19:57:05.000Z

I'm trying to run ollama on my GPU, an RX 6700 XT, with Fedora Kinoite.
With ollama in a toolbox, I couldn't get it working. Same with ollama on the host with these packages installed on the host: rocm-clinfo rocm-hip rocm-opencl rocminfo. I wanted to see if podman-ollama could run ollama on my GPU, but it doesn't seem to work either.
How can I add an environment variable to podman-ollama?
When I was on Ubuntu I had to run HSA_OVERRIDE_GFX_VERSION="10.3.0" ollama serve because my GPU is not officially supported.

Answer 4 · 2024-05-21T20:18:33.000Z

We will be able to fix this, don't worry. I'll add a feature...

Fedora Kinoite is exactly the kind of OS I had in mind when creating this; it's the OS I use myself.

Answer 5 · 2024-05-21T20:35:25.000Z

Could you try again with this new option?

ee95554

podman-ollama --hsa-override-gfx-version 10.3.0

I haven't tested it, but it should fix your use case. You would need to install the new version.

Answer 6 · 2024-05-21T20:41:32.000Z

Assuming you installed these with rpm-ostree, and rebooted of course:

rocm-clinfo rocm-hip rocm-opencl rocminfo
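A quick way to confirm that the four packages above are actually present is sketched below. It relies only on `rpm -q`, which exits non-zero for a package that is not installed; the function name `missing_rocm_packages` is made up for this example.

```shell
#!/bin/sh
# Sketch: print the names of the ROCm packages listed above that are not installed.

missing_rocm_packages() {
    for pkg in rocm-clinfo rocm-hip rocm-opencl rocminfo; do
        # rpm -q returns a non-zero status when the package is absent.
        if ! rpm -q "$pkg" >/dev/null 2>&1; then
            echo "$pkg"
        fi
    done
}

# Example use: suggest an rpm-ostree command for whatever is missing.
# missing="$(missing_rocm_packages | tr '\n' ' ')"
# [ -n "$missing" ] && echo "sudo rpm-ostree install $missing  # then reboot"
```

An empty result means all four packages are installed in the booted deployment.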

Answer 7 · 2024-05-21T20:42:09.000Z

I tested AMD GPU on Kinoite and it was fine, so it's probably just that env var

Answer 8 · 2024-05-21T20:48:33.000Z

I get this error now :
maledict@fedora-1:/var/home/maledict$ podman-ollama --hsa-override-gfx-version 10.3.0

Error: llama runner process has terminated: signal: aborted (core dumped) error:Could not initialize Tensile host: No devices found

Answer 9 · 2024-05-21T20:52:11.000Z

That's actually a good sign, it changed something... You could also try --privileged

Answer 10 · 2024-05-21T20:54:55.000Z

I really have to get eyeballs on this too:

https://github.com/ollama/ollama/pull/3615/files

it allows one to install on Kinoite without using podman if they want.

If you could help prod the maintainers on that PR I'd appreciate it.

Sometimes that's a good debug step, see if it works outside the container to ensure it's not the container getting in the way somehow.

Answer 11 · 2024-05-21T20:56:04.000Z

Same problem:
maledict@fedora-1:/var/home/maledict$ podman-ollama --hsa-override-gfx-version 10.3.0 --privileged
Error: llama runner process has terminated: signal: aborted (core dumped) error:Could not initialize Tensile host: No devices found

I will continue trying to fix it tomorrow or Thursday since I have a lot of work to do.

Answer 12 · 2024-05-21T20:59:52.000Z

If you also add -g AMD, that might work

Answer 13 · 2024-05-21T21:01:28.000Z

Isn't it possible to just get GPU acceleration in a toolbox container running the "normal" ollama rather than a podman container? I think it would be easier, and I prefer using the official ollama.

Answer 14 · 2024-05-21T21:01:40.000Z

Re: "If you also add -g AMD, that might work"

Still the same error.

Answer 15 · 2024-05-21T21:05:56.000Z

@MaledictYtb this is using the official Ollama container image FWIW

You can try with toolbox, it could work, haven't tested.

Let me know regardless, I'd like to have a fix here for this also.

Another debug step you could try is:

rpm-ostree usroverlay

Then do a standard Ollama install outside of all toolbox and podman containers, and debug from there.

This type of install won't persist across a reboot, but it should rule out any container things getting in the way.

Answer 16 · 2024-05-21T21:12:33.000Z

I do know AMD GPUs work with this in general though, it's what I have

Answer 17 · 2024-05-21T21:25:45.000Z

It's interesting that you knew toolbox but are new to podman and containers 😊, since toolbox is just another form of podman and containers.

Answer 18 · 2024-05-22T06:49:33.000Z

Re: "Another debug step you could try is: rpm-ostree usroverlay [...]"

It's not working outside of any toolbox container either, and I don't know why. The only time I got it working was on Ubuntu with the drivers from AMD's website. Here's the log:
maledict@fedora-1:/var/home/maledict$ HSA_OVERRIDE_GFX_VERSION="10.3.0" ollama serve
2024/05/22 08:46:45 routes.go:1008: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-05-22T08:46:45.974+02:00 level=INFO source=images.go:704 msg="total blobs: 24"
time=2024-05-22T08:46:45.975+02:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0"
time=2024-05-22T08:46:45.975+02:00 level=INFO source=routes.go:1054 msg="Listening on 127.0.0.1:11434 (version 0.1.38)"
time=2024-05-22T08:46:45.975+02:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3788991662/runners
time=2024-05-22T08:46:47.907+02:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]"
time=2024-05-22T08:46:47.911+02:00 level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-05-22T08:46:47.913+02:00 level=WARN source=amd_linux.go:346 msg="amdgpu detected, but no compatible rocm library found. Either install rocm v6, or follow manual install instructions at https://github.com/ollama/ollama/blob/main/docs/linux.md#manual-install"
time=2024-05-22T08:46:47.913+02:00 level=WARN source=amd_linux.go:278 msg="unable to verify rocm library, will use cpu" error="no suitable rocm found, falling back to CPU"
time=2024-05-22T08:46:47.913+02:00 level=INFO source=types.go:71 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="31.2 GiB" available="10.1 GiB"
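The last line of this log is the one that tells whether the GPU was picked up: `library=cpu` means Ollama fell back to the CPU. A small filter for that field might look like the sketch below; the function name is hypothetical, and it assumes the log format shown above (an `msg="inference compute"` line carrying a `library=` field).

```shell
#!/bin/sh
# Sketch: read `ollama serve` log lines on stdin and print the value of the
# library= field from the "inference compute" line (cpu in the log above).

ollama_compute_library() {
    sed -n 's/.*msg="inference compute".* library=\([^ ]*\).*/\1/p'
}

# Example use, assuming the log was saved to a file named serve.log:
# ollama_compute_library < serve.log
```

Any value other than `cpu` would suggest a GPU backend was selected, although the exact names it can take are not shown in this thread.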

Answer 19 · 2024-05-22T07:44:19.000Z

https://fedoraproject.org/wiki/SIGs/HC#Installation

Maybe try rocminfo and rocm-clinfo and paste the output here. It's weird: I have an almost identical setup to yours, except for a different GPU, and it works fine.

I've also been assuming you are on Fedora Kinoite 40, FWIW.

It's not quite identical, though, as I use the containerized version, podman-ollama; I don't know about the bare-metal one.

Answer 20 · 2024-05-22T11:31:14.000Z

There are other tools, like nvtop, that you can use to check whether your OS has detected your AMD GPU at all.

Answer 21 · 2024-05-22T11:54:33.000Z

maledict@fedora-1:/var/home/maledict$ rocminfo
ROCk module is loaded

HSA System Attributes

Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES

==========
HSA Agents

Agent 1

Name: AMD Ryzen 5 5600 6-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 5 5600 6-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3500
BDFID: 0
Internal Node ID: 0
Compute Unit: 12
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 32727904(0x1f36360) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32727904(0x1f36360) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32727904(0x1f36360) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:

Agent 2

Name: gfx1031
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 6700 XT
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 3072(0xc00) KB
L3: 98304(0x18000) KB
Chip ID: 29663(0x73df)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2725
BDFID: 2304
Internal Node ID: 1
Compute Unit: 40
SIMDs per CU: 2
Shader Engines: 2
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 118
SDMA engine uCode:: 80
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 12566528(0xbfc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 12566528(0xbfc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1031
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***

maledict@fedora-1:/var/home/maledict$ rocm-clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (3602.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback

Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: AMD Radeon RX 6700 XT
Device Topology: PCI[ B#9, D#0, F#0 ]
Max compute units: 20
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 2725Mhz
Address bits: 64
Max memory allocation: 10937905968
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 16384
Max image 3D height: 16384
Max image 3D depth: 8192
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 12868124672
Constant buffer size: 10937905968
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 2347971376
Max global variable size: 10937905968
Max global variable preferred total size: 12868124672
Max read/write image args: 64
Max on device events: 1024
Queue on device max size: 8388608
Max on device queues: 1
Queue on device preferred size: 262144
SVM capabilities:
Coarse grain buffer: Yes
Fine grain buffer: Yes
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 32
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0x7fb254c04808
Name: gfx1031
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 2.0
Driver version: 3602.0 (HSA1.1,LC)
Profile: FULL_PROFILE
Version: OpenCL 2.0
Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program

From what I can see it seems to detect my GPU.
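To pull just the GPU target out of output like the above, a filter along these lines could be used. This is a sketch with a made-up function name; it keeps only `Name:` lines whose value starts with `gfx`. Here it would report gfx1031, which is why the thread uses the 10.3.0 override (as far as I understand, that value makes ROCm treat the card as gfx1030).

```shell
#!/bin/sh
# Sketch: read rocminfo (or rocm-clinfo) output on stdin and print the distinct
# gfx targets found on "Name:" lines.

gpu_gfx_targets() {
    # "Marketing Name:" and ISA names such as amdgcn-amd-amdhsa--gfx1031 are
    # skipped, because the line must start with "Name:" and the value with "gfx".
    sed -n 's/^[[:space:]]*Name:[[:space:]]*\(gfx[0-9a-f]*\)[[:space:]]*$/\1/p' | sort -u
}

# Example use:
# rocminfo | gpu_gfx_targets
```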

Answer 22 · 2024-05-22T12:52:42.000Z

I get this error when I install ollama on the system. Maybe that's why ollama can't use my GPU?

maledict@fedora-1:/var/home/maledict$ curl -fsSL https://ollama.com/install.sh | sh

Downloading ollama...
######################################################################## 100.0%#=#=#
Installing ollama to /usr/local/bin...
Adding ollama user to render group...
Adding ollama user to video group...
Adding current user to ollama group...
Creating ollama systemd service...
Enabling and starting ollama service...
Downloading AMD GPU dependencies...
chmod: cannot access '/usr/share/ollama': No such file or directory

Answer 23 · 2024-05-22T13:14:17.000Z

I have the same problem with the official docker image. Should I report the bug on the Ollama GitHub?

Error: llama runner process has terminated: signal: aborted (core dumped) error:Could not initialize Tensile host: No devices found

Answer 24 · 2024-05-22T15:31:14.000Z

@MaledictYtb I think you should. We also need this PR, so don't be afraid to poke the maintainers about getting it reviewed:

https://github.com/ollama/ollama/pull/3615/files

Answer 25 · 2024-05-22T16:53:02.000Z

@MaledictYtb and if there are any further patches we can get in here to help you, like the --hsa-override-gfx-version one, let's do it :)

Answer 26 · 2024-05-22T16:55:37.000Z

I added that option to the autocompletion, FWIW:

8d54d54

It can also be set in the configuration file.

Answer 27 · 2024-05-23T11:31:47.000Z

I GOT IT WORKING!!! I was having a lot of problems with basically everything, so I installed Podman Desktop to get a better view of what was happening. It turned out I had never deleted the ollama volume, and for some reason it was causing a lot of problems. After deleting it and reinstalling the official docker image, I got ollama working with my GPU.
I used this command to start the container, and now it's running really fast:
podman run -d -e HSA_OVERRIDE_GFX_VERSION="10.3.0" --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm
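For anyone wanting to repeat that reset, the steps described above can be sketched as a small script. The `run` helper, the `DRY_RUN` variable and the function name are hypothetical; with the default `DRY_RUN=1` the commands are only printed. Note that removing the ollama volume also deletes any models already downloaded into /root/.ollama.

```shell
#!/bin/sh
# Sketch: reset the Ollama container state, then start the ROCm image with the
# GPU override. DRY_RUN=1 (the default, a hypothetical convention) only prints.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

fresh_ollama_rocm() {
    # Override value; 10.3.0 is the one used in this thread for the RX 6700 XT.
    gfx_version="${1:-10.3.0}"

    # Remove the old container and the named volume (this deletes stored models).
    run podman rm -f ollama
    run podman volume rm ollama

    # Start the container with the command reported as working above.
    run podman run -d \
        -e "HSA_OVERRIDE_GFX_VERSION=$gfx_version" \
        --device /dev/kfd --device /dev/dri \
        -v ollama:/root/.ollama \
        -p 11434:11434 \
        --name ollama ollama/ollama:rocm
}
```

Running `fresh_ollama_rocm` shows the three commands; `DRY_RUN=0 fresh_ollama_rocm 10.3.0` would execute them.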

Answer 28 · 2024-05-23T12:09:20.000Z

Yeah, deleting the container image and volume makes sense; starting fresh on updated images can make a difference. Glad it's working.

Just to confirm, does this mean podman-ollama works as well, if you run:

./podman-ollama --hsa-override-gfx-version 10.3.0

Answer 29 · 2024-05-23T13:04:13.000Z

Sorry, but I won't try podman-ollama: I don't want to delete my currently working setup and download all the models again just to see if it works.

Answer 30 · 2024-05-23T13:11:40.000Z

podman-ollama uses the exact same volumes and container images, but OK :)

Thanks for your feedback.
