How to: Use Ollama with unsupported GPU


Kai Robinson

TinkerDifferent Board President 2023
Staff member
Founder
Nvidia has CUDA, AMD has ROCm, and if you're not lucky enough to have either... you get nothing for GPU-accelerated LLM workloads? NO! Because Ollama now supports Vulkan! This means that if, like me, you have a machine that's not cutting edge, you can still leverage the power of the GPU to accelerate the models.

Disclaimer: it's NOT going to set the world on fire, but it is very usable with smaller models on my AMD Ryzen 7 5700U APU laptop.

Now, my laptop is an e-waste franken-top: a trash-picked Lenovo ThinkBook 14 G3 ACL (which someone ruined by dropping beaten egg into it, and which I cleaned up with an ultrasonic bath back when I still did electronics) sporting a Ryzen 7 5700U (8c/16t) with a Radeon Vega iGPU, 16GB of RAM and a 1TB NVMe SSD.

Originally, I set this laptop up with Ubuntu 25 for Project NOMAD, an offline media and information repository for basically everything. One of the core features was a built-in Ollama front end, with a basic LLM model hosted locally.

However, it was CPU-only: it didn't detect the GPU portion of the APU, and it basically only supports Nvidia CUDA out of the box. It's also in a Docker container, so... it may be possible for me to tweak the bundled Ollama inside the container, but for now I wanted to test whether it worked with Vulkan at all.

I installed stock Ollama through the terminal with:

Code:
curl -fsSL https://ollama.com/install.sh | sh

Once installed, I simply edited the Ollama service with:

Code:
sudo systemctl edit ollama

and appended the following under the [Service] section:

Code:
Environment="OLLAMA_HOST=http://0.0.0.0:11434"
Environment="OLLAMA_MODELS=/mnt/mydrive/ollama"
Environment="HIP_VISIBLE_DEVICES=-1"
Environment="OLLAMA_VULKAN=1"
Environment="GGML_VK_VISIBLE_DEVICES=0"

And restarted the service with sudo systemctl restart ollama.
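If you want to confirm the override actually took effect, you can dump the unit's environment and grep the service log for a Vulkan device. (These are stock systemd/journalctl commands; the exact log wording may differ between Ollama versions.)

```shell
# Show the environment variables the systemd unit will launch Ollama with
systemctl show ollama --property=Environment

# Check the service log for Vulkan device discovery messages
journalctl -u ollama --no-pager | grep -i vulkan | tail -n 5
```

If nothing Vulkan-related shows up in the log, the override probably didn't land under [Service].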

I decided to use a small Llama model, so I fired it up with:

Code:
ollama run llama3.2

It's a 2GB model, and I was expecting the performance to be... weak. But I was pleasantly surprised at how snappy it was, even though this is just a Zen 2 APU with a pretty weak GPU core on it.

However, I have an aversion to doing things like this through the terminal and wanted a simple LLM interface in the browser. I chose Hollama because it was an easy install and didn't involve Docker. A lot of these front ends ship as Docker containers that never work, and just assume you're savvy enough with Linux to make them work; they're always written from the perspective of the engineer, never the end user.

I literally just had to unzip the package and run ./hollama to bring up a chrome-sandbox.

Before launching into anything else, I wanted to monitor the GPU. I tried several times to install amdgpu_top, but the compile always, always fails, so I defaulted to radeontop instead to monitor whether the GPU was actually being used. A bit more basic, but it did the job.
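If you'd rather log utilisation to a file than watch the interactive view, radeontop also has a dump mode (the -d and -l flags are from its man page; the exact dump line format may vary between versions):

```shell
# Write 10 samples to a file instead of showing the ncurses UI
radeontop -d /tmp/gpu.log -l 10

# Pull out just the overall GPU load figures from the dump
grep -o 'gpu [0-9.]*%' /tmp/gpu.log | tail -n 3
```

Handy for capturing what the GPU was doing while a model generates.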

So with everything in place, I asked llama3.2 a question: "Explain the simplest method to purify unsanitary water, using only basic materials that you can scavenge."

After all, this is supposed to be a tool for if the SHTF... ;)

Ta-da!

Screenshot_From_2026-03-27_01-26-29.png


OK, so the graphics pipeline isn't being 100% saturated, but that result got spat out in about 3 seconds tops, and kept going for about 15 more seconds before I had about two pages of information.

Seeing the result felt like this:

doc-brown-it-works.gif

:ROFLMAO:

Not quite 88 tokens/second, but around 50 or so, compared to the 16 the CPU managed.
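If you want an exact figure rather than eyeballing it, the Ollama HTTP API returns eval_count (tokens generated) and eval_duration (nanoseconds) in its /api/generate response, so tokens/second is just eval_count divided by seconds. A rough sketch, assuming Ollama is listening on localhost:11434 and llama3.2 is pulled (ollama run also has a --verbose flag that prints an eval rate directly):

```shell
# Request a completion with streaming off so we get one JSON blob back
resp=$(curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "Say hi", "stream": false}')

# eval_count = tokens generated, eval_duration = generation time in ns
count=$(printf '%s' "$resp" | grep -o '"eval_count":[0-9]*' | cut -d: -f2)
dur=$(printf '%s' "$resp" | grep -o '"eval_duration":[0-9]*' | cut -d: -f2)

# tokens/s = eval_count / (eval_duration / 1e9)
awk -v c="$count" -v d="$dur" 'BEGIN { printf "%.1f tokens/s\n", c / (d / 1e9) }'
```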

Now go forth, and try this yourself, if you have something which has Vulkan support but no dGPU, or no official CUDA or ROCm support!
 

Sander

New Tinkerer
Hi Kai!
I've just registered on this site after a search engine gave your post as a result.
The last couple of days I've also been trying to run an LLM on my mini PC with an AMD Ryzen 7 4800H (Renoir) APU, which is comparable to yours. I was only able to get it running on the CPU, but since you've managed it, I'm giving it another try.

I think Ollama is indeed using my GPU now. I did:
Code:
systemctl edit ollama --full
and put this in the unit file:

Code:
[Service]
Type=exec
ExecStart=/usr/local/bin/ollama serve
Environment=HOME=/root
Environment=OLLAMA_INTEL_GPU=false
Environment=OLLAMA_HOST=0.0.0.0
Environment=OLLAMA_NUM_GPU=999
Environment=SYCL_CACHE_PERSISTENT=1
Environment=ZES_ENABLE_SYSMAN=1

Environment=HIP_VISIBLE_DEVICES=-1
Environment=OLLAMA_VULKAN=1
Environment=GGML_VK_VISIBLE_DEVICES=0

Restart=always
RestartSec=3

After doing
Code:
systemctl restart ollama
I then checked with
Code:
journalctl -ef
and saw this:
Code:
Listening on [::]:11434 (version 0.18.3)
discovering available GPUs...
user overrode visible devices" HIP_VISIBLE_DEVICES=-1
user overrode visible devices" GGML_VK_VISIBLE_DEVICES=0
if GPUs are not correctly discovered, unset and try again"
starting runner" cmd="/usr/local/lib/ollama/bin/ollama runner --ollama-engine --port 36723"
starting runner" cmd="/usr/local/lib/ollama/bin/ollama runner --ollama-engine --port 45507"
starting runner" cmd="/usr/local/lib/ollama/bin/ollama runner --ollama-engine --port 37915"
inference compute" id=00000000-0400-0000-0000-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan0 description="AMD Radeon Graphics (RADV RENOIR)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:04:00.0 type=iGPU total="27.6 GiB" available="25.8 GiB"
vram-based default context" total_vram="27.6 GiB" default_num_ctx=32768
My previous attempt indicated that it was only using the CPU. Now it's actually using the GPU via Vulkan.
And indeed it's much...much... much faster compared to CPU.

Thank you so much! I finally managed to get things working.
BTW, the APU should also be able to use regular system memory as VRAM. I've configured it to 8GB in the BIOS, but I also read that memory gets claimed automatically by the GPU, and the log seems to confirm that.
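That matches how the RADV driver behaves: on an APU it exposes both the BIOS carve-out and ordinary system memory (GTT) as Vulkan memory heaps, which is why the log reports far more than the BIOS setting. If you have the vulkan-tools package installed, you can see what the driver advertises:

```shell
# List the Vulkan devices the driver exposes
vulkaninfo --summary

# Show the memory heap sizes (VRAM carve-out plus GTT/system memory)
vulkaninfo 2>/dev/null | grep -A 4 'memoryHeaps'
```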
I just tried
Code:
qwen3.5:9b
which also runs!
 

Kai Robinson

Unfortunately, with only 16GB RAM in the system and no way to expand it (single SO-DIMM DDR4 slot, 8GB soldered to the board), the BIOS won't allow more than 2GB allocated as VRAM.