nVidia has CUDA, AMD has ROCm, and then... if you're not lucky, you have nothing for GPU-accelerated LLM workloads? NO! Because Ollama now supports Vulkan! This means that if, like me, you have a machine that's not cutting edge, you can still leverage the power of the GPU to accelerate the models.
Disclaimer: it's NOT going to set the world on fire, but it is very usable with smaller models on my AMD Ryzen 7 5700U APU laptop.
Now, my laptop is an e-waste franken-top: a trash-picked Lenovo ThinkBook 14 G3 ACL (that someone ruined by dropping beaten egg into it, and that I cleaned up with an ultrasonic bath back when I still did electronics) with a Ryzen 7 5700U (8 cores/16 threads), Radeon Vega integrated graphics, 16GB of RAM and a 1TB NVMe SSD.
Originally, I set this laptop up with Ubuntu 25 for Project NOMAD, an offline media and information repository for basically everything. One of the core features was a built-in Ollama front end, with a locally hosted basic LLM model.
However, it was CPU-only: it didn't detect the GPU portion of the APU, and it basically only supports nVidia CUDA out of the box. It also runs in a Docker container, so... yeah, it may be possible for me to tweak the included Ollama inside the container, but for now, I wanted to test whether it worked with Vulkan at all.
I installed stock Ollama through the terminal with:

curl -fsSL https://ollama.com/install.sh | sh

Once installed, I simply edited the Ollama service with:

sudo systemctl edit ollama

and appended the following under the [Service] section:

Environment="OLLAMA_HOST=http://0.0.0.0:11434"
Environment="OLLAMA_MODELS=/mnt/mydrive/ollama"
Environment="HIP_VISIBLE_DEVICES=-1"
Environment="OLLAMA_VULKAN=1"
Environment="GGML_VK_VISIBLE_DEVICES=0"
And restarted the service with:

sudo systemctl restart ollama

I decided to use a small Llama model, so I fired up Ollama with:

ollama run llama3.2

It's a 2GB model, and I was expecting the performance to be... weak. But I was pleasantly surprised at how snappy it was, even though this is just a Zen 2 APU with a pretty weak GPU core on it.
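For scripting, you can also talk to the same HTTP API the Ollama service exposes, instead of using the interactive terminal. Here's a minimal sketch, assuming the service is listening on localhost:11434 as configured above (the helper names are my own):

```python
# Minimal sketch of calling Ollama's /api/generate endpoint directly.
# Assumes the Ollama service is running and reachable on localhost:11434.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> bytes:
    # /api/generate takes a JSON body; "stream": False returns a single
    # response object instead of a stream of chunks.
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The generated text comes back in the "response" field.
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask("llama3.2", "Explain the simplest method to purify unsanitary water."))
```

Since OLLAMA_HOST is bound to 0.0.0.0, the same call works from other machines on the network by swapping localhost for the laptop's IP.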
However, I have an aversion to doing things like this through the terminal interface, and wanted a simple LLM interface in the browser instead. I chose Hollama because it was an easy install and didn't involve Docker. I've found that a lot of these front ends run in Docker containers that never work, and just assume you're savvy enough with Linux to make it work; they're always written from the perspective of the engineer, never the end user.
I literally just had to unzip the package and run:

./hollama

to bring up a chrome-sandbox.

Before launching into anything else, I wanted to monitor the GPU. I tried several times to install amdgpu_top, but the compile always, always fails, so I defaulted to radeontop instead, to monitor whether the GPU was actually being used. A bit more basic, but it did the job.

So with everything in place, I asked llama3.2 a question: "Explain the simplest method to purify unsanitary water, using only basic materials that you can scavenge."
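As a rough stand-in for radeontop, the amdgpu kernel driver also exposes a gpu_busy_percent file in sysfs that you can poll directly. A minimal sketch — note the card index (card0 vs card1) varies from machine to machine:

```python
# Poll the amdgpu driver's GPU utilisation figure from sysfs.
# Assumption: the APU is card0; on some systems it shows up as card1.
from pathlib import Path
import time

def read_busy(raw: str) -> int:
    # sysfs reports a bare integer percentage, e.g. "37\n".
    return int(raw.strip())

def monitor(card: str = "card0", samples: int = 5, interval: float = 1.0) -> None:
    path = Path(f"/sys/class/drm/{card}/device/gpu_busy_percent")
    for _ in range(samples):
        print(f"GPU busy: {read_busy(path.read_text())}%")
        time.sleep(interval)

if __name__ == "__main__":
    monitor()
```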
After all, this is supposed to be a tool for if the SHTF...
Ta-da!
OK, so the graphics pipeline isn't being 100% saturated, but that result got spat out in about 3 seconds tops, and kept going for about 15 more seconds before I had about two pages of information.
Seeing the result felt like this:
Not quite 88 tokens/second, but around 50 or so, compared to the CPU, which managed 16.
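For reference, Ollama reports an eval_count (tokens generated) and eval_duration (wall time in nanoseconds) with each API response, which is where a tokens-per-second figure comes from — a quick sketch, with made-up example numbers:

```python
# Tokens/second from the timing fields Ollama returns with each
# /api/generate response: eval_count is generated tokens, eval_duration
# is the generation time in nanoseconds.
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    return eval_count / (eval_duration_ns / 1e9)

# e.g. 800 tokens generated in 16 seconds -> 50.0 tokens/second
print(tokens_per_second(800, 16_000_000_000))
```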
Now go forth, and try this yourself, if you have something which has Vulkan support but no dGPU, or no official CUDA or ROCm support!