nVidia has CUDA, AMD has ROCm, and then... if you're not lucky, you have nothing for GPU-accelerated LLM workloads? NO! Because Ollama now supports Vulkan! This means that if, like me, you have a machine that's not cutting edge, you can still leverage the power of the GPU to accelerate the models.
Disclaimer: it's NOT going to set the world on fire, but it is very usable with smaller models on my AMD Ryzen 7 5700U APU laptop.
Now, my laptop is an e-waste franken-top: a trash-picked Lenovo ThinkBook 14 G3 ACL (that someone ruined by dropping beaten egg into it, and that I cleaned up with an ultrasonic bath back when I still did electronics) with a Ryzen 7 5700U (8 cores/16 threads), Radeon Vega integrated graphics, 16GB of RAM and a 1TB NVMe SSD.
Originally, I set this laptop up with Ubuntu 25 for Project NOMAD, an offline media and information repository for basically everything. One of the core features was a built-in Ollama front end, with a locally hosted basic LLM model.
However, it was CPU-only: it didn't detect the GPU portion of the APU, and it basically only supports nVidia CUDA out of the box. It also runs in a Docker container, so... yeah, it may be possible for me to tweak the included Ollama inside the container, but for now, I wanted to test whether it worked with Vulkan at all.
I installed stock Ollama through the terminal with:

curl -fsSL https://ollama.com/install.sh | sh

Once installed, I simply edited the Ollama service with:

sudo systemctl edit ollama

and appended the following under the [Service] section:

Environment="OLLAMA_HOST=http://0.0.0.0:11434"
Environment="OLLAMA_MODELS=/mnt/mydrive/ollama"
Environment="HIP_VISIBLE_DEVICES=-1"
Environment="OLLAMA_VULKAN=1"
Environment="GGML_VK_VISIBLE_DEVICES=0"
And restarted the service with:

sudo systemctl restart ollama

I decided to use a small Llama model, so I fired up Ollama with:

ollama run llama3.2

It's a 2GB model, and I was expecting the performance to be... weak. But I was pleasantly surprised at how snappy it was, even though this is just a Zen 2 APU with a pretty weak GPU core on it.
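For scripting, you can also talk to the same HTTP API the Ollama service exposes, instead of using the interactive terminal. Here's a minimal sketch, assuming the service is listening on localhost:11434 as configured above (the helper names are my own):

```python
# Minimal sketch of calling Ollama's /api/generate endpoint directly.
# Assumes the Ollama service is running and reachable on localhost:11434.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> bytes:
    # /api/generate takes a JSON body; "stream": False returns a single
    # response object instead of a stream of chunks.
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The generated text comes back in the "response" field.
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask("llama3.2", "Explain the simplest method to purify unsanitary water."))
```

Since OLLAMA_HOST is bound to 0.0.0.0, the same call works from other machines on the network by swapping localhost for the laptop's IP.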
However, I have an aversion to doing things like this through the terminal interface, and wanted a simple LLM interface in the browser instead. I chose Hollama because it was an easy install and didn't involve Docker. I've found that a lot of these front ends run in Docker containers that never work, and just assume you're savvy enough with Linux to make it work; they're always written from the perspective of the engineer, never the end user.
I literally just had to unzip the package and run:

./hollama

to bring up a chrome-sandbox.

Before launching into anything else, I wanted to monitor the GPU. I tried several times to install amdgpu_top, but the compile always, always fails, so I defaulted to radeontop instead, to monitor whether the GPU was actually being used. A bit more basic, but it did the job.

So with everything in place, I asked llama3.2 a question: "Explain the simplest method to purify unsanitary water, using only basic materials that you can scavenge."
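As a rough stand-in for radeontop, the amdgpu kernel driver also exposes a gpu_busy_percent file in sysfs that you can poll directly. A minimal sketch — note the card index (card0 vs card1) varies from machine to machine:

```python
# Poll the amdgpu driver's GPU utilisation figure from sysfs.
# Assumption: the APU is card0; on some systems it shows up as card1.
from pathlib import Path
import time

def read_busy(raw: str) -> int:
    # sysfs reports a bare integer percentage, e.g. "37\n".
    return int(raw.strip())

def monitor(card: str = "card0", samples: int = 5, interval: float = 1.0) -> None:
    path = Path(f"/sys/class/drm/{card}/device/gpu_busy_percent")
    for _ in range(samples):
        print(f"GPU busy: {read_busy(path.read_text())}%")
        time.sleep(interval)

if __name__ == "__main__":
    monitor()
```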
After all, this is supposed to be a tool for if the SHTF...
Ta-da!
OK, so the graphics pipeline isn't being 100% saturated, but that result got spat out in about 3 seconds tops, and kept going for about 15 more seconds before I had about two pages of information.
Seeing the result felt like this:
Not quite 88 tokens/second, but around 50 or so, compared to the CPU, which managed 16.
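For reference, Ollama reports an eval_count (tokens generated) and eval_duration (wall time in nanoseconds) with each API response, which is where a tokens-per-second figure comes from — a quick sketch, with made-up example numbers:

```python
# Tokens/second from the timing fields Ollama returns with each
# /api/generate response: eval_count is generated tokens, eval_duration
# is the generation time in nanoseconds.
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    return eval_count / (eval_duration_ns / 1e9)

# e.g. 800 tokens generated in 16 seconds -> 50.0 tokens/second
print(tokens_per_second(800, 16_000_000_000))
```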
Now go forth, and try this yourself, if you have something which has Vulkan support but no dGPU, or no official CUDA or ROCm support!