Windows
The following is an example of how to configure Ollama to run on the GPU on a Windows system.
By default, Ollama uses the CPU for inference. For faster inference, you can configure which GPU Ollama uses. This tutorial will guide you through setting environment variables on your Windows system to enable GPU acceleration.
Prerequisites
- The computer has an NVIDIA graphics card.
- NVIDIA graphics drivers are installed. Run `nvidia-smi` to check whether the driver is installed.
- The CUDA Toolkit is installed. Run `nvcc --version` to check whether CUDA is installed.
Tip: You can find many tutorials on installing NVIDIA drivers and the CUDA Toolkit, so this article will not repeat them. If your computer meets the above prerequisites, Ollama uses GPU acceleration by default. If you want to pin Ollama to a particular GPU, follow the steps below.
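As a quick sanity check of the prerequisites, you can confirm that both tools are on your PATH before proceeding. This is a minimal sketch for a POSIX shell such as Git Bash; in PowerShell, `Get-Command` serves the same purpose:

```shell
# Check whether the NVIDIA driver tools and the CUDA compiler are installed.
# `command -v` prints the program's path if it is on PATH and fails otherwise.
if command -v nvidia-smi >/dev/null 2>&1; then
  echo "nvidia-smi: found"
else
  echo "nvidia-smi: not found"
fi

if command -v nvcc >/dev/null 2>&1; then
  echo "nvcc: found"
else
  echo "nvcc: not found"
fi
```

If either command reports "not found", install the missing component before continuing.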
Configuring Environment Variables
- Open the system environment variable settings:
  - Type "Environment Variables" in the Windows search bar and select "Edit the system environment variables".
  - In the "System Properties" window, click the "Advanced" tab, then click the "Environment Variables" button.
- Create the OLLAMA_GPU_LAYER variable:
  - In the "System variables" area, click the "New" button.
  - In the "New System Variable" dialog box, enter the following:
    - Variable name: `OLLAMA_GPU_LAYER`
    - Variable value: `cuda` (this tells Ollama to use CUDA for GPU acceleration)
  - Click "OK" to save the variable.
- (Optional) Specify the GPU to be used:
  - If your system has multiple GPUs and you want Ollama to use a specific one, you can set the `CUDA_VISIBLE_DEVICES` environment variable.
  - Find the UUID of the GPU. It is strongly recommended to use the UUID instead of the index number, as the number may change after driver updates or system reboots.
    - Open a Command Prompt or PowerShell.
    - Run the command: `nvidia-smi -L`
    - In the output, find the "UUID" value of the GPU you want to use, for example:
      GPU 0: ... (UUID: GPU-xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx)
  - Create the CUDA_VISIBLE_DEVICES variable:
    - In the "System variables" area, click the "New" button.
    - In the "New System Variable" dialog box, enter the following:
      - Variable name: `CUDA_VISIBLE_DEVICES`
      - Variable value: the UUID of the GPU you found, for example: `GPU-xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx`
    - Click "OK" to save the variable.
Important: For the environment variables to take effect, you must restart the terminal or application in which Ollama is running.
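As an illustration of the UUID step above, the token that `CUDA_VISIBLE_DEVICES` expects can be pulled out of an `nvidia-smi -L` line with `grep`. The device name and UUID in the sample below are made-up placeholders; run the real command on your own machine to get actual values:

```shell
# A sample line in the shape that `nvidia-smi -L` prints (name and UUID
# here are illustrative placeholders, not a real GPU).
sample='GPU 0: NVIDIA GeForce RTX (UUID: GPU-12345678-aaaa-bbbb-cccc-123456789abc)'

# Extract just the UUID token, which is what CUDA_VISIBLE_DEVICES expects.
echo "$sample" | grep -o 'GPU-[0-9a-f-]*'
# → GPU-12345678-aaaa-bbbb-cccc-123456789abc
```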
Verify that GPU acceleration is in effect:
- Open a command prompt.
- Run Ollama, for example:
  ollama run deepseek-r1:1.5b
- Open a new command prompt window and use the `ollama ps` command to view the models Ollama is running. If GPU acceleration is in effect, the PROCESSOR column in the output should report the GPU (for example, "100% GPU") rather than the CPU.
Linux
The following is an example of how to configure Ollama to run on the GPU on a Linux system.
- Create an ollama_gpu_selector.sh script file with the following contents:
#!/bin/bash

# Validate input: comma-separated GPU indices between 0 and 4
validate_input() {
    if [[ ! $1 =~ ^[0-4](,[0-4])*$ ]]; then
        echo "Error: Invalid input. Please enter numbers between 0 and 4, separated by commas."
        exit 1
    fi
}

# Update the service file with CUDA_VISIBLE_DEVICES values
update_service() {
    # Check if CUDA_VISIBLE_DEVICES environment variable exists in the service file
    if grep -q '^Environment="CUDA_VISIBLE_DEVICES=' /etc/systemd/system/ollama.service; then
        # Update the existing CUDA_VISIBLE_DEVICES values
        sudo sed -i 's/^Environment="CUDA_VISIBLE_DEVICES=.*/Environment="CUDA_VISIBLE_DEVICES='"$1"'"/' /etc/systemd/system/ollama.service
    else
        # Add a new CUDA_VISIBLE_DEVICES environment variable
        sudo sed -i '/\[Service\]/a Environment="CUDA_VISIBLE_DEVICES='"$1"'"' /etc/systemd/system/ollama.service
    fi

    # Reload and restart the systemd service
    sudo systemctl daemon-reload
    sudo systemctl restart ollama.service
    echo "Service updated and restarted with CUDA_VISIBLE_DEVICES=$1"
}

# Check if arguments are passed
if [ "$#" -eq 0 ]; then
    # Prompt user for CUDA_VISIBLE_DEVICES values if no arguments are passed
    read -p "Enter CUDA_VISIBLE_DEVICES values (0-4, comma-separated): " cuda_values
    validate_input "$cuda_values"
    update_service "$cuda_values"
else
    # Use arguments as CUDA_VISIBLE_DEVICES values
    cuda_values="$1"
    validate_input "$cuda_values"
    update_service "$cuda_values"
fi
- Add execute permissions to the script file and run it:
chmod +x ollama_gpu_selector.sh
sudo ./ollama_gpu_selector.sh
After running the script, follow the prompt to enter the GPU numbers to specify the GPUs used by Ollama. You can use commas to separate multiple GPU numbers, for example: 0,1,2
- Verify the Ollama service configuration:
cat /etc/systemd/system/ollama.service
After running the command, inspect the Ollama service file and confirm that the CUDA_VISIBLE_DEVICES environment variable has been updated. If it has, the [Service] section will contain a line such as Environment="CUDA_VISIBLE_DEVICES=0,1,2".
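To see what the script's two sed branches actually do to the service file, here is a minimal sketch run against a throwaway copy. The file contents below are illustrative; do not experiment on the real /etc/systemd/system/ollama.service:

```shell
#!/bin/sh
# Work on a temporary stand-in for the systemd unit file, not the real one.
f=$(mktemp)
printf '[Service]\nExecStart=/usr/bin/ollama serve\n' > "$f"

# First run: no CUDA_VISIBLE_DEVICES line exists yet, so append one
# right after the [Service] header (the script's "add" branch).
sed -i '/\[Service\]/a Environment="CUDA_VISIBLE_DEVICES=0,1"' "$f"
grep '^Environment' "$f"   # → Environment="CUDA_VISIBLE_DEVICES=0,1"

# Later runs: the line exists, so rewrite it in place (the "update" branch).
sed -i 's/^Environment="CUDA_VISIBLE_DEVICES=.*/Environment="CUDA_VISIBLE_DEVICES=2"/' "$f"
grep '^Environment' "$f"   # → Environment="CUDA_VISIBLE_DEVICES=2"

rm -f "$f"
```

This is why the script checks with `grep -q` first: appending unconditionally would leave duplicate Environment lines in the unit file after repeated runs.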