Building a Local Deepseek AI Inference Server
First, the good news! Digital Spaceport got great performance out of the AMD EPYC Rome platform used in the previous review 😁 This setup is a classic! If you're running it, there's good news today: it delivers 3.5 to 4.25 tokens per second (TPS) with the full Q4 671b model. This matters because the distilled "lite" models are not in the same league at all; they perform nowhere near as well, and other models easily outperform them. For the real big-model experience, run a full model, preferably with a large context window (16K+). The full model is perfectly usable even on CPU alone, which leaves the GPU free for smaller models, such as image-recognition models. To be clear: you can't fit the full version in GPU VRAM alone unless you have a seriously high-end rig! Digital Spaceport will teach you all sorts of tricks to get it running. It's not exactly "easy", but it's fun if you like to fiddle with technology.
Corrigendum (2025/02/01)
- Idle power consumption: 60W (lower than Digital Spaceport expected, and that's without the GPU plugged in)
- Full load power consumption: 260W
- Digital Spaceport Current Memory Frequency: 2400MHz (3200MHz may be better)
Local AI Server CPU Hardware
If you've seen Digital Spaceport's quad-3090 GPU server build guide before, you're in luck. That EPYC 7702 CPU can be beaten today: Digital Spaceport recommends a better CPU, since prices are now about the same and the performance gain is significant. That said, the results in this article were produced on Digital Spaceport's own 7702. The MZ32-AR0 motherboard was a good recommendation back then and still is, since it has 16 memory slots that run at the full 3200MHz, which helps cut the cost of getting between 512GB and 1TB of memory. Digital Spaceport is using 2400MHz DDR4 sticks, but 3200MHz DDR4 ECC sticks should improve performance. Sixteen 32GB sticks give you 512GB; sixteen 64GB sticks give you 1TB. Note: LRDIMM and RDIMM modules cannot be mixed! (They are two different types of server memory, and mixing them may prevent the machine from booting.)
List of Local AI Server Rig Components
- Racks $55
- MZ32-AR0 Motherboard $500
- Corsair H170i Elite Capellix XT 420mm AIO CPU water cooler $170
- EPYC CPU Water Cooler Header Bracket
- 64-core AMD EPYC 7702 $650 or 64-core AMD EPYC 7V13 $599 or 64-core AMD EPYC 7C13 $735
- 512GB 2400 ECC Memory $400
- 1TB NVMe SSD - Samsung 980 Pro $75
- 850W power supply $80 (850W is enough for CPU-only inference; if you plan to use GPUs, it's recommended to get a 1500W or 1600W power supply from the start)
(Prices are as of January 29, 2025)
Total cost: approximately $2,000 (with 512GB of 2400MHz RAM and an EPYC 7702 CPU). For upgrades, Digital Spaceport recommends moving to a 7C13 or 7V13 CPU before upgrading memory frequency; going to 768GB of RAM is the second choice, and 3200MHz RAM the last to consider. With a top-of-the-line CPU (7C13 or 7V13) and 1TB of 2400MHz RAM, the total comes to about $2,500.
Rig Rack Assembly
The assembly process is the same as in the previous video, just without the GPUs and riser cards. If you want to add GPUs later, Digital Spaceport recommends getting a 1500W or 1600W power supply from the start; aside from the GPUs and risers, the rest of the build is identical. You can watch that video (the quad-3090 GPU server build mentioned above) to see how to assemble it. Ignore the GPU parts of the video; the other steps are the same.
Also, it's a good idea to build a wall of small fans, tied together with zip ties, blowing air across the memory sticks to help with heat. Memory sticks won't melt, but running too hot can trigger overheat protection, throttling performance and slowing data processing. Digital Spaceport uses four small 80mm fans. (A fan wall is simply multiple fans mounted side by side to form a wall of moving air, providing more cooling.)
Motherboard Upgrade Tips
If you want to use the AMD EPYC 7V13 CPU, it's best to buy the V3 revision of the MZ32-AR0 motherboard outright rather than buying a V1 and upgrading it. V1 boards may not support Milan-generation CPUs from the factory, so you may first need to upgrade the motherboard BIOS using an older, already-supported CPU. (The BIOS is the "soul" of a motherboard, controlling hardware startup and operation; upgrading it lets the board support newer hardware.) Digital Spaceport can't say for certain that the V1 revision doesn't support Milan CPUs, but thinks it very likely. In Digital Spaceport's experience, you can bring a V1 board up to the latest V3 firmware by flashing BIOS updates: first flash an earlier V3 BIOS, then the latest V3 BIOS. The latest BIOS version is M23_R40 as of this writing.
Local AI self-hosted software setup
The software setup here is slightly more involved than in Digital Spaceport's previous tutorials. Yes, you can install Ollama directly on a bare Proxmox host (Proxmox is server virtualization management software that lets you run multiple virtual machines on one physical server), but Digital Spaceport recommends against it. That leaves two options; Digital Spaceport will start with one of them, and needs to test the performance impact before deciding whether to recommend the other. The other option is running Ollama in a standalone LXC container or virtual machine (VM); both are virtualization technologies that isolate the environment the software runs in. If you've followed Digital Spaceport's LXC and Docker tutorials before, you can go ahead with LXC, but for now Digital Spaceport suggests installing in a VM. Digital Spaceport will try to work out a more unified setup that keeps everything happily self-contained on our little AI server, but that will take time.
Bare metal Ubuntu 24.04 or Proxmox VM?
If you want to minimize unnecessary hassle, just do a fresh install on a bare-metal Ubuntu 24.04 server. Alternatively, refer to the previously published Proxmox tutorial. The choice is yours, at your own risk. You can install a desktop environment if you like, but it isn't necessary and Digital Spaceport won't demonstrate it. We're running a service on a server, so don't be afraid of the command-line interface (CLI).
Setting Up the BMC on the MZ32-AR0 Motherboard
Connect network cables to both the Ethernet and BMC ports of the MZ32-AR0 motherboard. (The BMC, or Baseboard Management Controller, lets you manage the server's hardware remotely.) If you're using a firewall router such as OPNsense or pfSense, check the router's ARP list for the BMC's entry to find its IP address. For Digital Spaceport, the BMC address is https://192.168.1.XX. Open that address in your browser and a username/password login box will pop up. The default username is admin; the password is on the sticker on the motherboard, below the MZ32-AR0 logo (the sticker with the barcode, shown in the picture of Digital Spaceport's board). The initial password appears to be derived from that sticker: skip the first three characters (something like "3/C/") and use the following 6 to 11 characters. After successfully logging into the BMC management interface, go to
Home > Settings > Network > Network IP Settings
Set a static IP address for the motherboard's BMC. If you run local DNS and NTP servers, set those as well. (DNS servers handle domain-name resolution; NTP servers handle time synchronization.) The BMC management interface will be used a lot going forward, so it's a good idea to bookmark it.
Next, click "Remote Control" in the sidebar; the page will show an "HTML5 viewer" option. Digital Spaceport recommends a wired connection, because you'll be uploading a roughly 2.5GB Ubuntu 24.04 ISO image over the network to install the system. (An ISO image file is a complete image of a disc or disk.) Download the Ubuntu 24.04 Server ISO from the official Ubuntu website; it's about 2.5GB. In the BMC's HTML5 viewer, click the control in the top-right corner and load this ISO image.
When loading is complete, click "Start". If the server isn't powered on yet, power it on now. Once powered on, the HTML5 viewer will begin showing upload progress, with the numbers slowly climbing. Click into the "screen" window and wait for the Gigabyte boot logo to appear. When you see the logo, press the DEL key to enter BIOS setup. In the BIOS, load defaults, then save and reboot. After the reboot, enter the BIOS again and this time change some settings. First, set the boot disk. Boot mode can be either UEFI or Legacy; UEFI is probably the less troublesome choice. (UEFI and Legacy are two different BIOS boot modes, with UEFI being the more modern one.)
Below are the BIOS setup items that need to be found and modified:
- NPS set to 1 (NPS, Nodes Per Socket: the number of NUMA nodes per socket; affects how the CPU accesses memory)
- CCD set to Auto (CCD, Core Complex Die: the modular building block of AMD CPUs)
- SMT off (SMT, Simultaneous Multithreading, is AMD's equivalent of Hyper-Threading; turning it off gives more consistent per-core performance)
- SVM off (can be turned on if using a Proxmox VM; performance will be slightly lower) (SVM, Secure Virtual Machine: hardware acceleration for virtualization)
- IOMMU off (can be turned on if using a Proxmox VM; performance will be slightly lower) (IOMMU, Input-Output Memory Management Unit: hardware support for virtualized I/O)
- cTDP set to 200W (for the EPYC 7702) (cTDP, Configurable TDP: adjusts CPU power and performance limits)
- Determinism control set to Manual, with the slider pulled to Performance
- Power policy (quick setting) set to Performance
- BoostFmax set to Manual (BoostFmax caps the CPU's boost frequency)
- BoostFmax value set to 3400 (for the EPYC 7702; caps boost at 3.4GHz)
After changing these BIOS settings, save and reboot again. During this reboot, press F12 (or F11/F10; Digital Spaceport can't remember which, but it's shown at the bottom of the boot logo screen) to enter the boot menu. Select the "AMI Virtual CD" option to boot from the virtual CD-ROM drive. With a wired Internet connection, you should reach the Ubuntu installer in no time. Next, install Ubuntu. Set a username and password and make sure to remember them. Check the option to install the SSH server to make remote login easier afterwards. (SSH, Secure Shell, is a protocol for logging into and managing servers remotely.) Wait for installation to complete. The system will reboot and prompt you to press Enter to continue; after pressing Enter it reboots again, then drops to a command-line login prompt. Log in with the username and password you set. Once logged in, enter the command
ip a
Note the IP address it displays. Now you can go back to your Windows/macOS/Linux computer and close the HTML5 viewer window in the BMC interface. In your computer's terminal software, type the following (replacing username with your username and ipaddress with the IP address you just noted):
ssh username@ipaddress
Installation of commonly used software
Here's a big batch of commands; just copy and paste them into the terminal and run them. GPU-related steps are omitted; if you have an NVIDIA graphics card, you can install the drivers after completing the steps below.
sudo apt update && sudo apt upgrade -y && sudo apt install -y htop git glances nano lsof unzip
- sudo apt update: refresh the package source lists so the latest packages can be downloaded.
- sudo apt upgrade -y: upgrade all installed packages to their latest versions; the -y flag auto-confirms all prompts.
- sudo apt install -y htop git glances nano lsof unzip: install some common Linux tools.
  - htop: a friendlier process manager for viewing system resource usage in real time.
  - git: version-control tool for downloading and managing code.
  - glances: a system monitor with richer functionality than htop.
  - nano: an easy-to-use text editor for editing configuration files.
  - lsof: lists open files; useful for troubleshooting things like occupied ports.
  - unzip: extracts ZIP archives.
Setting a Static IP Address
Type in the command line terminal:
sudo netplan generate
This command generates a default network configuration file. Then open that file with the nano editor:
sudo nano /etc/netplan/50-cloud-init.yaml
The default configuration file looks roughly like this. We need to modify the configuration of the eno1 network interface; eno1 corresponds to the physical network port on the motherboard. Ignore enp65s0np0; that's an add-in network card. Use the keyboard arrow keys to move the cursor and change the file to something like the following. Use your machine's current IP address as the static IP, to keep things simple and avoid complications. Digital Spaceport set the static IP to 192.168.1.200 with the router gateway at 192.168.1.1, a very common home network configuration.
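The screenshot of the edited file isn't reproduced here, so a minimal sketch of what it might look like, assuming the eno1 interface name and the 192.168.1.x addressing above (the nameserver address is an assumption; adjust to your own network):

```yaml
network:
  version: 2
  ethernets:
    eno1:
      dhcp4: false
      addresses:
        - 192.168.1.200/24
      routes:
        - to: default
          via: 192.168.1.1
      nameservers:
        addresses: [192.168.1.1]
```

YAML is indentation-sensitive, so keep the nesting exactly as shown, with spaces rather than tabs.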
When you're done editing, press Ctrl+X to exit nano, then press Y when prompted to save.
When you return to the terminal, enter the command to apply the new network configuration:
sudo netplan apply
Now your server is set up with a static IP address. You can reboot the server and log in remotely using SSH to make sure the static IP settings are in effect.
Install Ollama
Next, install Ollama, the framework for running large models.
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
sudo tar -C /usr -xzf ollama-linux-amd64.tgz
sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
sudo usermod -a -G ollama $(whoami)
- curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz: download the Ollama package. curl is a command-line download tool; -L follows redirects, and -o specifies the name of the saved file.
- sudo tar -C /usr -xzf ollama-linux-amd64.tgz: extract the package into /usr. tar is the standard Linux archiving tool; -C /usr sets the extraction directory, and -xzf means extract a gzip-compressed tar file.
- sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama: create a system user named ollama to run the Ollama service.
  - -r: create a system account, with UID and GID assigned automatically.
  - -s /bin/false: prevent this user from logging into the system.
  - -U: create a group with the same name.
  - -m: create the user's home directory automatically.
  - -d /usr/share/ollama: set the home directory to /usr/share/ollama.
- sudo usermod -a -G ollama $(whoami): add the current user to the ollama group. usermod modifies user accounts; -a appends to the group list instead of overwriting it, -G ollama names the group, and $(whoami) expands to the current username.

These commands create the ollama user and install Ollama under /usr. By default, model files are stored in the /usr/share/ollama/.ollama/models/ directory.
Configuring environment variables and services
Now it is necessary to set some environment variables that will be used when Ollama starts. This is critical for solving parallel processing problems.
sudo nano /etc/systemd/system/ollama.service
Open Ollama's systemd service configuration file with the nano editor. We need to add environment variables to the configuration file. Below is a list of all the environment variables that can be configured; we won't need all of them, they're just listed for reference.
Environment variables:
- OLLAMA_DEBUG: show more debugging information (e.g. OLLAMA_DEBUG=1)
- OLLAMA_HOST: IP address the Ollama server listens on (default 127.0.0.1:11434)
- OLLAMA_KEEP_ALIVE: how long a model stays loaded in memory (default "5m", 5 minutes)
- OLLAMA_MAX_LOADED_MODELS: maximum number of loaded models per GPU
- OLLAMA_MAX_QUEUE: maximum length of the request queue
- OLLAMA_MODELS: directory where model files are stored
- OLLAMA_NUM_PARALLEL: maximum number of parallel requests
- OLLAMA_NOPRUNE: do not prune the model cache on startup
- OLLAMA_ORIGINS: comma-separated list of origins allowed to make cross-origin requests
- OLLAMA_SCHED_SPREAD: whether to spread the model evenly across all GPUs
- OLLAMA_FLASH_ATTENTION: whether Flash Attention acceleration is enabled (Flash Attention is an optimization that improves Transformer computational efficiency)
- OLLAMA_KV_CACHE_TYPE: quantization type of the K/V cache (default f16). (The K/V cache is a key Transformer component used to accelerate inference; quantizing it reduces memory use but may cost some precision.)
- OLLAMA_LLM_LIBRARY: specify the LLM library to bypass auto-detection (LLM libraries are the underlying compute backends used to run large models, such as llama.cpp or exllama)
- OLLAMA_GPU_OVERHEAD: memory reserved per GPU, in bytes
- OLLAMA_LOAD_TIMEOUT: model load timeout (default "5m", 5 minutes)
The contents of Digital Spaceport's configuration file are as follows. Note: Unless you have a GPU card, you don't need (and shouldn't) fill in GPU-related environment variables.
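The file itself isn't reproduced in this text version, so here is a minimal CPU-only sketch of what an ollama.service with environment variables might look like. The specific values (the 0.0.0.0 bind address so OpenWEBUI in Docker can reach it, the 3-hour keep-alive) are illustrative assumptions, not Digital Spaceport's exact settings:

```ini
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
# Listen on all interfaces, not just localhost, so other machines/containers can connect
Environment="OLLAMA_HOST=0.0.0.0:11434"
# Keep the (very large) model resident between requests
Environment="OLLAMA_KEEP_ALIVE=3h"
# One request at a time; parallel requests multiply the context memory cost
Environment="OLLAMA_NUM_PARALLEL=1"

[Install]
WantedBy=default.target
```

The Environment= lines are where the variables from the list above go, one per line, each value in quotes.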
Press Ctrl+X and then Y to save the configuration file. Then run the following commands:
sudo systemctl daemon-reload
sudo systemctl start ollama
nproc
- sudo systemctl daemon-reload: reload systemd's service configuration files so the changes take effect.
- sudo systemctl start ollama: start the ollama service, the Ollama large-model framework.
- nproc: print the number of available CPU cores.
The environment variables are now configured. The nproc command should output 64, meaning 64 CPU cores. If it outputs 128, SMT hyperthreading was not turned off and you need to disable it in the BIOS. If it outputs 32 or 96, recheck the NPS and CCD settings in the BIOS. If it outputs 64, the CPU cores are recognized correctly and you can proceed to the next step (LFG = Let's Fucking Go).
Download Deepseek 671b model
Now download the Deepseek 671b large model. This model takes up about 400GB of disk space, so hopefully your NVMe SSD is big enough.
ollama pull deepseek-r1:671b
- ollama pull deepseek-r1:671b: download the model with the Ollama client. pull is the download command; deepseek-r1:671b is the model name and tag.

The download will be slow, so be patient. And lament the staggering data traffic, by the way... (Downloading large models consumes enormous bandwidth, which on metered or long-haul connections can get expensive.)
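As a sanity check on that ~400GB figure: assuming roughly 4.5 bits per parameter (a ballpark for Q4-style quantization, not an official spec), the weights alone for a 671-billion-parameter model come to about 377GB, which lines up with the size on disk once metadata and non-quantized tensors are added:

```shell
# Back-of-the-envelope: 671e9 parameters * ~4.5 bits each, converted to GB
awk 'BEGIN { printf "%.0f GB\n", 671e9 * 4.5 / 8 / 1e9 }'
# prints "377 GB"
```

The same arithmetic explains why the full 671b model needs 512GB+ of system RAM to run comfortably, while the distilled variants fit on far smaller machines.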
Installing OpenWEBUI
We need to use Docker or Python to run OpenWEBUI. Here Digital Spaceport uses Docker to deploy it. Make sure you don't have any other Docker-related software installed on your system to avoid conflicts.
for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done
This loops over a list of possibly-installed Docker-related packages and uninstalls each one to avoid conflicts:
- for pkg in ...; do ...; done: a shell for loop that iterates over the package list.
- docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc: the Docker-related packages that may already be installed.
- sudo apt-get remove $pkg: uninstall the package. apt-get remove is the Debian/Ubuntu uninstall command, and $pkg is the loop variable holding the current package name.
Install the official Docker sources:
# Add Docker's official GPG key.
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources.
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
- These commands add the official Docker repository so you can install the latest version of Docker.
- The first part adds Docker's official GPG key, which is used to verify the integrity and origin of the packages.
- The second part adds the Docker repository to the APT (Advanced Package Tool) source list.
- Finally, sudo apt-get update refreshes the source list so the newly added Docker repository takes effect.
Install the Docker Engine:
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y
- sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y: install the Docker Engine and related components.
  - docker-ce: the Docker Community Edition engine.
  - docker-ce-cli: the Docker command-line client.
  - containerd.io: the container runtime Docker depends on.
  - docker-buildx-plugin: the Buildx plugin, for building multi-architecture Docker images.
  - docker-compose-plugin: the Compose plugin, for managing multi-container Docker applications.
  - -y: auto-confirm all prompts.
Install the Dockge Docker container manager. Dockge's data directory defaults to /opt/dockge, and keeping Docker-related data under /opt makes it easy to manage.
sudo mkdir -p /opt/stacks /opt/dockge
cd /opt/dockge
sudo curl https://raw.githubusercontent.com/louislam/dockge/master/compose.yaml --output compose.yaml
docker compose up -d
- sudo mkdir -p /opt/stacks /opt/dockge: create the /opt/stacks and /opt/dockge directories for Dockge's files; -p also creates any missing parent directories.
- cd /opt/dockge: change the working directory to /opt/dockge.
- sudo curl https://raw.githubusercontent.com/louislam/dockge/master/compose.yaml --output compose.yaml: download Dockge's compose.yaml. compose.yaml is a Docker Compose file defining the container's configuration.
- docker compose up -d: start the Dockge container with Docker Compose; up starts the container, and -d runs it in the background.
You can now open the Dockge management interface in a browser to handle the remaining Docker container management steps. If you don't know the server's IP address, check its network settings. The address is http://server-IP:5001. For example, Digital Spaceport's server IP is 192.168.1.200, so the address is http://192.168.1.200:5001. On first visit you'll need to set a username and password, so be sure to remember them. The next step is to create the OpenWEBUI Docker container.
Paste the following into Dockge's Compose editor as the Compose configuration for OpenwebUI:
version: "3.3"
services:
  open-webui:
    ports:
      - 7000:8080
    volumes:
      - open-webui:/app/backend/data
    container_name: open-webui
    restart: always
    image: ghcr.io/open-webui/open-webui:latest
volumes:
  open-webui: {}
networks:
  dockge_default:
    external: true
- This Compose configuration defines a container named open-webui that runs OpenWEBUI.
- version: "3.3": the Docker Compose file version.
- services: the list of services; open-webui is the service name.
- ports: port mapping; maps the container's port 8080 to port 7000 on the host.
- volumes (under the service): mounts the host's open-webui volume at /app/backend/data inside the container, for persistent storage of OpenWEBUI's data.
- container_name: the container's name, set to open-webui.
- restart: always: always restart the container automatically.
- image: ghcr.io/open-webui/open-webui:latest: the Docker image to use, the latest OpenWEBUI image.
- volumes (top level): defines the open-webui data volume.
- networks: defines the dockge_default network, with external: true meaning an existing external network is used; Dockge creates a network named dockge_default by default.
Click "save" and "run". The first run will be slow because the Docker image has to be downloaded. To update the OpenWEBUI image later, just click the "update" button in the Dockge interface. Once the container is running, visit http://server-IP:7000 to open the OpenWEBUI interface. For example, Digital Spaceport's server IP is 192.168.1.200, so the address is http://192.168.1.200:7000. On first visit you'll need to set a username and password; again, remember them. We're almost done! Whew!
Connecting OpenWEBUI to Ollama
On the OpenWEBUI admin page, go to "/admin/settings" -> "Connections" and click the "+" sign to add a new server connection. Enter the server address as server-IP:11434, for example 192.168.1.200:11434 (be sure to substitute your own server's IP address). If the connection succeeds, OpenWEBUI will pop up a green "connection success" message.
Once connected, click the connection manager icon to see the connected Ollama server. If the Deepseek model has already been downloaded, you'll see the deepseek-r1:671b model in the "Delete model" drop-down menu.
Congratulations, it's almost done! But... Don't leave the settings page in a hurry!
Setting advanced parameters
Click on the "Edit (Pen)" icon.
Advanced parameters can now be edited. Don't forget to click "SAVE!" when you're done.
- Number of GPUs (num_gpu): if you have no GPU installed, change this to 0.
- Reasoning Effort: low, medium, or high; the default is medium.
- Context Length: set to 16384. A 16K context is no problem; larger context lengths need more RAM or GPU VRAM.
- Number of threads (num_thread): set to 62, leaving 2 CPU cores reserved for the system.
- Memory locking (use_mlock): can be enabled to prevent memory from being swapped out to disk and hurting performance. (mlock stops the operating system from paging memory out to disk, which improves efficiency but increases resident memory use.)
Other parameters can be adjusted according to the model card. Note: don't try to use the full 160K context length unless you have 2TB of RAM! Even if it works, it will be painfully slow.
IMPORTANT: Don't forget to click "SAVE!"
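For reference, the same parameters can also be supplied per request through the Ollama API's options field. A hedged sketch, mirroring the settings above and assuming the 192.168.1.200 address used earlier (you would POST this JSON to http://192.168.1.200:11434/api/generate):

```json
{
  "model": "deepseek-r1:671b",
  "prompt": "hello",
  "options": {
    "num_gpu": 0,
    "num_ctx": 16384,
    "num_thread": 62,
    "use_mlock": true
  }
}
```

Per-request options override the model defaults for that request only, which is handy for experimenting before committing a value in the OpenWEBUI settings.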
Setting up user settings
Click on "User Settings" to change the user preferences. It is recommended that you set the "keep alive" setting to a longer period of time, for example 3 hours. Click "save" after changing the settings.
User settings are a bit easy to confuse with the previous administrator settings, so be careful to distinguish between them.
Run a test.
My goodness, I can't believe you made it this far! That's awesome; Digital Spaceport admires you! Click "new chat" in the upper-left corner of the OpenWEBUI interface. The model list should already contain deepseek-r1:671b. Try sending a simple "hello". It works!
Deepseek R1 671b Performance Testing
Congratulations on successfully installing and running the Ollama + OpenWEBUI local large-model service! Digital Spaceport believes there are plenty of other great ways to run large models locally, such as llama.cpp, exo, and vLLM; tutorials will come once Digital Spaceport has dug deeper into them. llama.cpp could be next: Digital Spaceport recently compiled, tested, and ran it with very good results, though there are a lot of parameters to work out. vLLM is a little complicated. exo is quite simple, but it keeps crashing after launch and Digital Spaceport hasn't had time to debug it yet. Anyway, as they say... Stay tuned! (SOON!)