
Low-Cost and Safe DeepSeek-R1

Written by Edwin Perkins | Mar 4, 2025 4:00:00 PM

It has been a while since we looked at running large language models (LLMs) locally. Things move fast in AI: while the foundational aspects haven't changed much, the LLMs themselves have changed a great deal. A number of "small" LLMs have been developed whose performance rivals their larger brethren, which in turn lowers the hardware requirements for running them.

 

At the time of this writing, DeepSeek-R1 is a few weeks old and has been a popular topic in the AI world, in the mainstream media, and, at least on day one, in the financial media thanks to its effect on NVIDIA's stock price. Since then NVIDIA's stock has rebounded, and its new 5080 and 5090 consumer GPUs sold out in seconds. At the moment, GPUs are once again hard to purchase.

In this blog, we will look at using Proxmox with an AMD Radeon RX 5700 XT, which is available on eBay for around $150. The 7 billion parameter DeepSeek-R1 model needs slightly more than 4 GB of video memory, so an 8 GB card like the RX 5700 XT works well.

 

Hardware

As pointed out in the summary above, we are going to use lower-end hardware for this blog; however, the more capable the hardware, the more that can be done with it. Also, when using Proxmox, or any other virtualization solution, in a production environment there should be redundant servers with redundant components, reliable storage, and adequate memory, CPU, and GPU to meet both current and projected needs. This is an area in which iuvo excels, so if you are looking at an on-premises virtualization solution, please reach out.

 

Proxmox

Throughout 2025 we plan to publish a number of blog posts on Proxmox. For this post, we are going to assume that the server is already set up, that it has an AMD RDNA 1 generation GPU installed, and that the GPU has not been passed through to any other virtual guests.

LXC Containers

Proxmox has native support for LXC containers, and we will take advantage of this to run DeepSeek: containers have less overhead and are more resource-efficient than full virtual machines. However, containers also provide less isolation, so this setup is not recommended for publicly hosted services. First, we need an Ubuntu 24.04 container image. We are choosing Ubuntu because it is well supported for AI work in general, and for the tools we will be using in this post in particular. To download the Ubuntu image, select the storage device hosting container images and then select CT Templates (a command-line alternative is sketched just below):
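If you prefer the shell on the Proxmox host, the same template can be fetched with pveam. This is just a sketch; the exact template file name may differ by the time you read this:

pveam update
pveam available --section system | grep ubuntu-24.04
pveam download local ubuntu-24.04-standard_24.04-2_amd64.tar.zst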

 

 

Then we will create a container. Generally, containers are used to host a single service that doesn't require a lot of compute resources. In this case, while it is still a single service, we must allocate a lot of resources to the container. There needs to be more system memory than there is memory on the GPU; in our case the GPU has 8 GB of VRAM, and we are allocating 24 GB of RAM to the container. The storage needs to be large enough to hold the Ollama software, the OpenWebUI software, and all of the LLMs you may want to use. Additionally, if there is data used for Retrieval-Augmented Generation, that space needs to be included. This should be on SSD storage if possible, to improve system performance.

A benefit of using containers is that while we have allocated a lot of resources, they are only consumed when needed and remain available for the host to reallocate at other times.

We will now walk through setting up the container:

 

Choose Create CT in the top right of the browser. Name the container and set up the root password, or, even better, use SSH keys for root access.

 

 

The container can be unprivileged, and should have nesting enabled so that systemd works properly inside it.

 

 

Here we will choose the Ubuntu 24.04 template we downloaded earlier.

 

 

We are allocating storage for the container here; you will likely want more than 256 GB for any production use.

 

 

We are allocating 8 CPU cores; if there is any chance the LLM will run on the CPU, more cores should be made available. In our case we will be using the GPU.

 

 

Having enough memory is essential, and it always needs to be more than the amount on the GPU; the system will fail without enough RAM allocated.

 

 

Use the network settings that work best for you. A static IP address with properly set up DNS is a benefit when using SSL, and any production setup should have an SSL certificate.

 

 

As with the previous window, choose the settings that fit your particular network setup.

 

 

Clicking Finish here will create the container.
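For reference, roughly the same container can also be created from the Proxmox host's shell with pct. This is only a sketch: the container ID of 200, the local and local-lvm storage names, and the template file name are assumptions to adjust for your environment.

pct create 200 local:vztmpl/ubuntu-24.04-standard_24.04-2_amd64.tar.zst \
   --hostname deepseek \
   --unprivileged 1 --features nesting=1 \
   --cores 8 --memory 24576 \
   --rootfs local-lvm:256 \
   --net0 name=eth0,bridge=vmbr0,ip=dhcp \
   --password 'CHANGE-ME'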

 

The container will be built quickly, and we can connect to it through the web console:

 

 

We will log in as root with the password provided above.

First, as with any new system, we should apply all available patches, set the time zone, install some required packages, and set up sudo:

 

apt update

apt upgrade

dpkg-reconfigure tzdata

echo "%adm   ALL=(ALL:ALL) PASSWD:ALL" > /etc/sudoers.d/adm

apt install mailutils curl wget git python3 python3-venv libgl1 \
   libglib2.0-0 apache2

reboot

 

Next we should add a user to the system; in our case we will use "aiuser".

 

adduser aiuser

info: Adding user `aiuser' ...

info: Selecting UID/GID from range 1000 to 59999 ...

info: Adding new group `aiuser' (1001) ...

info: Adding new user `aiuser' (1001) with group `aiuser (1001)' ...

info: Creating home directory `/home/aiuser' ...

info: Copying files from `/etc/skel' ...

New password:

Retype new password:

passwd: password updated successfully

Changing the user information for aiuser

Enter the new value, or press ENTER for the default

       Full Name []: AI User

       Room Number []:

       Work Phone []:

       Home Phone []:

       Other []:

Is the information correct? [Y/n]

info: Adding new user `aiuser' to supplemental / extra groups `users' ...

info: Adding user `aiuser' to group `users' ...

 

We need to add this user to the following groups:

  • adm
  • video
  • render

usermod -a -G adm,video,render aiuser
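Note that the new group membership does not apply to shells that are already open; after logging out and back in, you can confirm it with:

id aiuser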

 

We then recommend setting up SSH keys for this user; however, that is beyond the scope of this blog. After the user is set up, we will SSH in as this user and use the account for all further work. Next, we install AMD's driver repository package and the ROCm stack, set some GPU-related environment variables for this card, install a couple of GPU monitoring tools, and create the directory that OpenWebUI will live in, then reboot:

 

curl -L https://repo.radeon.com/amdgpu-install/6.2.4/ubuntu/noble/amdgpu-install_6.2.60204-1_all.deb -o /tmp/amdgpu-install_6.2.60204-1_all.deb

chmod 644 /tmp/amdgpu-install_6.2.60204-1_all.deb

sudo apt install /tmp/amdgpu-install_6.2.60204-1_all.deb

sudo amdgpu-install --usecase=rocm --no-dkms

sudo tee -a /etc/profile.d/rocm.sh <<'EOF'

export PATH=$PATH:/opt/rocm/bin

export PYTORCH_ROCM_ARCH="gfx1010"

export ROCM_PATH=/opt/rocm

export HSA_OVERRIDE_GFX_VERSION=10.1.0

export HIP_VISIBLE_DEVICES=0

export HIP_PLATFORM=amd

export HIP_DEVICE=0

export AMDGPU_TARGET=gfx1010

export CLBlast_DIR=/usr/lib/cmake/CLBlast

export ROCR_VISIBLE_DEVICES=0

EOF

sudo apt install radeontop nvtop

sudo mkdir /opt/openwebui

sudo chown -R aiuser:aiuser /opt/openwebui

sudo reboot

 

 

Please note that the values above are specific to the Radeon RX 5700 XT:

HSA_OVERRIDE_GFX_VERSION=10.1.0

PYTORCH_ROCM_ARCH="gfx1010"

A different Radeon model will use different values; for example, a 6950 is "10.3.0" and "gfx1030".

 

GPU Passthrough

At this point we have the basics in place, and we are going to work on passing the GPU through to the container. The specifics will differ from system to system, but the process is the same.

SSH back into the LXC container and run the following command, noting the output:

 

grep -w 'render\|video' /etc/group

video:x:44:aiuser

render:x:993:aiuser

 

In our case, video is 44 and render is 993.

On the Proxmox host system, NOT the container, run the following command:

 

 

ls -l /sys/class/drm/renderD*/device/driver

lrwxrwxrwx 1 root root 0 Feb 20 11:23 /sys/class/drm/renderD128/device/driver -> ../../../../../../bus/pci/drivers/amdgpu

 

Note the renderD128 device in the output above.

 

 

Our render device is 128 from above, and we will now add this to the container configuration:

/dev/dri/renderD128 with a group ID of 44

 

 

Then /dev/kfd with a group ID of 993
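For reference, the same passthrough can be expressed directly in the container's configuration file on the Proxmox host. This is a sketch that assumes a Proxmox VE release with devN device passthrough support; replace 200 with your container ID and use the group IDs you noted above:

# /etc/pve/lxc/200.conf (on the Proxmox host)
dev0: /dev/dri/renderD128,gid=44
dev1: /dev/kfd,gid=993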

 

 

Now stop and start the container so the passed-through devices are recognized. From inside the container, run:

sudo halt

Then start the container again from the Proxmox web interface.
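Once the container is back up, a quick sanity check (assuming the ROCm packages installed earlier) is to confirm the device nodes are present and that ROCm can see the card; for the RX 5700 XT, rocminfo should list a gfx1010 agent:

ls -l /dev/dri/renderD128 /dev/kfd
rocminfo | grep -i gfx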

 

Ollama install

Now we are ready to install Ollama, which we will use to download and run the LLMs.

curl -fsSL https://ollama.com/install.sh | sudo sh

The above command downloads Ollama and sets it up as a systemd service.

If you have a Radeon 5x00 GPU, run the command below to link the library file. For newer Radeons this isn't needed, but it will also not hurt anything.

 

sudo ln -s /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1010.dat \
   /usr/local/lib/ollama/rocm/rocblas/library/

 

Now we need to update the systemd service file that the Ollama installer created (typically /etc/systemd/system/ollama.service).

Add these two lines to the [Service] section. As noted above, the Radeon 5700 XT uses "10.1.0"; a different Radeon model will use a different value, for example a 6950 is "10.3.0".

 

Environment="ROCR_VISIBLE_DEVICES=0"

Environment="HSA_OVERRIDE_GFX_VERSION=10.1.0"

 

So now the file should look like:

 

[Unit]

Description=Ollama Service

After=network-online.target

 

[Service]

ExecStart=/usr/local/bin/ollama serve

User=ollama

Group=ollama

Restart=always

RestartSec=3

Environment="ROCR_VISIBLE_DEVICES=0"

Environment="HSA_OVERRIDE_GFX_VERSION=10.1.0"

Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/rocm/bin"

 

[Install]

WantedBy=default.target

 

Now let's tell systemd to reload the file and restart Ollama:

 

sudo systemctl daemon-reload

sudo systemctl restart ollama
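To confirm the service restarted cleanly and that Ollama detected the GPU (the exact log wording varies between Ollama versions), check the service status and recent logs:

sudo systemctl status ollama
sudo journalctl -u ollama --no-pager | grep -i -e rocm -e amdgpu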

 

Now we can test ollama:
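For example, we can pull the 7 billion parameter DeepSeek-R1 model from the Ollama library and ask it a question directly from the shell; the first pull downloads several gigabytes of model weights:

ollama pull deepseek-r1:7b
ollama run deepseek-r1:7b "Why is the sky blue?"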

 

 

Excellent, it looks like it is working, and running “nvtop” in another window shows the GPU in use.

We are most of the way there. Now we just need to set up OpenWebUI to provide a nice user interface for interacting with Ollama.

 

OpenWebUI

First we need to create a Python virtual environment for OpenWebUI to run in:

 

cd /opt/openwebui

python3 -m venv venv

source venv/bin/activate

pip install open-webui

 

The above step will take a while to download and install everything needed.

Now we will create a startup script, /opt/openwebui/go.sh, in the same folder:

 

#!/usr/bin/bash

 

cd /opt/openwebui

source /opt/openwebui/venv/bin/activate

open-webui serve --port 3000

 

This will run OpenWebUI on port 3000.

First, make the script executable:

 

chmod 755 go.sh

 

Now we will manually run OpenWebUI to confirm it works:

 

/opt/openwebui/go.sh

 

Finally, open a web browser and connect to the host at:

http://[IP ADDRESS OR FQDN OF HOST]:3000/

We should get something like this:

 

 

You can now use OpenWebUI with Ollama.

To run OpenWebUI automatically at startup, we will first create a systemd service file:

/etc/systemd/system/openwebui.service

 

[Unit]

Description=OpenWebUI Server

After=network.target

StartLimitIntervalSec=0

 

[Service]

ExecStart=/opt/openwebui/go.sh

User=root

KillMode=process

Type=simple

 

[Install]

WantedBy=multi-user.target

 

Run the commands below to enable and start the service:

 

sudo systemctl daemon-reload

sudo systemctl enable openwebui

sudo systemctl start openwebui

 

If this is going to be hosted for other users, beyond just a test setup, we strongly recommend placing a reverse proxy that handles SSL/HTTPS in front of OpenWebUI.
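As a sketch of what that could look like with the apache2 package installed earlier (the hostname and certificate paths below are placeholders, and OpenWebUI uses WebSockets, so the proxy needs to pass protocol upgrades through):

sudo a2enmod ssl proxy proxy_http proxy_wstunnel rewrite

Then create /etc/apache2/sites-available/openwebui.conf along these lines:

<VirtualHost *:443>
   ServerName ai.example.com
   SSLEngine on
   SSLCertificateFile /etc/ssl/certs/openwebui.pem
   SSLCertificateKeyFile /etc/ssl/private/openwebui.key

   ProxyPreserveHost On
   ProxyPass / http://127.0.0.1:3000/
   ProxyPassReverse / http://127.0.0.1:3000/

   # Pass WebSocket connections through to OpenWebUI
   RewriteEngine On
   RewriteCond %{HTTP:Upgrade} =websocket [NC]
   RewriteRule ^/(.*) ws://127.0.0.1:3000/$1 [P,L]
</VirtualHost>

Then enable the site and reload Apache:

sudo a2ensite openwebui
sudo systemctl reload apache2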

 

 

How iuvo Can Help 

Running DeepSeek-R1 on local, cost-effective hardware is not only possible but also highly efficient with the right setup. By leveraging Proxmox, LXC containers, and AMD GPUs, you can achieve a secure and optimized environment for AI workloads without breaking the bank. Whether you're experimenting with AI models, setting up a research environment, or looking for a private, on-prem solution, these steps can help you get started.

If you’re looking for a fully secure, managed AI hosting solution without the complexity, iuvo Secure AI Host is designed to support DeepSeek and other AI workloads in a private, compliant, and high-performance environment. Get in touch with us today to learn how iuvo Secure AI Host can enhance your AI infrastructure!

 

 

 
