Low-Cost and Safe DeepSeek-R1

It has been a while since we looked at running large language models (LLMs) locally. Things move fast in AI: while the foundational aspects have not changed much, the LLMs themselves have changed a great deal. A number of "small" LLMs have been developed with performance that rivals their larger brethren, which in turn lowers the hardware requirements for running them.

 


At the time of this writing, DeepSeek-R1 is a few weeks old and has been a popular topic in the AI world, in the mainstream media, and, at least on day one, in the financial media thanks to its effect on NVIDIA's stock price. Since then NVIDIA's stock has rebounded, and the company released its 5080 and 5090 consumer GPUs, which sold out in seconds. At the moment, GPUs are once again hard to purchase.

In this blog, we will take a look at using Proxmox with an AMD Radeon RX 5700 XT, which is available on eBay for around $150. The 7-billion-parameter DeepSeek-R1 model needs slightly more than 4 GB of video memory, so an 8 GB card like the RX 5700 XT works well.

 

Hardware

As pointed out in the summary above, we are going to use lower-end hardware for this blog; the more capable the hardware, the more that can be done with it. Also, when using Proxmox, or any other virtualization solution, in any sort of production environment there should be redundant servers with redundant components, reliable storage, and adequate memory, CPU, and GPU to meet both current and projected needs. This is an area where iuvo excels, and if you are looking at an on-premises virtualization solution, please reach out.

 

Proxmox

Throughout 2025 we plan to publish a number of blog posts on Proxmox. For this post, we are going to assume that the server is already set up and has an AMD RDNA 1 generation GPU installed, and that the GPU has not been passed through to any other virtual guests.
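To double-check that from the Proxmox host shell, something like the following works; /etc/pve/qemu-server/ is the standard location for VM definitions, and an empty result from the second command means no VM has claimed the GPU for PCI passthrough:

# Confirm the GPU is visible on the host and bound to the amdgpu driver
lspci -nnk | grep -iA3 'vga\|display'

# Check that no VM configuration already claims the GPU for PCI passthrough
grep -r hostpci /etc/pve/qemu-server/ 2>/dev/null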

LXC Containers

Proxmox has native support for LXC containers, and we will take advantage of this to run DeepSeek. Containers have less overhead and are more resource-efficient than full virtual machines; however, they also provide less isolation, so this setup is not recommended for publicly hosted services. First, we need an Ubuntu 24.04 container image. We are choosing Ubuntu because it is well supported for AI work in general and for the tools we will use in this post in particular. To download the Ubuntu image, select the storage device hosting container images, and then select CT Templates:
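If you prefer the command line, the template can also be downloaded with pveam on the Proxmox host. This is only a sketch: the exact template file name and the storage name ("local" here) may differ on your system, so check the output of pveam available first.

pveam update                                            # refresh the template catalog
pveam available --section system | grep ubuntu-24.04    # find the exact template name
pveam download local ubuntu-24.04-standard_24.04-2_amd64.tar.zst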


 

Then we will create a container. Generally, containers are used to host a single service that does not require a lot of compute resources. In this case, while it is still a single service, we must allocate a lot of resources to the container. There needs to be more memory than the GPU has: our GPU has 8 GB of VRAM, and we are allocating 24 GB of RAM to the container. The storage needs to be large enough to hold the Ollama software, the OpenWebUI software, and all of the LLMs you may want to use. Additionally, if there is data used for Retrieval-Augmented Generation (RAG), that space needs to be included. This should be on SSD storage if possible to improve system performance.

A benefit of using containers is that while we have allocated a lot of resources, they will only be used when needed and remain available for the host to reallocate at other times.
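For reference, the same container can also be created from the Proxmox host shell with pct. This is only a sketch under assumptions (container ID 200, hostname "deepseek", template on the "local" storage, root disk on "local-lvm", bridge vmbr0); the GUI walkthrough below accomplishes the same thing, and you would still set the root password or an SSH key as in the GUI.

# 24 GB of RAM (more than the 8 GB of GPU VRAM), 8 cores, 256 GB root disk,
# DHCP networking, unprivileged, with nesting enabled for systemd
pct create 200 local:vztmpl/ubuntu-24.04-standard_24.04-2_amd64.tar.zst \
  --hostname deepseek \
  --memory 24576 \
  --cores 8 \
  --rootfs local-lvm:256 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp \
  --unprivileged 1 \
  --features nesting=1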

We will now walk through setting up the container:


 

Choose Create CT in the top right of the browser. Name the container and set the root password, or better yet, use SSH keys for root access.

 


 

The container can be unprivileged, and nesting should be enabled so that systemd works properly inside it.

 


 

Here we will choose the Ubuntu 24.04 template we downloaded earlier.

 


 

We are allocating storage for the container here; you will likely want more than 256 GB for any production use.

 


 

We are allocating 8 CPU cores. If there is any chance the LLM will run on the CPU, more cores should be made available; in our case we will be using the GPU.

 


 

Having enough memory, which is always more than the amount on the GPU, is essential; the system will fail without enough RAM allocated.

 


 

Use the network settings that work best for you. A static IP address with properly configured DNS is a benefit when using SSL, and any production setup should have an SSL certificate.

 


 

As in the previous window, choose the settings that work for your particular network setup.

 


 

Clicking Finish here will create the container.

 

The container will be built quickly, and we can connect to it through the web console:

 


 

We will log in as root with the password provided above.

First, as with any new system, we should apply all available patches, set the time zone, install some required packages, and set up sudo:

 

apt update
apt upgrade
dpkg-reconfigure tzdata
echo "%adm   ALL=(ALL:ALL) PASSWD:ALL" > /etc/sudoers.d/adm
apt install mailutils curl wget git python3 python3-venv libgl1 \
   libglib2.0-0 apache2
reboot

 

Next we should add a user to the system; in our case we will use "aiuser".

 

adduser aiuser
info: Adding user `aiuser' ...
info: Selecting UID/GID from range 1000 to 59999 ...
info: Adding new group `aiuser' (1001) ...
info: Adding new user `aiuser' (1001) with group `aiuser (1001)' ...
info: Creating home directory `/home/aiuser' ...
info: Copying files from `/etc/skel' ...
New password:
Retype new password:
passwd: password updated successfully
Changing the user information for aiuser
Enter the new value, or press ENTER for the default
       Full Name []: AI User
       Room Number []:
       Work Phone []:
       Home Phone []:
       Other []:
Is the information correct? [Y/n]
info: Adding new user `aiuser' to supplemental / extra groups `users' ...
info: Adding user `aiuser' to group `users' ...

 

We need to add this user to the following groups:

  • adm
  • video
  • render


usermod -a -G adm,video,render aiuser

 

We then recommend setting up SSH keys for this user; however, that is beyond the scope of this blog. After the user is set up, we will SSH in as this user and use the account for all further work.

 

Now, as this user, install the AMD GPU driver packages and ROCm, and set the environment variables ROCm needs:

curl -L https://repo.radeon.com/amdgpu-install/6.2.4/ubuntu/noble/amdgpu-install_6.2.60204-1_all.deb -o /tmp/amdgpu-install_6.2.60204-1_all.deb
chmod 644 /tmp/amdgpu-install_6.2.60204-1_all.deb
sudo apt install /tmp/amdgpu-install_6.2.60204-1_all.deb
sudo amdgpu-install --usecase=rocm --no-dkms
# Quoting 'EOF' keeps $PATH literal in the file so it expands at login time
sudo tee -a /etc/profile.d/rocm.sh <<'EOF'
export PATH=$PATH:/opt/rocm/bin
export PYTORCH_ROCM_ARCH="gfx1010"
export ROCM_PATH=/opt/rocm
export HSA_OVERRIDE_GFX_VERSION=10.1.0
export HIP_VISIBLE_DEVICES=0
export HIP_PLATFORM=amd
export HIP_DEVICE=0
export AMDGPU_TARGET=gfx1010
export CLBlast_DIR=/usr/lib/cmake/CLBlast
export ROCR_VISIBLE_DEVICES=0
EOF
sudo apt install radeontop nvtop
sudo mkdir /opt/openwebui
sudo chown -R aiuser:aiuser /opt/openwebui
sudo reboot

 

 

Please note that the Radeon RX 5700 XT uses:

HSA_OVERRIDE_GFX_VERSION=10.1.0
PYTORCH_ROCM_ARCH="gfx1010"

A different Radeon model will use different values; for example, a 6950 is "10.3.0" and "gfx1030".
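If you are not sure which target your card is, once ROCm is installed you can usually find it with rocminfo (the exact output format varies by ROCm version):

/opt/rocm/bin/rocminfo | grep -i "gfx"     # look for a line such as "Name: gfx1010"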

 

GPU Passthrough

At this point we have the basics in place, and we are going to pass the GPU through to the container. The specifics will differ from system to system, but the process is the same.

SSH back into the LXC container and run the following command, noting the output:

grep -w 'render\|video' /etc/group
video:x:44:aiuser
render:x:993:aiuser

 

In our case, video is 44 and render is 993.

On the Proxmox host system, NOT the container, run the following command:

ls -l /sys/class/drm/renderD*/device/driver
lrwxrwxrwx 1 root root 0 Feb 20 11:23 /sys/class/drm/renderD128/device/driver -> ../../../../../../bus/pci/drivers/amdgpu

Note the render device number in the output above (renderD128) and that it is bound to the amdgpu driver.

 

Our render device is 128 from the output above, and we will now add two device passthrough entries to the container configuration:

/dev/dri/renderD128 with a group ID of 44

Then /dev/kfd with a group ID of 993
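For reference, the resulting entries in the container's configuration file on the Proxmox host (assuming container ID 200, i.e. /etc/pve/lxc/200.conf) should look roughly like this; your render device number and group IDs may differ:

# Excerpt from /etc/pve/lxc/200.conf
dev0: /dev/dri/renderD128,gid=44
dev1: /dev/kfd,gid=993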

 

Now stop and start the container so the GPU devices are recognized. Inside the container, run:

sudo halt

Then start the container again from the Proxmox web interface.
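Alternatively, from the Proxmox host you can stop and start the container with pct (again assuming container ID 200), and then verify inside the container that the devices showed up:

# On the Proxmox host
pct stop 200
pct start 200

# Inside the container, both devices should now exist
ls -l /dev/dri/renderD128 /dev/kfd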

 

Ollama install

Now we are ready to install Ollama, which we will use to download and run the LLM.

curl -fsSL https://ollama.com/install.sh | sudo sh

This command downloads and runs the official Ollama install script, which installs Ollama and sets up its systemd service.

If you have a Radeon 5x00-series GPU, run the command below to link the rocBLAS library file into Ollama's bundled ROCm directory. For newer Radeons this isn't needed, but it will not hurt anything.

sudo ln -s /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1010.dat \
   /usr/local/lib/ollama/rocm/rocblas/library/

 

Now we need to update the Ollama systemd service file, typically /etc/systemd/system/ollama.service.

Add these two lines to the [Service] section. As noted above, the Radeon 5700 XT uses "10.1.0"; a different Radeon model will use a different value, for example a 6950 uses "10.3.0".

Environment="ROCR_VISIBLE_DEVICES=0"
Environment="HSA_OVERRIDE_GFX_VERSION=10.1.0"

 

So now the file should look like:

[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="ROCR_VISIBLE_DEVICES=0"
Environment="HSA_OVERRIDE_GFX_VERSION=10.1.0"
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/rocm/bin"

[Install]
WantedBy=default.target

 

Now let's tell systemd to reload the unit file and restart Ollama:

sudo systemctl daemon-reload
sudo systemctl restart ollama

 

Now we can test Ollama by pulling and running the DeepSeek-R1 model.
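A minimal example, assuming the 7-billion-parameter deepseek-r1:7b tag from the Ollama model library (model names and sizes can change over time):

ollama pull deepseek-r1:7b                        # downloads several GB the first time
ollama run deepseek-r1:7b "Why is the sky blue?"
ollama ps                                         # shows the loaded model and whether it is on the GPU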

 

Excellent, it looks like it is working, and running “nvtop” in another window shows the GPU in use.

We are most of the way there. Now we just need to set up OpenWebUI to provide a nice user interface for interacting with Ollama.

 

OpenWebUI

First we need to create a Python virtual environment for OpenWebUI to run in:

 

cd /opt/openwebui
python3 -m venv venv
source venv/bin/activate
pip install open-webui

 

The above step will take a while to download and install everything needed.

Now we will create a startup script in the same folder, /opt/openwebui/go.sh:

 

#!/usr/bin/bash

cd /opt/openwebui
source /opt/openwebui/venv/bin/activate
open-webui serve --port 3000

 

This will run OpenWebUI on port 3000.

Make the script executable, then run it manually to confirm OpenWebUI works:

chmod 755 /opt/openwebui/go.sh
/opt/openwebui/go.sh

 

Finally, open a web browser and connect to the host at:

http://[IP ADDRESS OR FQDN OF HOST]:3000/

We should see the OpenWebUI setup page; after creating the initial admin account, you can select the DeepSeek-R1 model and start chatting. You can now use OpenWebUI with Ollama.

To run OpenWebUI automatically at boot, we will first create a systemd service file, /etc/systemd/system/openwebui.service:

 

[Unit]
Description=OpenWebUI Server
After=network.target
StartLimitIntervalSec=0

[Service]
ExecStart=/opt/openwebui/go.sh
User=root
KillMode=process
Type=simple

[Install]
WantedBy=multi-user.target

 

We can run the commands below to reload systemd, then enable and start the service:

 

sudo systemctl daemon-reload
sudo systemctl enable openwebui
sudo systemctl start openwebui

 

If this is going to be hosted for other users beyond just a test setup, we strongly recommend putting a reverse proxy that terminates SSL (HTTPS) in front of OpenWebUI, as sketched below.
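Since apache2 was installed earlier, one way to do this is a simple Apache virtual host. This is only a sketch under assumptions: ai.example.com is a placeholder hostname, and the certificate paths should point to wherever your certificate actually lives (for example, one issued by Let's Encrypt).

# Enable the Apache modules needed for an SSL-terminating reverse proxy
sudo a2enmod ssl rewrite proxy proxy_http proxy_wstunnel

# Contents of /etc/apache2/sites-available/openwebui.conf
<VirtualHost *:443>
    ServerName ai.example.com

    SSLEngine on
    SSLCertificateFile    /etc/ssl/certs/ai.example.com.crt
    SSLCertificateKeyFile /etc/ssl/private/ai.example.com.key

    # Hand WebSocket upgrade requests through to OpenWebUI
    RewriteEngine On
    RewriteCond %{HTTP:Upgrade} =websocket [NC]
    RewriteRule ^/(.*) ws://127.0.0.1:3000/$1 [P,L]

    # Proxy everything else to OpenWebUI on port 3000
    ProxyPreserveHost On
    ProxyPass / http://127.0.0.1:3000/
    ProxyPassReverse / http://127.0.0.1:3000/
</VirtualHost>

# Enable the site and restart Apache
sudo a2ensite openwebui
sudo systemctl restart apache2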

 

 

How iuvo Can Help 

Running DeepSeek-R1 on local, cost-effective hardware is not only possible but also highly efficient with the right setup. By leveraging Proxmox, LXC containers, and AMD GPUs, you can achieve a secure and optimized environment for AI workloads without breaking the bank. Whether you're experimenting with AI models, setting up a research environment, or looking for a private, on-prem solution, these steps can help you get started.

If you’re looking for a fully secure, managed AI hosting solution without the complexity, iuvo Secure AI Host is designed to support DeepSeek and other AI workloads in a private, compliant, and high-performance environment. Get in touch with us today to learn how iuvo Secure AI Host can enhance your AI infrastructure!

 

 

 
