It has been a while since we looked at running large language models (LLMs) locally. Things move fast in AI: while the foundational aspects haven't changed much, the models themselves have changed a great deal. A number of "small" LLMs have been developed with performance that rivals their larger brethren, which in turn lowers the hardware requirements for running them.
At the time of this writing, DeepSeek-R1 is a few weeks old and has been a popular topic in the AI world, the mainstream media, and, at least on day one, the financial media, thanks to its effect on NVIDIA's stock price. Since then NVIDIA's stock has rebounded, and the company released its RTX 5080 and 5090 consumer GPUs, which sold out in seconds. At the moment, GPUs are once again hard to purchase.
In this blog, we will look at using Proxmox with an AMD Radeon RX 5700 XT, which is available on eBay for around $150. The 7 billion parameter DeepSeek-R1 model needs slightly more than 4 GB of video memory, so a card with 8 GB of VRAM like the RX 5700 XT works well.
Hardware
As pointed out in the summary above, we are going to use lower-end hardware for this blog; however, the more capable the hardware, the more that can be done with it. Also, when using Proxmox, or any other virtualization solution, in any sort of production environment, there should be redundant servers with redundant components, reliable storage, and adequate memory, CPU, and GPU to meet both current and projected needs. This is an area where iuvo excels, and if you are looking at an on-premises virtualization solution, please reach out.
Proxmox
Throughout 2025 we plan to publish a number of blog posts on Proxmox. For this post, we are going to assume that the server is already set up, that it has an AMD RDNA 1 generation GPU installed, and that the GPU has not been passed through to any other virtual guests.
LXC Containers
Proxmox has native support for LXC containers, and we will take advantage of this to run DeepSeek. Containers have less overhead and are more resource efficient than full virtual machines; however, they also provide less isolation, so this setup is not recommended for publicly hosted services. First, we need an Ubuntu 24.04 container image. We are choosing Ubuntu because it is well supported for AI work in general and for the tools we will be using in this post in particular. To download the Ubuntu image, select the storage device hosting container images, and then select CT Templates:
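Alternatively, the template can be downloaded from the Proxmox host shell with pveam. The template file name below is only an example; use whatever "pveam available" lists for Ubuntu 24.04 on your system:

pveam update
pveam available --section system | grep ubuntu-24.04
pveam download local ubuntu-24.04-standard_24.04-2_amd64.tar.zst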
Then we will create a container. Generally, containers are used to host a single service that doesn't require a lot of compute resources. In this case, while it is a single service, we must allocate a lot of resources to the container. There needs to be more memory than the GPU has; in our case the GPU has 8 GB of VRAM, and we are allocating 24 GB of RAM to the container. The storage needs to be enough to hold the Ollama software, the OpenWebUI software, and all of the LLMs you may want to use. Additionally, if there is data used for Retrieval-Augmented Generation (RAG), that space needs to be included. This should be on SSD storage if possible to improve system performance.
A benefit to using containers is that while we have allocated a lot of resources, they will only be used when needed, and available to the host to reallocate at other times.
We will now walk through setting up the container:
Choose Create CT in the top right of the browser. Name the container and set up the root password, or even better, use SSH keys for root access.
The container can be unprivileged, and should have nesting enabled to help with systemd.
Here we will choose the Ubuntu 24.04 template we downloaded earlier.
We are allocating storage for the container here; you will likely want more than 256 GB for any production use.
We are allocating 8 CPU cores. If there is any chance the LLM will run on the CPU, more cores should be available; in our case we will be using the GPU.
Having enough memory is essential, and it always needs to be more than the amount on the GPU; the system will fail without enough RAM allocated.
Use the network settings that work best for you. A static IP address with properly set up DNS is a benefit when using SSL, and any production setup should have an SSL certificate.
As in the previous window, these settings should work for your particular network setup.
Clicking Finish here will create the container.
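For reference, roughly the same container can also be created from the Proxmox host shell with pct create. This is only a sketch: the container ID, storage names, template file name, and root password below are placeholders to adjust for your environment.

pct create 200 local:vztmpl/ubuntu-24.04-standard_24.04-2_amd64.tar.zst \
  --hostname deepseek --unprivileged 1 --features nesting=1 \
  --cores 8 --memory 24576 --swap 0 \
  --rootfs local-lvm:256 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp \
  --password 'change-this-password'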
The container builds quickly, and we can connect to it through the web console:
We will log in as root with the password provided above.
First, as with any new system, we should apply all available patches, set the time zone, install some required packages, and set up sudo:
apt update
apt upgrade
dpkg-reconfigure tzdata
echo "%adm ALL=(ALL:ALL) PASSWD:ALL" > /etc/sudoers.d/adm
apt install mailutils curl wget git python3 python3-venv libgl1 \
libglib2.0-0 apache2
reboot
Next we should add a user to the system; in this case we will use "aiuser".
adduser aiuser
info: Adding user `aiuser' ...
info: Selecting UID/GID from range 1000 to 59999 ...
info: Adding new group `aiuser' (1001) ...
info: Adding new user `aiuser' (1001) with group `aiuser (1001)' ...
info: Creating home directory `/home/aiuser' ...
info: Copying files from `/etc/skel' ...
New password:
Retype new password:
passwd: password updated successfully
Changing the user information for aiuser
Enter the new value, or press ENTER for the default
Full Name []: AI User
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n]
info: Adding new user `aiuser' to supplemental / extra groups `users' ...
info: Adding user `aiuser' to group `users' ...
We need to add this user to the following groups:
- adm
- video
- render
usermod -a -G adm,video,render aiuser
We also recommend setting up SSH keys for this user; however, that is beyond the scope of this blog. After the user is set up, SSH in as this user and use the account for all further work. Next, we will install the AMD GPU drivers and ROCm stack, set some ROCm environment variables, install GPU monitoring tools, and create the OpenWebUI directory:
curl -L https://repo.radeon.com/amdgpu-install/6.2.4/ubuntu/noble/amdgpu-install_6.2.60204-1_all.deb -o /tmp/amdgpu-install_6.2.60204-1_all.deb
chmod 644 /tmp/amdgpu-install_6.2.60204-1_all.deb
sudo apt install /tmp/amdgpu-install_6.2.60204-1_all.deb
sudo amdgpu-install --usecase=rocm --no-dkms
sudo tee -a /etc/profile.d/rocm.sh <<'EOF'
export PATH=$PATH:/opt/rocm/bin
export PYTORCH_ROCM_ARCH="gfx1010"
export ROCM_PATH=/opt/rocm
export HSA_OVERRIDE_GFX_VERSION=10.1.0
export HIP_VISIBLE_DEVICES=0
export HIP_PLATFORM=amd
export HIP_DEVICE=0
export AMDGPU_TARGET=gfx1010
export CLBlast_DIR=/usr/lib/cmake/CLBlast
export ROCR_VISIBLE_DEVICES=0
EOF
sudo apt install radeontop nvtop
sudo mkdir /opt/openwebui
sudo chown -R aiuser:aiuser /opt/openwebui
sudo reboot
Please note that the values above are for the Radeon RX 5700 XT:
HSA_OVERRIDE_GFX_VERSION=10.1.0
PYTORCH_ROCM_ARCH="gfx1010"
A different Radeon model will use different values; for example, a 6950 uses "10.3.0" and "gfx1030".
GPU Passthrough
At this point we have the basics in place, and we are going to work on passing the GPU through to the container. The specifics will vary from system to system, but the process is the same.
SSH back into the LXC container and run the following command, noting the output:
grep -w 'render\|video' /etc/group
video:x:44:aiuser
render:x:993:aiuser
In our case, video is 44 and render is 993.
On the Proxmox host system, NOT the container, run the following command:
ls -l /sys/class/drm/renderD*/device/driver
lrwxrwxrwx 1 root root 0 Feb 20 11:23 /sys/class/drm/renderD128/device/driver -> ../../../../../../bus/pci/drivers/amdgpu
Note the renderD device number in the output above.
Our render device is renderD128 from above, and we will now add it and /dev/kfd to the container configuration, as shown below:
/dev/dri/renderD128 with a group ID of 44
Then /dev/kfd with a group ID of 993
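On the Proxmox host this can be done in the web interface under the container's Resources, using Add and then Device Passthrough, or by adding two lines to the container's configuration file at /etc/pve/lxc/<container ID>.conf. A sketch using an example container ID of 200 and the group IDs we found above:

dev0: /dev/dri/renderD128,gid=44
dev1: /dev/kfd,gid=993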
Now stop and start the container so the GPU is recognized. Halt the container from inside, then start it again from the Proxmox web interface:
sudo halt
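Once the container is running again, we can confirm that the devices are visible and that ROCm sees the card. The exact rocminfo output will vary by GPU, but for the RX 5700 XT it should mention gfx1010:

ls -l /dev/kfd /dev/dri/renderD128
/opt/rocm/bin/rocminfo | grep -i gfx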
Ollama install
Now we are ready to install Ollama, which we will use to download and run the LLMs.
curl -fsSL https://ollama.com/install.sh | sudo sh
This command downloads and runs the official Ollama install script, which installs the Ollama binary and sets up the ollama systemd service.
If you have a Radeon 5x00 series GPU, run the command below to link the library file into Ollama's ROCm directory; for newer Radeons this isn't needed, but it also won't hurt anything.
sudo ln -s /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1010.dat \
  /usr/local/lib/ollama/rocm/rocblas/library/
Now we need to update the systemd service file the installer created, /etc/systemd/system/ollama.service.
Add these two lines to the [Service] section. As noted above, the Radeon 5700 uses "10.1.0"; a different Radeon model will use a different value, for example a 6950 uses "10.3.0".
Environment="ROCR_VISIBLE_DEVICES=0"
Environment="HSA_OVERRIDE_GFX_VERSION=10.1.0"
So now the file should look like:
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="ROCR_VISIBLE_DEVICES=0"
Environment="HSA_OVERRIDE_GFX_VERSION=10.1.0"
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/rocm/bin"
[Install]
WantedBy=default.target
Now let's tell systemd to reload the file and restart Ollama:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Now we can test ollama:
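For example, we can pull and chat with the 7 billion parameter DeepSeek-R1 model directly from the command line; the first run downloads the model, which is several gigabytes:

ollama run deepseek-r1:7b
>>> Why is the sky blue?

Type /bye to exit the interactive prompt.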
Excellent, it looks like it is working, and running “nvtop” in another window shows the GPU in use.
We are most of the way there. We just need to set up OpenWebUI to provide a user-friendly interface for interacting with Ollama.
OpenWebUI
First we need to create a Python virtual environment for OpenWebUI to run in:
cd /opt/openwebui
python3 -m venv venv
source venv/bin/activate
pip install open-webui
The above step will take a while to download and install everything needed.
Now we will create a startup script, /opt/openwebui/go.sh, in the same folder:
#!/usr/bin/bash
cd /opt/openwebui
source /opt/openwebui/venv/bin/activate
open-webui serve --port 3000
This will run OpenWebUI on port 3000
First, make the script executable:
chmod 755 go.sh
Now we will manually run OpenWebUI to confirm it works:
/opt/openwebui/go.sh
Finally, opening a web browser and connecting to our host at:
http://[IP ADDRESS OR FQDN OF HOST]:3000/
We should get something like this:
You can now use OpenWebUI with Ollama
To run OpenWebUI automatically at boot, we will first create a systemd service file:
/etc/systemd/system/openwebui.service
[Unit]
Description=OpenWebUI Server
After=network.target
StartLimitIntervalSec=0
[Service]
ExecStart=/opt/openwebui/go.sh
User=root
KillMode=process
Type=simple
[Install]
WantedBy=multi-user.target
We can run the commands below to enable and use the systemd startup service file:
sudo systemctl daemon-reload
sudo systemctl enable openwebui
sudo systemctl start openwebui
If this is going to be hosted for other users beyond just a test setup, we strongly recommend putting a reverse proxy that terminates SSL/HTTPS in front of OpenWebUI.
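Since we installed Apache earlier, one option is to use it as the reverse proxy. The following is only a sketch: the server name and certificate paths are placeholders, and the proxy modules need to be enabled first.

sudo a2enmod ssl rewrite proxy proxy_http proxy_wstunnel

Then a virtual host along these lines, for example in /etc/apache2/sites-available/openwebui.conf, can forward HTTPS traffic to OpenWebUI on port 3000:

<VirtualHost *:443>
    ServerName ai.example.com
    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/ai.example.com.crt
    SSLCertificateKeyFile /etc/ssl/private/ai.example.com.key

    ProxyPreserveHost On
    # Pass WebSocket upgrades through to OpenWebUI
    RewriteEngine On
    RewriteCond %{HTTP:Upgrade} =websocket [NC]
    RewriteRule ^/(.*) ws://127.0.0.1:3000/$1 [P,L]

    ProxyPass / http://127.0.0.1:3000/
    ProxyPassReverse / http://127.0.0.1:3000/
</VirtualHost>

Enable the site with "sudo a2ensite openwebui" and reload Apache with "sudo systemctl reload apache2".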
How iuvo Can Help
Running DeepSeek-R1 on local, cost-effective hardware is not only possible but also highly efficient with the right setup. By leveraging Proxmox, LXC containers, and AMD GPUs, you can achieve a secure and optimized environment for AI workloads without breaking the bank. Whether you're experimenting with AI models, setting up a research environment, or looking for a private, on-prem solution, these steps can help you get started.
If you’re looking for a fully secure, managed AI hosting solution without the complexity, iuvo Secure AI Host is designed to support DeepSeek and other AI workloads in a private, compliant, and high-performance environment. Get in touch with us today to learn how iuvo Secure AI Host can enhance your AI infrastructure!