Introduction
As technology continues to evolve, it's exciting to see how different components can come together to create something truly unique. In this blog post, I'll share my experience setting up a private LLM at home using an Ollama server running on Windows 11 with an RTX 4090, in combination with a Qnap NAS box and Kubernetes (K8s) running on an Intel NUC. We'll explore how these components work together to create a seamless and fun experience, all thanks to the power of Open-WebUI, Ollama, and the mighty llama3.

The Setup
To start, let's take a look at the setup I used for this project:

  • Ollama Server: Running on Windows 11 with an RTX 4090
  • Qnap NAS Box: An older model from Qnap, serving as a reliable NFS storage solution for persistent data in K8s.
  • Intel NUC: A small but powerful device running Kubernetes (K8s) and acting as the central hub for my containerized applications.
  • Open-WebUI: A simple web interface for managing and interacting with my models, deployed in K8s.

Setup:

On the Windows 11 desktop, conveniently named Agent-11, I have Ollama installed and running as a server:

ollama pull llama3
ollama serve
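
One detail worth checking if the Open-WebUI pod can't reach http://10.0.0.155:11434: by default Ollama only listens on localhost. A minimal sketch (in PowerShell, assuming the default port 11434; adjust to your environment) is to set OLLAMA_HOST before starting the server so it binds to all interfaces:

$env:OLLAMA_HOST = "0.0.0.0"   # listen on all interfaces so pods on the NUC can reach Ollama over the LAN
ollama serve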

Since this is for fun, with some productivity as a byproduct, I still use Agent-11 for other activities like writing this blog, so I divided the running components across machines.

UI:

I was searching for the best way to interact with llama3. Though Ollama provides some native functionality, I found the awesome and popular Open-WebUI, and I'm glad I spent some quick time exploring it. Deployment is fun, and the surprise is how quickly I could deploy it to K8s and have it up and running in seconds.

With this YAML file, I was able to deploy Open-WebUI using the K8s command-line tool. The Open-WebUI service and deployment were successfully created, with the container pointing at the Ollama server running on Agent-11.


---
apiVersion: v1
kind: Service
metadata:
  name: open-webui
  namespace: open-webui
spec:
  type: NodePort
  ports:
    - port: 3000
      targetPort: 8080
      protocol: TCP
  selector:
    app: open-webui

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: open-webui
  namespace: open-webui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: open-webui
  template:
    metadata:
      labels:
        app: open-webui
    spec:
      containers:
        - name: open-webui
          image: ghcr.io/open-webui/open-webui:main
          ports:
            - containerPort: 8080
          env:
            - name: OLLAMA_BASE_URL
              value: "http://10.0.0.155:11434"
      restartPolicy: Always
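
Note that this manifest targets the open-webui namespace but doesn't create it (the namespace only gets created in the PV/PVC file further down). If you were to apply just this version on its own, you'd create the namespace first:

kubectl create namespace open-webui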

Since I want the data to be persistent, I looked at some documentation and found an approach that I'm currently using to persist data (at least the login and previous conversations survive restarts).

I have a 7+ year old Qnap HomeNAS that had been inactive for a while and only recently got powered on when I started having some K8s fun. I created an NFS share, gave access to the HomeLAN that the K8s nodes are part of, and created a folder in that storage called "open-webui".
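
Before wiring the share into K8s, it's worth confirming the node can actually see and mount the export. A quick sketch from the NUC (assuming a Debian/Ubuntu-based node; the NAS IP and path are the ones used in the PV below):

sudo apt-get install -y nfs-common       # the kubelet needs the NFS client tools to mount the PV
showmount -e 10.0.0.129                  # should list /k8-storage/open-webui
sudo mount -t nfs 10.0.0.129:/k8-storage/open-webui /mnt && ls /mnt && sudo umount /mnt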

1-open-webui-ns-pv-pvc.yaml

---
apiVersion: v1
kind: Namespace
metadata:
  name: open-webui
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: open-webui-pv-nfs
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs
  nfs:
    path: /k8-storage/open-webui
    server: 10.0.0.129
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: open-webui-pvc-nfs-mount
  namespace: open-webui
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: nfs  # Must match the storageClassName in your PV

2-open-webui.yaml


---
apiVersion: v1
kind: Service
metadata:
  name: open-webui
  namespace: open-webui
spec:
  type: NodePort
  ports:
    - port: 3000
      targetPort: 8080
      protocol: TCP
  selector:
    app: open-webui

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: open-webui
  namespace: open-webui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: open-webui
  template:
    metadata:
      labels:
        app: open-webui
    spec:
      containers:
        - name: open-webui
          image: ghcr.io/open-webui/open-webui:main
          ports:
            - containerPort: 8080
          env:
            - name: OLLAMA_BASE_URL
              value: "http://10.0.0.155:11434"
          volumeMounts:
            - name: open-webui-storage-volume
              mountPath: /app/backend/data
      restartPolicy: Always
      volumes:
      - name: open-webui-storage-volume
        persistentVolumeClaim:
          claimName: open-webui-pvc-nfs-mount

After deploying the above, I just use the NodePort directly to access the service.

k8admin@k8-ct1:~$ k apply -f 1-open-webui-ns-pv-pvc.yaml 
namespace/open-webui created
persistentvolume/open-webui-pv-nfs created
persistentvolumeclaim/open-webui-pvc-nfs-mount created
k8admin@k8-ct1:~$ k apply -f 2-open-webui.yaml 
service/open-webui created
deployment.apps/open-webui created
k8admin@k8-ct1:~$ k get svc -n open-webui -o wide
NAME         TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE   SELECTOR
open-webui   NodePort   10.99.87.103   <none>        3000:31785/TCP   51s   app=open-webui
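
The service is reachable on any node's IP at the allocated NodePort (31785 in the output above). A quick way to find the node IP, assuming you're browsing from a machine on the same HomeLAN:

k get nodes -o wide          # note the INTERNAL-IP of the NUC node
# Open-WebUI is then at http://<node-internal-ip>:31785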

Ref-Diagram:


Once deployed, I could access Open-WebUI, create a user for login, and validate that it can load models from Ollama and select a default model (in my case I have two models in Ollama and I am using "llama3-8B").
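
If a model doesn't show up in the Open-WebUI dropdown, it's usually worth checking what the Ollama server has actually pulled (run this on Agent-11):

ollama list                  # lists the locally pulled models and their tags, e.g. llama3:8b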

Now that everything is working and I've started to have some fun with llama3, I thought about how to make this more exciting. This is where I started looking into integrating metrics for my llama usage. Luckily, I have Grafana running in my K8s cluster (CrossRef: K8-Setup Link in future). So I explored some ways to export metrics; though it's not straightforward, once it's up and running it's almost invisible, and it's fun to see how GPU usage tracks directly with Open-WebUI/Ollama activity.
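
The post doesn't cover the exact exporter setup, so here is only a rough sketch of how a GPU exporter on Agent-11 could be scraped, assuming Prometheus feeds that Grafana. The job name and exporter port below are assumptions, not taken from my actual config (many Windows-friendly GPU exporters simply wrap nvidia-smi and expose a /metrics endpoint); substitute whichever exporter you end up running:

scrape_configs:
  - job_name: "agent-11-gpu"               # hypothetical job name
    static_configs:
      - targets: ["10.0.0.155:9835"]       # assumed GPU exporter port on the Windows/RTX 4090 box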

Ref-Diagram

Grafana Screenshot: The peaks are where I am asking llama3 for help.


Tagged in:

Tech, HomeLab, AI

Last Update: May 19, 2024