Sunday, April 19, 2026

The Homelab that just kept growing (part 1 of 2)


After 25 years in open source, I wanted to understand every layer. Then I wanted my AI to remember what I taught it.


The Pi

A few years back, I ordered a Raspberry Pi to play and explore. I soon grew tired of hearing other engineers talk about their homelabs and knew that one Pi just would not cut it.

It wasn't enough. I kept wanting to spin up different distros to check something, test a configuration, verify behavior. I reinstalled onto that Pi so many times I lost count. So I bought an AOOSTAR R1 N100—a $280 mini PC that punches way above its weight. Intel N100 (4C/4T, 6W TDP), 32GB DDR4, dual 2.5GbE NICs, dual M.2 slots, and two 3.5" drive bays. I think it can handle up to 44TB of storage. For me, I've filled it with scavenged drives over time—13 drives ranging from 300GB laptop drives to 4TB externals, totaling about 16TB raw. It's a Franken-storage setup, but it works. Perfect for Proxmox VE.

I started building: DNS servers, database clusters, a full ISP stack. Debian, Ubuntu, CentOS, Rocky Linux, Oracle Linux—67 VMs across the full enterprise Linux matrix. The N100 handled it until I really loaded it down, but the key insight was: not everything needs to run 24/7. I could spin up an Oracle Linux 8 VM, test a client configuration, shut it down. The ability to test anything was what mattered.

But there was no chance it could do AI. No GPU, and that N100's integrated graphics won't even handle Stable Diffusion at acceptable speeds.


Deborah

I named her after my late mother. Deborah is a custom-built workstation: AMD Ryzen 7 5800XT (8C/16T, 4.8GHz boost), 64GB DDR4-3200, RTX 4070 Ti Super 16GB (the 16GB VRAM is the critical spec—you can't quantize your way out of everything). 2TB NVMe for OS, 8TB NVMe for model weights, 750W PSU. Total build: just under $2500.

The AOOSTAR still runs the infrastructure—DNS, Percona MySQL (replica and smaller instances), Prometheus, Alertmanager (tied to PagerDuty and my private mail.apocryia.com mail server), Vault, plus MediaWiki for documentation, a private GitLab instance, and GitLab runners for CI/CD. Deborah handles the AI compute. The split is intentional: N100 runs 24/7 at ~30W, Deborah is for all my GPU power needs.

I'd been using Grok and Claude. They're brilliant. I still use them. But I was paying for black boxes, and after 25 years of open source, that grated. I wanted to see inside. I wanted my own models. I wanted to know how inference actually worked, not just trust the API response.

The 16GB GPU is limiting. It sounds like enough until you try to run a 70B model, or video generation, or keep local models loaded while doing CLI work. I clear GPU memory constantly. More GPU would mean bigger models, longer videos, and the ability to do generative work while keeping my development assistants running. But you work with what you have.


The Memory Problem

The real frustration came later. I was using Claude Code, then Kimi CLI—both excellent tools. But I'd sit for an hour and have to repeat myself. The agents would forget what we'd just decided. They'd debate me over things they did ten minutes ago. Context windows fill up, patterns get lost, and you start over in a new session like the previous three hours never happened.

I learned to plan better—break work into chunks, be more deliberate about what I asked for. That helped. But planning can only get you so far. Complex systems have interconnected decisions that span sessions. The memory became the bridge—capturing the rationale so even with good planning, I wasn't starting from zero every time.

I'm a database guy. This is a database problem.


Building the Brain

The solution was obvious: give the agents a brain they could actually use. MySQL for structured memory—what we decided, when, why. Neo4j for pattern recognition—what connects to what, which decisions led where. MCP (Model Context Protocol) as the shared interface, so Claude Code and Kimi CLI could share context, query the same memory, and build on each other's work. I've written about this setup before.
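To make the "structured memory" half concrete, here's a minimal sketch of the decision-log idea. The table layout and function names are my own invention, and I'm using sqlite3 as a stand-in so the snippet runs anywhere; the real setup described above uses MySQL for the log and Neo4j for the relationships between decisions.

```python
import sqlite3

# Hypothetical schema -- sqlite3 stands in for the MySQL table here.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE decisions (
        id         INTEGER PRIMARY KEY,
        topic      TEXT NOT NULL,
        decision   TEXT NOT NULL,
        rationale  TEXT NOT NULL,
        decided_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

def remember(topic, decision, rationale):
    """Store a decision so any agent can recall it in a later session."""
    conn.execute(
        "INSERT INTO decisions (topic, decision, rationale) VALUES (?, ?, ?)",
        (topic, decision, rationale),
    )
    conn.commit()

def recall(topic):
    """Return (decision, rationale) pairs for a topic, newest first."""
    return conn.execute(
        "SELECT decision, rationale FROM decisions "
        "WHERE topic = ? ORDER BY id DESC",
        (topic,),
    ).fetchall()

remember("web-framework", "FastAPI over Flask",
         "async support, typed request models")
print(recall("web-framework"))
```

The point is that any agent speaking MCP can call `remember`/`recall`-style tools against the same store, so the rationale survives the context window.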

Now, for example, I can work on an Android app locally while developing the API remotely, and both agents know what the other is doing. They share the database. I don't cut and paste between terminals. When I switch from Kimi to Claude, the rationale is there—the why behind the code, not just the code itself.

Kimi CLI is my daily driver now. It has been awesome as of late. Claude Code is excellent too. I still reach for Grok, still use commercial APIs when I need the best models. Open-source models haven't completely caught up, at least at the GPU tier I'm running. But my home assistant—Jarvis—is entirely local. It cleans my Gmail spam, monitors the systems, and tells me if database replication breaks because I did something stupid. It can talk, but I'm busy typing.
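The spam-cleaning part of Jarvis is simpler than it sounds. Here's a hedged sketch of just the selection rule; in practice this would sit behind imaplib against the mail server, and the function name and retention window are my own assumptions, not the actual implementation.

```python
from datetime import datetime, timedelta

def messages_to_purge(messages, now, keep_days=7):
    """Pick spam-folder message IDs older than keep_days.

    `messages` is a list of (msg_id, received_at) tuples, e.g. as
    parsed from IMAP FETCH responses on the spam folder.
    """
    cutoff = now - timedelta(days=keep_days)
    return [mid for mid, received in messages if received < cutoff]

now = datetime(2026, 4, 19)
spam_folder = [
    ("001", datetime(2026, 4, 1)),   # older than a week -> purge
    ("002", datetime(2026, 4, 18)),  # recent -> keep for review
]
print(messages_to_purge(spam_folder, now))  # ['001']
```

Keeping the rule a pure function makes it easy to test without touching the mailbox, then wire it to `IMAP4.store(..., '+FLAGS', '\\Deleted')` and `expunge()` for the real cleanup.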


The Mistakes Along the Way

I spent time on a website frontend. It worked, but it couldn't develop code the way I wanted. I live in vi and terminal windows; GUI frontends aren't my workflow. Cursor took some getting used to, but it's a great tool. Then cursor-agent, Claude Code, and Kimi CLI all came out—game changers. I even built my own CLI tool to use local open-source models, but loading the GPU that heavily became a bottleneck. Then Claude Code started working with Ollama directly—no need for my custom tool anymore.

Everyone's excited about clawbot/molt these days. I checked it out, and I think it's over-hyped. It didn't do anything I needed (that mattered) that I couldn't do another way—I already had it. Except for texting my AI remotely—and okay, why? Why do I need to write code over text messages? If I need to communicate remotely, I can build better ways. To be fair, I did let Molt clean out my spam emails just to give it something to do.


Why This Matters

The homelab started as FOMO. It became the infrastructure for independence. Then it became the infrastructure for memory—persistent, structured, inspectable.

After 25+ years of open source, my philosophy is simple: take it to make it, and you're always learning. So there's no reason our code shouldn't always be learning and remembering too.

The GPU is still limiting. The commercial models are still better. But when Kimi CLI pulls up a decision from last week and understands why we chose FastAPI over Flask, when Jarvis quietly clears my spam, when TTS alerts me to replication lag, when I can switch tools without losing context—that's worth building. I can sit and just talk to the models, but I'll admit it still feels very unnatural to me. I find myself with not much to say... I'd rather code, lol.

The open-source models are awesome. But they're not the best yet, again at my GPU tier. Claude, Grok, and Kimi still own that market. More GPU would get me closer. For now, I run hybrid—local for control and privacy, cloud for capability. But every layer I own is a layer I understand.


The Kubernetes Cluster

Between the AOOSTAR N100 and Deborah, there's a third layer: a 3-node K3s Kubernetes cluster running on dedicated Debian VMs within Proxmox. After years of running everything as standalone VMs, I wanted container orchestration for the services that needed to scale, self-heal, and deploy consistently.

The cluster runs Kubernetes v1.34.4+k3s1 on Debian GNU/Linux 13 (trixie), using containerd as the runtime. K3s was the obvious choice—lightweight, single-binary, CNCF-certified, and it just works. The cluster is small but production-grade, with CSI-NFS for persistent storage, Traefik for ingress, and the full Prometheus monitoring stack.

What's running on the cluster:

  • Monitoring: Prometheus + Alertmanager + Grafana (via kube-prometheus-stack)
  • CI/CD: GitLab Runners (2 replicas) for containerized builds
  • Automation: CronJob-based spam cleaner for my mail server
  • Ingress: Traefik for HTTP routing and SSL termination
  • Dashboard: Kubernetes Dashboard for cluster visibility

The Prometheus instance on the cluster monitors itself plus feeds into the main Prometheus on the N100. Same Alertmanager, same PagerDuty integration. It's a nested monitoring setup—cluster metrics bubble up to the infrastructure level.
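The bubbling-up works through Prometheus federation: the infrastructure Prometheus scrapes the cluster Prometheus's /federate endpoint with one or more match[] selectors. A sketch of how those scrape URLs are composed (the hostname and selectors here are hypothetical; the real setup would declare this in a YAML scrape_config):

```python
from urllib.parse import urlencode

def federate_url(base, selectors):
    """Build the /federate scrape URL for a list of match[] selectors."""
    # doseq=True emits one match[] parameter per selector.
    query = urlencode({"match[]": selectors}, doseq=True)
    return f"{base}/federate?{query}"

url = federate_url(
    "http://k3s-prometheus:9090",
    ['{job="kubernetes-nodes"}', '{__name__=~"kube_.*"}'],
)
print(url)
```

Each selector pulls a slice of the cluster's time series into the parent Prometheus, so one Alertmanager and one PagerDuty integration cover both layers.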

This is where the stateless services live. Databases still run on dedicated VMs and bare metal (replicated Percona MySQL), but anything that can be containerized, is. The GitLab runners on K3s handle container builds, then artifacts get pushed to the GitLab instance on the N100. Clean separation of concerns.


The Stack Today

AOOSTAR N100 (Proxmox)

  • 67 VMs across 12 Linux distributions
  • 3-node K3s Kubernetes cluster (control plane + 2 workers)
  • Redundant DNS (Technitium DNS Server), Percona MySQL cluster
  • Primary Prometheus + Alertmanager monitoring (integrated with PagerDuty and private mail server)
  • HashiCorp Vault for secrets
  • Twingate for VPN and remote access
  • GitLab instance (CI/CD runners moved to K3s)
  • 15 hard drives (scavenged, various sizes)
  • Power draw: ~30W, running 24/7

Deborah (AI Workstation)

  • AMD Ryzen 7 5800XT, 64GB RAM, RTX 4070 Ti Super 16GB
  • Ollama with 30+ models
  • 12,000+ model files - 10 custom fine-tuned GGUFs (ApocryiaAI V2/V3/Unified variants), 481 safetensors (FLUX, SDXL, Llama), 14 PyTorch bins, and 11,790 XGBoost stock prediction models from HuggingFace
  • Primary Percona MySQL (main database) + Neo4j persistent memory system
  • Full AI service stack:
    • Web UI (main interface when not on the CLI)
    • Ollama API (local inference)
    • Oobabooga (text-generation UI, rarely used)
    • Voice/TTS System: OpenVoice + Zonos + CosyVoice + ElevenLabs + Inworld AI — all voice engines controlled via unified API and backend databases. Cloning, synthesis, voice conversion, and character voices. Plus wake-word model switching: 22+ voice keywords trigger different AI personalities (see table below)
    • Image generation: ComfyUI, InvokeAI
    • Video: Wan-AI installed but needs more GPU to be practical
    • Tabby
  • 25 years of stock data, 8TB of XGBoost models—a separate playground of exploration alongside the AI work
  • Power draw: ~150-300W when active

Voice-Controlled Model Switching

The voice system has wake-word activated model switching. Instead of manually selecting which AI model to use, I just say a keyword and the system routes to the appropriate model automatically. Each keyword activates a different AI personality optimized for specific tasks:

Wake Word          Model Activated             Personality/Purpose
JARVIS             ApocryiaAI-system:latest    System assistant
HAL 9000           ApocryiaAI-system:latest    System monitoring, alerts (yes, really)
KEITH              apocryiaai-unified:latest   My custom fine-tuned model
DANIEL, WALTER     llama3.1:latest             General assistants (different personas)
MATILDA, MORGAN    llama3.1:8b                 Lightweight assistants
ELLIOT             codegemma:latest            Coding assistant
OBSERVER           deepseek-coder-v2:latest    Code analysis, security review
OLIVIA             openthinker:latest          Reasoning, complex problem solving
TRANSCRIBE NOW     llama3.1:8b                 Speech-to-text mode
VOICEOVER          llama3.1:8b                 TTS/synthesis mode
INTERN             llama3.1:8b                 Learning mode, asks questions
REPORTER           llama3.1:latest             Summarization, documentation

Agent Workflow

  • Kimi CLI: Primary development tool, excellent for coding and cheaper than Claude
  • Claude Code: Secondary, excellent for complex reasoning
  • MCP protocol: Shared memory between agents
  • Different TTS/STT setups for the local assistant handling monitoring, email, and alerts

Keith is a database consultant and infrastructure engineer with 25+ years of open source experience. He writes about MySQL, Proxmox, AI memory systems, and building technology you can actually inspect.
