
How I run LLMs locally

A Hacker News user asked me [0] how I run LLMs locally, with some specific questions; I'm documenting my setup here for everyone.

Before I begin, I want to honor the thousands, perhaps millions, of unknown artists, coders and writers whose work Large Language Models (LLMs) have been trained on, often without due credit or compensation.


The r/LocalLLaMA subreddit [1] and the Ollama blog [2] are good places to start with running LLMs locally.

Hardware

I have a laptop running Linux with a Core i9 CPU (32 threads), an RTX 4090 GPU (16 GB VRAM) and 96 GB of RAM. Models that fit entirely inside VRAM generate more tokens/second; larger models are partially offloaded to system RAM (dGPU offloading), so tokens/second drops. I talk about models in a section below.
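Whether a model fits in VRAM comes down to simple arithmetic. A minimal sketch of the back-of-the-envelope rule I use (the 20% overhead factor for KV cache and activations is my assumption, not an exact figure):

```python
def fits_in_vram(params_billion: float, bits_per_weight: int,
                 vram_gb: float, overhead: float = 1.2) -> bool:
    """Estimate whether a quantized model fits entirely in VRAM.

    Footprint ~= parameter count * bytes per weight, padded by ~20%
    for KV cache and activations (a rough assumption).
    """
    size_gb = params_billion * (bits_per_weight / 8) * overhead
    return size_gb <= vram_gb

# A 7B model at 4-bit quantization needs ~4.2 GB -> fits in 16 GB VRAM.
print(fits_in_vram(7, 4, 16))    # True
# A 70B model at 4-bit needs ~42 GB -> spills over into system RAM.
print(fits_in_vram(70, 4, 16))   # False
```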

Such a powerful computer is not necessary for running LLMs locally; small models run fine on older GPUs or even CPUs, although they are slower and hallucinate more.

Software

There are several high-quality open-source tools that enable running LLMs locally. These are the tools I use most often.

Ollama [3] is middleware with Python and JavaScript libraries built around llama.cpp [4], which makes it easy to run LLMs. I run Ollama in Docker [5].
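Once the Ollama container is up, it serves a REST API on port 11434 by default. A minimal sketch of querying it from Python with only the standard library, assuming a local server with the llama3.2 model already pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(model: str, prompt: str) -> bytes:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt,
                       "stream": False}).encode()

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama server, return the reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama server):
# print(generate("llama3.2", "Why is the sky blue? One sentence."))
```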

Open WebUI [6] is a frontend that offers a familiar chat interface for text and image input; it communicates with the Ollama backend and streams the output back to the user.

llamafile [7] packs an LLM into a single executable file. This is probably the easiest way to get started with local LLMs, but I'm having issues with dGPU offloading in llamafile [8].

I'm not a big consumer of image / video generation models, but when necessary I use AUTOMATIC1111 [9] for images that need some customization and Fooocus [10] for simple image generation. For complex workflow automations involving image creation, there's ComfyUI [11].

For code completion I use Continue [12] in VS Code.
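Continue is pointed at the local Ollama server through its config.json (typically under ~/.continue/). A sketch of such a configuration, assuming the model names match what `ollama list` reports; the exact schema may differ between Continue versions, so treat this as illustrative:

```json
{
  "models": [
    {
      "title": "Qwen2.5 Coder",
      "provider": "ollama",
      "model": "qwen2.5-coder"
    }
  ],
  "tabAutocompleteModel": {
    "title": "DeepSeek Coder v2",
    "provider": "ollama",
    "model": "deepseek-coder-v2"
  }
}
```

With this in place, tab autocompletion goes to the completion model while the chat sidebar uses the chat model, matching the model split described below.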

I use the Smart Connections [13] plugin for Obsidian [14] to query my notes via Ollama.

Screenshot of Obsidian with the Smart Connections chat pane showing the last journal entry I wrote.
I asked Smart Connections when I last wrote a journal entry; I hope to write my journal every day in 2025.

Models

I use the Ollama models page [15] to download the latest LLMs, and I track new models via RSS in Thunderbird. I use CivitAI [16] to download image-generation models for specific styles (e.g., isometric, for world building). Note, though, that most of CivitAI's models seem to be intended for creating adult images.

I choose LLMs based on their performance-to-size ratio. My current selection changes constantly due to the rapid pace of LLM development.

•	Llama3.2 for Smart Connections and generic queries.
•	Deepseek-coder-v2 for code completion in Continue.
•	Qwen2.5-coder for chatting about code in Continue.
•	Stable Diffusion for image generation in AUTOMATIC1111 or Fooocus.
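The LLM entries in the list above can be captured as a small task-to-model map, which is handy when scripting against the Ollama API. A sketch; the model tags follow Ollama's registry naming, which is an assumption worth checking against `ollama list`:

```python
# Task -> local model, mirroring the selection above.
MODELS = {
    "notes": "llama3.2",                # Smart Connections and generic queries
    "completion": "deepseek-coder-v2",  # code completion in Continue
    "code-chat": "qwen2.5-coder",       # chatting about code in Continue
}

def model_for(task: str) -> str:
    """Return the preferred local model for a task, defaulting to llama3.2."""
    return MODELS.get(task, "llama3.2")

print(model_for("completion"))  # deepseek-coder-v2
print(model_for("unknown"))     # llama3.2
```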

Update

I update the Docker containers with Watchtower [17] and update models from within Open WebUI.

Fine-Tuning and Quantization

I have not fine-tuned or quantized any models on my machine because my Intel CPU may have a manufacturing defect [18], so I don't want to push it to high temperatures for long periods during training.

Conclusion

Running LLMs locally gives me full control over my data and low-latency responses. None of this would be possible without open-source projects, freely available open models, and the original creators of the data on which these models were trained.

I will update this post if and when I use new tools / models.

(0) https://news.ycombinator.com/item?id=42537024

(1) https://www.reddit.com/r/LocalLLaMA/

(2) https://ollama.com/blog

(3) https://ollama.com/download

(4) https://github.com/ggerganov/llama.cpp

(5) https://hub.docker.com/r/ollama/ollama

(6) https://github.com/open-webui/open-webui

(7) https://github.com/Mozilla-Ocho/llamafile

(8) https://github.com/Mozilla-Ocho/llamafile/issues/611

(9) https://github.com/AUTOMATIC1111/stable-diffusion-webui

(10) https://github.com/llyasviel/Fooocus

(11) https://github.com/comfyanonymous/ComfyUI

(12) https://docs.continue.dev/getting-started/overview

(13) https://github.com/brianpetro/obsidian-smart-connections

(14) https://obsidian.md

(15) https://ollama.com/search

(16) https://civitai.com/models/63376/isometric-chinese-style-architecture-lora

(17) https://containrrr.dev/watchtower/

(18) https://en.wikipedia.org/wiki/Raptor_Lake#Instability_and_degradation_issue

I try to write low-frequency, high-quality content on health, product development, programming, software engineering, DIY, security, philosophy and other interests. If you would like to receive it in your email inbox, please consider subscribing to my newsletter.



2024-12-29 10:49:00
