
ENG | Running large language models locally.

Running AI models like DeepSeek-R1 locally using Ollama and exploring their practical limitations.


Motivation

This article is based on my brief experiments with running AI locally and exploring its practical limitations. I wanted to avoid subscription fees during weeks with heavier AI use.

Right now there’s hype around DeepSeek-R1, which is supposed to beat ChatGPT-4 and Claude 3.5 Sonnet. The best part is that it’s free to download and you can run it on your own hardware… or can you?

Before you start: Reality check

Have realistic expectations when downloading a model. Models around 14B to 16B parameters are the practical maximum for a 12GB GPU. These will run at decent speeds even on a CPU - I tested on an old i5-6500T and it was usable. Moving up to 30B models changes everything - they run about four times slower and need roughly 23GB of RAM. Even on a Ryzen 5900X, the speed isn’t practical for regular use, although this varies by specific model - some are faster than others.

TL;DR: DeepSeek-R1:671b is out of the question for your pathetic PC.
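
As a quick way to guess what fits, here is the back-of-the-envelope rule of thumb I use (my own approximation, not an exact formula): a 4-bit quantized model takes roughly 0.6 bytes per parameter, plus a few gigabytes for context and runtime overhead. Any shell with awk will print the estimates:

    # Rough memory estimate for 4-bit quantized models (rule of thumb, not exact):
    # ~0.6 bytes per parameter plus ~3 GB for context and runtime overhead.
    awk 'BEGIN {
        n = split("8 14 32 70 671", sizes, " ")
        for (i = 1; i <= n; i++)
            printf "%4sB model: ~%.0f GB\n", sizes[i], sizes[i] * 0.6 + 3
    }'

By that estimate a 32B model lands right around the 23GB mentioned above, and the 671B DeepSeek-R1 needs well over 400GB.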

Windows

  • Get Ollama - roughly an 800MB download - and install it.

  • Open a new command prompt and type ollama to see the available commands:
    PS C:\Users\pavel> ollama
    Usage:
      ollama [flags]
      ollama [command]
    
    Available Commands:
      serve       Start ollama
      create      Create a model from a Modelfile
      show        Show information for a model
      run         Run a model
      stop        Stop a running model
      pull        Pull a model from a registry
      push        Push a model to a registry
      list        List models
      ps          List running models
      cp          Copy a model
      rm          Remove a model
      help        Help about any command
    
    Flags:
      -h, --help      help for ollama
      -v, --version   Show version information
    
    Use "ollama [command] --help" for more information about a command.
    
  • Pull and run a model; start with something small for testing:
    PS C:\Users\pavel> ollama run deepseek-r1:8b
    pulling manifest
    pulling 6340dc3229b0...  87% ▕████████████████████████████████████████████████        ▏ 4.3 GB/4.9 GB   61 MB/s     10s
    

    One minute later:

    pulling manifest
    pulling 6340dc3229b0... 100% ▕████████████████████████████████████████████████████████▏ 4.9 GB
    pulling 369ca498f347... 100% ▕████████████████████████████████████████████████████████▏  387 B
    pulling 6e4c38e1172f... 100% ▕████████████████████████████████████████████████████████▏ 1.1 KB
    pulling f4d24e9138dd... 100% ▕████████████████████████████████████████████████████████▏  148 B
    pulling 0cb05c6e4e02... 100% ▕████████████████████████████████████████████████████████▏  487 B
    verifying sha256 digest
    writing manifest
    success
    
  • Interact with the model. Once the model is running, you can chat with it directly and use the provided commands:
    >>> /?
    Available Commands:
      /set            Set session variables
      /show           Show model information
      /load <model>   Load a session or model
      /save <model>   Save your current session
      /clear          Clear session context
      /bye            Exit
      /?, /help       Help for a command
      /? shortcuts    Help for keyboard shortcuts
    
    Use """ to begin a multi-line message.
    
    >>> Hello
    <think>
    
    </think>
    
    Hello! How can I assist you today? 😊
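
  • Housekeeping (optional). The models take several gigabytes each, so a few of the commands from the help listing above are handy for cleanup:

    ollama list                   # downloaded models and their size on disk
    ollama ps                     # models currently loaded in memory
    ollama stop deepseek-r1:8b    # unload a running model from memory
    ollama rm deepseek-r1:8b      # remove the model files to free disk space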
    

Linux

On Linux there’s a tarball you can download. It contains the bin/ollama binary and CUDA libraries under lib/ollama/.

I created a new user named ollama and unpacked the tarball into their home directory. Then you need to run ~/bin/ollama serve & to start the server in the background, followed by ~/bin/ollama run codestral:22b or something similar.
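
Condensed into commands, the manual setup looks roughly like this (a sketch of the steps above; the archive name depends on the release you actually downloaded):

    # Sketch only - adjust the archive name to the release you downloaded.
    sudo useradd -m ollama
    sudo tar -C /home/ollama -xzf ollama-linux-amd64.tgz
    sudo chown -R ollama:ollama /home/ollama
    sudo -u ollama /home/ollama/bin/ollama serve &
    sudo -u ollama /home/ollama/bin/ollama run codestral:22b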

This is more complicated than just running the provided install script, but that script leaves the server running in the background all the time, which I wanted to avoid.

Alternatively, you can use the systemd unit file provided in the official manual install instructions.
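
That unit looks roughly like the sketch below; here the ExecStart path is adapted to the home-directory install described above, so double-check it against the official file:

    # /etc/systemd/system/ollama.service - a sketch adapted from the official
    # manual install instructions, with ExecStart pointing at the home-directory
    # install above; verify against the official unit file.
    [Unit]
    Description=Ollama Service
    After=network-online.target

    [Service]
    ExecStart=/home/ollama/bin/ollama serve
    User=ollama
    Group=ollama
    Restart=always
    RestartSec=3

    [Install]
    WantedBy=default.target

After dropping the file into /etc/systemd/system/, sudo systemctl daemon-reload followed by sudo systemctl enable --now ollama keeps the server running as a service.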

Conclusion

Running AI models locally is impractical on PCs made for us mortals. It’s not common to have a GPU with 32GB of RAM, let alone one with 512GB or more 😢. Yes, there are small models, but…

The real problem with small models goes beyond memory constraints. They suffer from a severe identity crisis and a lack of expertise, which can sometimes be fun but is mostly frustrating. One moment the AI presents itself as a personal assistant that can’t code but claims it can play music, arrange appointments, and set notifications (none of which it can actually do). After resetting the session, it suddenly “forgets” its limitations and writes code full of duplicate arrays and unused variables. When you point out these issues, it can barely guess what its own program does and makes vague speculations about variables “possibly being for future use.” Then a few paragraphs later, it completely forgets which code you were even discussing.

Side notes

The Economics of AI Cloud Services

What makes this whole situation even more interesting is the return-on-investment calculation for cloud-based AI. NVIDIA makes boards with 1.5TB of RAM, a 6kW power draw, and a $400,000 price tag. Just to pay back the hardware in 3 years - no electricity, cooling, maintenance, R&D, support staff, or redundancy systems - such a board needs to bring in over $11,000 per month. Call it 700 users paying a $20 monthly subscription ($14,000), and each of those users can only utilize the server for about two minutes a day, or one hour a month. When you look at these numbers, the current pricing of cloud AI services starts making more sense, even if it feels expensive for individual users.
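
A quick sanity check of those numbers (hardware cost only, assuming 700 subscribers sharing one board around the clock):

    # Hardware-only payback: $400,000 over 36 months, $20/month subscriptions,
    # 700 users sharing one board 24 hours a day.
    awk 'BEGIN {
        monthly = 400000 / 36
        printf "needed per month     : $%.0f\n", monthly
        printf "users at $20/month   : %.0f\n", monthly / 20
        printf "minutes/day per user : %.1f  (700 users sharing 24 hours)\n", 24 * 60 / 700
    }'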

DeepSeek and Media Double Standards

In late January 2025, DeepSeek-R1 had an interesting effect. First it was hailed as breakthrough technology, claimed to have been trained at a fraction of the cost of other models while surpassing them in benchmarks. Then came various articles spreading how unethical DeepSeek is: how it was trained on outputs from ChatGPT, Claude.AI and the like, how it refuses to answer topics sensitive to China, how it steals your data.

Particularly hypocritical was the stance of OpenAI, which accused DeepSeek of using its outputs to train their own model, in violation of its terms of service. Two or three years earlier, it was OpenAI scraping content all over the internet, copyrighted or not, just to achieve technological leadership in the name of innovation and American dominance. Not to mention that DeepSeek’s models and some of its papers are public, whereas there’s nothing open about OpenAI.

Nonetheless, the media coverage clearly shows a double standard and a high level of bias.

Addendum (2025-02-02)

qwen2.5-coder:32b is actually quite a good model from Alibaba, especially compared to Codestral, which is not far from answering that it’s a toaster that can’t code. The full cloud model also seems pretty capable.

A week after the release of DeepSeek-R1, OpenAI released the new o3-mini model and Qwen released Qwen-2.5-Max, so suddenly there are more options.

Nonetheless, at this point they all seem similar in capabilities; nothing stands out, and personal impressions can easily be skewed by the randomness of the output.


This post is licensed under CC BY 4.0 by the author.