ASTGL Definitive Answers

What Hardware Do I Need to Run Local LLMs?

James Cruce

The honest answer: it depends on which models you want to run. But I’ll give you specific recommendations at every budget, so you can stop guessing.

The single most important spec is memory — RAM on a Mac, VRAM on a PC with a GPU. Everything else is secondary.

The Rule of Thumb

For quantized models (the standard format for local AI), you need roughly 1 GB of memory per 1 billion parameters, plus a few GB of overhead for the system.

| Model Size | Memory Needed | Example Models |
| --- | --- | --- |
| 4-8B parameters | 8-12 GB | Gemma 4 e4b, Phi-4 Mini, Llama 3.2 8B |
| 12-14B parameters | 12-16 GB | Qwen 2.5 14B, Gemma 2 12B |
| 26-32B parameters | 20-24 GB | Gemma 4 26B, Qwen 3 Coder 30B |
| 70B parameters | 40-48 GB | Llama 4 Maverick, DeepSeek R1 70B |
| 100B+ parameters | 64-128 GB | Llama 4 Scout (full), Qwen 2.5 72B (high quant) |

These are for quantized (compressed) models, which is what Ollama uses by default. Full-precision models need roughly double the memory, but there’s almost never a reason to run those locally.
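The rule of thumb is easy to capture as a quick sizing function. This is a rough sketch, not a guarantee: the 3 GB overhead constant and the 2x full-precision factor are the approximations stated above, and actual usage varies by quantization format and context length.

```python
def estimate_memory_gb(params_billions: float, quantized: bool = True,
                       overhead_gb: float = 3.0) -> float:
    """Estimate memory needed for a local model.

    Rule of thumb: ~1 GB per 1B parameters for a quantized model,
    roughly double for full precision, plus a few GB of system overhead.
    """
    per_billion = 1.0 if quantized else 2.0
    return params_billions * per_billion + overhead_gb

print(estimate_memory_gb(8))                    # 8B quantized -> 11.0 GB
print(estimate_memory_gb(8, quantized=False))   # 8B full precision -> 19.0 GB
```

For an 8B model this lands at ~11 GB, squarely inside the 8-12 GB row in the table above.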

Budget Tier: $700-1,200

Best for: 8-14B models, everyday AI tasks

Mac option:

  • Mac Mini M2 with 24 GB unified memory (~$900 new, ~$700 refurbished)
  • Runs: Gemma 4 e4b, Llama 3.2 8B, Phi-4 Mini comfortably
  • Silent, tiny footprint, extremely power-efficient (~15-30W under load)

PC option:

  • Any desktop + used NVIDIA RTX 3090 24 GB (~$700-800 for the GPU)
  • Runs: Same models as above, plus some 14B models
  • Louder and more power-hungry, but raw VRAM speed is excellent

What you can do: Quick answers, email drafting, summarization, simple coding help, basic analysis. Comparable to GPT-3.5-class cloud models for most tasks.

Mid-Range Tier: $1,500-3,000

Best for: 26-32B models, serious daily use

Mac option:

  • MacBook Pro M3 Pro with 36 GB ($2,200) or Mac Mini M4 Pro with 48 GB ($1,800)
  • Runs: Gemma 4 26B, Qwen 3 Coder 30B — the sweet spot models
  • Portable (laptop) or compact (Mini), great power efficiency

PC option:

  • Desktop + NVIDIA RTX 4090 24 GB (~$1,800-2,000 for the GPU)
  • Runs: All 30B models with room to spare, some 70B at lower quant
  • Fastest inference speed at this tier

What you can do: Complex writing, data analysis, code generation, multi-step reasoning, content pipelines. This tier handles 80-90% of what most people use cloud AI for.

Power Tier: $4,000-8,000

Best for: 70B+ models, running multiple models simultaneously, production workloads

Mac option:

  • Mac Studio M3 Ultra with 192-256 GB unified memory ($5,000-8,000)
  • Runs: Everything — multiple large models pinned simultaneously
  • Dead silent, incredibly power-efficient for the capability

PC option:

  • Dual RTX 4090 or RTX 5090 setup ($4,000-6,000 for GPUs)
  • Runs: 70B models at high speed, some 100B+ models
  • Requires proper cooling and PSU; not quiet

What you can do: Run a full AI stack — multiple models loaded for different tasks, automated pipelines, serve AI to other devices on your network. This is “replace most cloud AI” territory.
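The "serve AI to other devices" piece is simpler than it sounds with Ollama: bind the server to your LAN instead of localhost, then call its HTTP API from any other machine. A minimal sketch follows; the LAN IP address and model tag are placeholders for your own setup.

```python
import json
import urllib.request

# On the machine hosting the models, expose Ollama to the LAN first:
#   OLLAMA_HOST=0.0.0.0 ollama serve
# (By default Ollama listens on localhost only; 0.0.0.0 binds all interfaces.)

def build_generate_payload(prompt: str, model: str, stream: bool = False) -> dict:
    """Request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": stream}

def generate(prompt: str, model: str,
             base_url: str = "http://192.168.1.50:11434") -> str:
    """Query the AI box from any other device on the network.

    The IP address is a placeholder for your server's LAN address;
    11434 is Ollama's default port.
    """
    data = json.dumps(build_generate_payload(prompt, model)).encode()
    req = urllib.request.Request(f"{base_url}/api/generate", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage, from a laptop or script elsewhere on the same network:
#   print(generate("Summarize this paragraph: ...", model="gemma2:27b"))
```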

Mac vs PC for Local AI

This comes up constantly, so here’s the direct comparison:

| Factor | Mac (Apple Silicon) | PC (NVIDIA GPU) |
| --- | --- | --- |
| Memory architecture | Unified — all RAM available to GPU | Split — VRAM is the bottleneck |
| Max memory | 256 GB (M3 Ultra) | 24-32 GB per consumer GPU (can multi-GPU) |
| Token speed | Good — 20-40 tok/s on 30B models | Faster — 40-80 tok/s on same models |
| Power draw | 30-100W typical | 300-600W under load |
| Noise | Silent to near-silent | Moderate to loud |
| Setup difficulty | Easy — Ollama just works | Moderate — driver dependencies |
| Cost per GB of usable memory | ~$25-30/GB | ~$40-50/GB (VRAM) |
| Best for | Large models, always-on, efficiency | Speed, gaming crossover, budget VRAM |

My take: Mac wins for most people. Unified memory means a 64 GB Mac can load models that would require multiple high-end GPUs on a PC. The power efficiency means you can leave it running 24/7 without worrying about your electric bill. Setup is genuinely easier.

PC wins if you need maximum tokens-per-second or already have a gaming rig with a good GPU.

How I Actually Do This

My setup: Mac Studio M3 Ultra, 256 GB unified memory.

I keep 4 models pinned in memory simultaneously (~64 GB):

  • Gemma 4 e4b (10 GB) — fast tasks
  • Gemma 4 26B (17 GB) — daily workhorse
  • Qwen 3 Coder 30B (18 GB) — code and tool calling
  • Gemma 4 31B (19 GB) — complex reasoning

That leaves ~190 GB free for the system, apps, and loading additional models on demand. The Mac Studio draws about 60-100W under AI load and is completely silent.
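"Pinned" here means the runtime never unloads a model between requests. In Ollama, the HTTP API's keep_alive field controls this: a request with no prompt loads the model, and keep_alive set to -1 keeps it resident indefinitely. A minimal sketch, with the model tags in the usage comment as placeholders for whatever you have pulled locally:

```python
import json
import urllib.request

def keep_alive_payload(model: str, keep_alive: int = -1) -> dict:
    """Body for Ollama's /api/generate; no prompt means 'just load the model',
    and keep_alive=-1 means 'never auto-unload it'."""
    return {"model": model, "keep_alive": keep_alive}

def pin_model(model: str, base_url: str = "http://localhost:11434") -> None:
    """Load a model into memory and keep it resident."""
    data = json.dumps(keep_alive_payload(model)).encode()
    req = urllib.request.Request(f"{base_url}/api/generate", data=data,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req).read()

# Usage — run once at startup with your own model tags:
#   for tag in ["gemma2:9b", "gemma2:27b", "qwen2.5-coder:32b"]:
#       pin_model(tag)
```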

What I’d buy if starting over today with each budget:

| Budget | I’d Buy | Why |
| --- | --- | --- |
| $1,000 | Mac Mini M4 with 32 GB | Best value for local AI. Runs 26B models. Silent. |
| $2,000 | Mac Mini M4 Pro with 48 GB | Comfortable 30B+ models, room for multitasking |
| $4,000 | Mac Studio M4 Max with 128 GB | Run 70B models, multiple models loaded |
| $7,000+ | Mac Studio M3 Ultra with 256 GB | Run everything. My current setup is this tier. |

The Mac Mini at $1,000 is the real story here. That’s roughly four years of ChatGPT Plus at $20/month, and it runs capable models for free after that — forever.

What NOT to Buy

  • Laptops with 8 GB RAM — can technically run tiny models, but the experience is frustrating
  • Older Intel Macs — no unified memory, no Metal GPU acceleration for AI, painfully slow
  • AMD GPUs for AI — software support lags far behind NVIDIA; ROCm is getting better but still not plug-and-play
  • Cloud GPU rentals for daily use — makes sense for training, not for inference; you’ll spend more than buying hardware within a few months

Frequently Asked Questions

How much RAM do I need?

16 GB minimum for small models. 32-64 GB for the sweet spot (26-32B models that handle most tasks). 128 GB+ only if you want to run 70B models or keep multiple models loaded simultaneously.

Can I upgrade my RAM later?

On Mac: no. Apple Silicon memory is soldered — buy the amount you need upfront. On PC: system RAM is upgradeable, but GPU VRAM is not. Buy the GPU you want from the start.

Do I need fast storage (SSD)?

Yes — models are loaded from disk into memory. An NVMe SSD loads a 17 GB model in seconds. A spinning hard drive would take minutes. Every modern Mac has fast storage. On PC, make sure your models are on an SSD.
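The gap is simple throughput arithmetic. A sketch with assumed sequential read speeds, roughly 5 GB/s for a modern NVMe SSD and 150 MB/s for a spinning disk (real drives vary):

```python
def load_seconds(model_gb: float, read_gb_per_s: float) -> float:
    """Time to stream model weights from disk at a given sequential read speed."""
    return model_gb / read_gb_per_s

nvme = load_seconds(17, 5.0)    # 17 GB model from NVMe at ~5 GB/s
hdd = load_seconds(17, 0.15)    # same model from a hard drive at ~150 MB/s
print(f"NVMe: {nvme:.0f} s, HDD: {hdd / 60:.0f} min")  # prints "NVMe: 3 s, HDD: 2 min"
```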

Can I use an external GPU (eGPU)?

On Mac: Apple dropped eGPU support with Apple Silicon. Not an option. On PC: technically possible via Thunderbolt, but the bandwidth penalty makes it much slower than an internal GPU. Not recommended.

Is 16 GB enough to get started?

Yes — 16 GB runs 8B models comfortably, and those models are surprisingly capable for everyday tasks. Start there, see if local AI fits your workflow, then upgrade if you want more.


This is part of the ASTGL Definitive Answers series — structured, practical answers to the questions people actually ask about AI automation, MCP servers, and local AI infrastructure.
