What Hardware Do I Need to Run Local LLMs?
The honest answer: it depends on which models you want to run. But I’ll give you specific recommendations at every budget, so you can stop guessing.
The single most important spec is memory — RAM on a Mac, VRAM on a PC with a GPU. Everything else is secondary.
The Rule of Thumb
For quantized models (the standard format for local AI), you need roughly 1 GB of memory per 1 billion parameters, plus a few GB of overhead for the system.
| Model Size | Memory Needed | Example Models |
|---|---|---|
| 4-8B parameters | 8-12 GB | Gemma 4 e4b, Phi-4 Mini, Llama 3.2 8B |
| 12-14B parameters | 12-16 GB | Qwen 2.5 14B, Gemma 2 12B |
| 26-32B parameters | 20-24 GB | Gemma 4 26B, Qwen 3 Coder 30B |
| 70B parameters | 40-48 GB | Llama 4 Maverick, DeepSeek R1 70B |
| 100B+ parameters | 64-128 GB | Llama 4 Scout (full), Qwen 2.5 72B (higher-precision quant) |
These are for quantized (compressed) models, which is what Ollama uses by default. Full-precision models need roughly double the memory, but there’s almost never a reason to run those locally.
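The rule of thumb above can be wrapped in a tiny shell helper as a sanity check. This is a rough heuristic only: ~1 byte per parameter assumes an 8-bit quant, ~0.5 for a 4-bit quant, and the 3 GB overhead default is an assumption, not a measured figure.

```shell
# Rough memory estimate: params (billions) x bytes per parameter + overhead.
# bytes/param: ~1.0 for 8-bit quants, ~0.5 for 4-bit (approximate figures).
estimate_mem_gb() {
  awk -v p="$1" -v b="${2:-1.0}" -v o="${3:-3}" \
    'BEGIN { printf "%.1f\n", p * b + o }'
}

estimate_mem_gb 8        # 8B model, 8-bit quant  -> 11.0
estimate_mem_gb 70 0.5   # 70B model, 4-bit quant -> 38.0
```

The numbers line up with the table: an 8B model lands in the 8-12 GB band, and a 4-bit 70B model in the 40-48 GB band.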
Recommended Hardware by Budget
Budget Tier: $700-1,200
Best for: 8-14B models, everyday AI tasks
Mac option:
- Mac Mini M2 with 24 GB unified memory (~$900 new, ~$700 refurbished)
- Runs: Gemma 4 e4b, Llama 3.2 8B, Phi-4 Mini comfortably
- Silent, tiny footprint, extremely power-efficient (~15-30W under load)
PC option:
- Any desktop + used NVIDIA RTX 3090 24 GB (~$700-800 for the GPU)
- Runs: Same models as above, plus some 14B models
- Louder and more power-hungry, but raw VRAM speed is excellent
What you can do: Quick answers, email drafting, summarization, simple coding help, basic analysis. Comparable to GPT-3.5-era ChatGPT for most tasks.
Mid-Range Tier: $1,500-3,000
Best for: 26-32B models, serious daily use
Mac option:
- MacBook Pro M3 Pro with 36 GB (~$2,200) or Mac Mini M4 Pro with 48 GB (~$1,800)
- Runs: Gemma 4 26B, Qwen 3 Coder 30B — the sweet spot models
- Portable (laptop) or compact (Mini), great power efficiency
PC option:
- Desktop + NVIDIA RTX 4090 24 GB (~$1,800-2,000 for the GPU)
- Runs: All 30B models with room to spare, some 70B at lower quant
- Fastest inference speed at this tier
What you can do: Complex writing, data analysis, code generation, multi-step reasoning, content pipelines. This tier handles 80-90% of what most people use cloud AI for.
Power Tier: $4,000-8,000
Best for: 70B+ models, running multiple models simultaneously, production workloads
Mac option:
- Mac Studio M3 Ultra with 192-256 GB unified memory ($5,000-8,000)
- Runs: Everything — multiple large models pinned simultaneously
- Dead silent, incredibly power-efficient for the capability
PC option:
- Dual RTX 4090 or RTX 5090 setup ($4,000-6,000 for GPUs)
- Runs: 70B models at high speed, some 100B+ models
- Requires proper cooling and PSU; not quiet
What you can do: Run a full AI stack — multiple models loaded for different tasks, automated pipelines, serve AI to other devices on your network. This is “replace most cloud AI” territory.
Mac vs PC for Local AI
This comes up constantly, so here’s the direct comparison:
| Factor | Mac (Apple Silicon) | PC (NVIDIA GPU) |
|---|---|---|
| Memory architecture | Unified — all RAM available to GPU | Split — VRAM is the bottleneck |
| Max memory | 256 GB (M3 Ultra) | 48 GB per GPU (can multi-GPU) |
| Token speed | Good — 20-40 tok/s on 30B models | Faster — 40-80 tok/s on same models |
| Power draw | 30-100W typical | 300-600W under load |
| Noise | Silent to near-silent | Moderate to loud |
| Setup difficulty | Easy — Ollama just works | Moderate — driver dependencies |
| Cost per GB of usable memory | ~$25-30/GB | ~$40-50/GB (VRAM) |
| Best for | Large models, always-on, efficiency | Speed, gaming crossover, budget VRAM |
My take: Mac wins for most people. Unified memory means a 64 GB Mac can load models that would require a $2,000+ GPU on PC. The power efficiency means you can leave it running 24/7 without worrying about your electric bill. Setup is genuinely easier.
PC wins if you need maximum tokens-per-second or already have a gaming rig with a good GPU.
How I Actually Do This
My setup: Mac Studio M3 Ultra, 256 GB unified memory.
I keep 4 models pinned in memory simultaneously (~64 GB):
- Gemma 4 e4b (10 GB) — fast tasks
- Gemma 4 26B (17 GB) — daily workhorse
- Qwen 3 Coder 30B (18 GB) — code and tool calling
- Gemma 4 31B (19 GB) — complex reasoning
That leaves ~190 GB free for the system, apps, and loading additional models on demand. The Mac Studio draws about 60-100W under AI load and is completely silent.
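With Ollama, keeping several models resident comes down to server configuration. A minimal sketch — the two environment variables exist in recent Ollama releases, but the model tags below are placeholders, not the tags for the models named above:

```shell
# Allow several models in memory at once and disable the idle unload timer.
export OLLAMA_MAX_LOADED_MODELS=4   # 4 matches the setup described above
export OLLAMA_KEEP_ALIVE=-1         # negative = keep models loaded indefinitely
ollama serve &

# Warm each model with an empty prompt so it loads and stays pinned.
# Replace these placeholder tags with the model names you actually pulled.
for model in fast-model:latest workhorse:latest coder:latest reasoner:latest; do
  ollama run "$model" ""
done
```

`ollama ps` will then show all four models loaded, each with its keep-alive set to forever.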
What I’d buy if starting over today with each budget:
| Budget | I’d Buy | Why |
|---|---|---|
| $1,000 | Mac Mini M4 with 32 GB | Best value for local AI. Runs 26B models. Silent. |
| $2,000 | Mac Mini M4 Pro with 48 GB | Comfortable 30B+ models, room for multitasking |
| $4,000 | Mac Studio M4 Max with 128 GB | Run 70B models, multiple models loaded |
| $7,000+ | Mac Studio M3 Ultra with 256 GB | Run everything. My current setup is this tier. |
The Mac Mini at $1,000 is the real story here. That’s the price of 10 months of ChatGPT Plus, and it runs capable models for free after that — forever.
What NOT to Buy
- Laptops with 8 GB RAM — can technically run tiny models, but the experience is frustrating
- Older Intel Macs — no unified memory, no Metal GPU acceleration for AI, painfully slow
- AMD GPUs for AI — software support lags far behind NVIDIA; ROCm is getting better but still not plug-and-play
- Cloud GPU rentals for daily use — makes sense for training, not for inference; you’ll spend more than buying hardware within a few months
Frequently Asked Questions
How much RAM do I need?
16 GB minimum for small models. 32-64 GB for the sweet spot (26-32B models that handle most tasks). 128 GB+ only if you want to run 70B models or keep multiple models loaded simultaneously.
Can I upgrade my RAM later?
On Mac: no. Apple Silicon memory is soldered — buy the amount you need upfront. On PC: system RAM is upgradeable, but GPU VRAM is not. Buy the GPU you want from the start.
Do I need fast storage (SSD)?
Yes — models are loaded from disk into memory. An NVMe SSD loads a 17 GB model in seconds. A spinning hard drive would take minutes. Every modern Mac has fast storage. On PC, make sure your models are on an SSD.
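The gap is easy to estimate from sequential throughput alone. The figures below are assumptions (~3.5 GB/s for a typical NVMe drive, ~150 MB/s for a spinning disk); real drives vary:

```shell
# Time to read a 17 GB model at assumed sequential throughput.
awk 'BEGIN {
  gb = 17
  printf "NVMe  (~3.5 GB/s): %.0f s\n", gb / 3.5
  printf "HDD  (~0.15 GB/s): %.0f s (~%.0f min)\n", gb / 0.15, gb / 0.15 / 60
}'
```

Roughly five seconds versus two minutes — and you pay that cost every time a model is loaded or swapped.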
Can I use an external GPU (eGPU)?
On Mac: Apple dropped eGPU support with Apple Silicon. Not an option. On PC: technically possible via Thunderbolt, but the bandwidth penalty makes it much slower than an internal GPU. Not recommended.
Is 16 GB enough to get started?
Yes — 16 GB runs 8B models comfortably, and those models are surprisingly capable for everyday tasks. Start there, see if local AI fits your workflow, then upgrade if you want more.
This is part of the ASTGL Definitive Answers series — structured, practical answers to the questions people actually ask about AI automation, MCP servers, and local AI infrastructure.