A Quick Guide to Running LLMs Locally on macOS
Sending every prompt to a third-party API gets old, especially when half the prompts are "rewrite this paragraph" or "what's the JSON schema for this." Local LLMs solved that for me on Apple Silicon faster than I expected. A 7B model in 4-bit quantization runs comfortably on a 16 GB MacBook, and the round-trip stops at the keyboard.
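The back-of-the-envelope math explains why 16 GB is plenty: 4-bit weights take roughly half a byte per parameter (a little more once scaling factors are counted), plus some headroom for the KV cache and runtime. A rough sketch of the arithmetic, with illustrative rather than measured overhead figures:

```python
# Rough memory estimate for a 4-bit quantized 7B model (illustrative figures).
params = 7_000_000_000      # 7B parameters
bytes_per_param = 0.5       # 4-bit quantization: roughly half a byte per weight
weights_gb = params * bytes_per_param / 1024**3

kv_cache_gb = 1.0           # assumed: KV cache at a few thousand tokens of context
overhead_gb = 1.0           # assumed: runtime, tokenizer, activation buffers

total_gb = weights_gb + kv_cache_gb + overhead_gb
print(f"weights: {weights_gb:.1f} GB, total: ~{total_gb:.1f} GB")
# ~3.3 GB of weights, ~5.3 GB in total -- comfortable on a 16 GB machine.
```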
So the open question is which app to drive it from. Ollama, LM Studio, llama.cpp, MLX, and a handful of others wrap broadly similar inference engines and, for the most part, the same GGUF model files. Where they differ is in how much friction sits between you and the model: at one end, double-click and type; at the other, compile from source and then read the man page.
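Whichever app you land on, most of them end up serving the model over a local HTTP endpoint, which is where the convenience shows. As a minimal sketch, assuming Ollama is installed, its server is running on the default localhost:11434 port, and a model has already been pulled (the model name here is just an example):

```python
# Minimal sketch: prompt a locally served model through Ollama's HTTP API.
# Assumes the Ollama server is on its default port and "llama3" has been pulled.
import json
import urllib.request

def ask(prompt: str, model: str = "llama3") -> str:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask("Rewrite this sentence more concisely: the meeting was moved to a later time."))
```

Nothing in that round trip leaves the machine, which is the whole point of the exercise.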