
Blog

Quick Guide: Running LLMs Locally on macOS

Sending every prompt to a third-party API gets old, especially when half the prompts are "rewrite this paragraph" or "what's the JSON schema for this." Local LLMs solved that for me on Apple Silicon faster than I expected. A 7B model in 4-bit quantization runs comfortably on a 16 GB MacBook, and the round-trip stops at the keyboard.

So the open question is which app to drive it from. Ollama, LM Studio, llama.cpp, MLX, and a handful of others wrap similar inference engines, and most of them load the same GGUF files. They differ in how much friction sits between you and the model: at one end, double-click and type; at the other, compile from source and then read the man page.
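
For a sense of the low-friction end, here is a minimal sketch assuming Ollama installed via Homebrew; the model tag is an example, any model from the Ollama library works the same way:

# Pull and chat with a model (the default library tag for most models is a 4-bit quantization)
brew install ollama
ollama run llama3.1:8b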

The Ultimate Guide to pyproject.toml

If you've ever opened a Python project and tried to figure out where dependencies, build settings, and tool configs actually live, you know the pain. setup.py, setup.cfg, requirements.txt, MANIFEST.in, plus a handful of dotfiles for every linter and formatter — all reading from different places.

pyproject.toml collapses most of that into one file.

TL;DR

pyproject.toml is roughly the package.json for Python. One file holds your project metadata, dependencies, and tool settings. Whether you're using .venv, pyenv, or uv, putting everything here makes setup and collaboration easier.
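
A minimal sketch of what that one file can hold; the project name, versions, and the ruff table are placeholders for whatever your project actually uses:

[project]
name = "my-app"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = ["requests>=2.31"]   # runtime deps live here, not in requirements.txt

[tool.ruff]   # linter settings live here, not in a dotfile
line-length = 100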

Mastering Zsh Startup: ~/.zprofile vs ~/.zshrc

If your terminal feels slow, or your environment variables aren't loading where you expect, you're probably running into Zsh's startup order.

The split between ~/.zprofile and ~/.zshrc is one of the most common sources of confusion when you move to Zsh, especially on macOS, where the defaults behave differently from Linux.

TL;DR

~/.zprofile is for environment setup. It runs once per login shell, which on macOS means once per terminal tab, because Terminal.app starts every new tab as a login shell. Put your PATH, EDITOR, and version managers like fnm or pyenv there.

~/.zshrc is for interactive configuration. It runs every time you start a new shell. Put aliases, prompt themes, and key bindings there.
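
A minimal sketch of the split; the paths and tools are examples, adjust to your setup:

# ~/.zprofile: environment, once per login shell
export EDITOR="nvim"
export PATH="$HOME/.local/bin:$PATH"
eval "$(fnm env)"   # version-manager hook, if you use fnm

# ~/.zshrc: interactive config, every new shell
alias ll="ls -lah"
bindkey -e                          # Emacs-style key bindings
autoload -Uz compinit && compinit   # completion system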

Scaling Large Language Models: Multi-GPU and Multi-Node Strategies That Hold Up in Practice

Today's LLMs don't fit on a single GPU. A 70B-parameter model needs about 140 GB for weights alone in FP16, nearly twice what an 80 GB A100 holds. Training or serving these models means splitting the work across multiple GPUs, and getting the split wrong wastes most of your compute budget.
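
The arithmetic is simple enough to check inline: FP16 stores two bytes per parameter, and that is before gradients, optimizer state, or KV cache:

# weights-only memory in FP16: parameter count x 2 bytes
python3 -c "print(70e9 * 2 / 1e9, 'GB')"   # -> 140.0 GB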

This is a practical walk-through of the parallelism strategies that actually work in production, drawn from Hugging Face's Ultra-Scale Playbook.
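
As a concrete taste of one such strategy, here is a hedged sketch of tensor parallelism at serving time; vLLM is my example here, not the playbook's, and the model name is illustrative:

# shard one model's weights across 4 GPUs (assumes: pip install vllm)
vllm serve meta-llama/Llama-3.1-70B-Instruct --tensor-parallel-size 4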

Quick Guide: Managing Python on macOS with uv

Quick Start

# Install uv
brew install uv

# For new projects (modern workflow)
uv init                # create project structure
uv add pandas numpy    # add dependencies
uv run train.py        # run your script

# For existing projects (legacy workflow)
uv venv                             # create virtual environment
uv pip install -r requirements.txt  # install dependencies
uv run train.py                     # run your script

# Run tools without installing them
uvx ruff check .       # run linter
uvx black .            # run formatter
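
One thing the quick start doesn't show: uv can also manage Python itself. A short sketch, assuming a recent uv and using 3.12 as an example version:

# Manage Python versions with uv
uv python install 3.12   # download a managed CPython build
uv python pin 3.12       # write .python-version for this project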