
Blog

Building a Custom FeatureStoreLite MCP Server Using uv

A step-by-step guide that shows how to create your own lightweight feature store MCP server from scratch using FastMCP, run it through uv, and integrate it with Claude Desktop. A practical example of an MCP server that ML engineers can actually use.
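As a taste of the workflow, here is a minimal sketch of the uv commands involved, assuming a hypothetical server.py and the MCP Python SDK's CLI extra (the project name and file are illustrative, not from the post):

uv init featurestore-lite && cd featurestore-lite
uv add "mcp[cli]"              # FastMCP ships with the MCP Python SDK
uv run mcp dev server.py       # try the server locally in the MCP inspector
uv run mcp install server.py   # register the server with Claude Desktop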

Choosing the Right Open-Source LLM Variant & File Format


1. Why all these tags exist

Open-source LLMs vary along two axes:

  1. Training / fine-tuning style – the suffixes you see in model names (-Instruct, -Distill, -A3B, …) tell you how the checkpoint was produced and what it's good at.
  2. File & quantization format – the format tag (GGUF, GPTQ, …) tells you how the weights are packed for inference on different hardware.

Understanding both axes lets you avoid downloading 20 GB for nothing or fighting CUDA errors at 3 a.m.
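For example, pulling just one quantized GGUF file instead of a whole repository avoids most of that download. A hedged sketch with huggingface-cli (the repo id and filename pattern are placeholders):

huggingface-cli download <repo-id> \
    --include "*Q4_K_M.gguf" \
    --local-dir ./models       # fetch only the ~4-bit GGUF, skip the rest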

Quick-Guide on pyproject.toml

TL;DR

Think of pyproject.toml as the package.json for Python. Whether you prefer .venv, pyenv, or uv, putting all your project's metadata, dependencies, and tooling into one tidy TOML file simplifies development and boosts collaboration.
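For a feel of the shape, here is a minimal hypothetical pyproject.toml, written as a shell heredoc so you can paste it straight into a terminal (project name, version, and dependencies are placeholders):

cat > pyproject.toml <<'EOF'
[project]
name = "my-ml-project"              # project metadata, PEP 621 style
version = "0.1.0"
requires-python = ">=3.12"
dependencies = ["pandas", "numpy"]  # runtime dependencies live here

[tool.ruff]                         # tool config shares the same file
line-length = 100
EOF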

Quick-Guide on ~/.zprofile vs ~/.zshrc 🚀

TL;DR ⚡

  • ~/.zprofile → one-shot, login-shell initialization (think "environment/bootstrap") 🔧
  • ~/.zshrc → every interactive prompt (think "daily driving experience") 🎮

Use both in tandem: keep your environment reliable with ~/.zprofile, and your shell pleasant and tweakable with ~/.zshrc.
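A minimal sketch of the split, assuming an Apple Silicon Homebrew install (the paths and alias are illustrative):

# ~/.zprofile: one-shot environment bootstrap for login shells
eval "$(/opt/homebrew/bin/brew shellenv)"   # put Homebrew on PATH
export PATH="$HOME/.local/bin:$PATH"

# ~/.zshrc: per-prompt setup for every interactive shell
alias ll="ls -lah"
autoload -Uz compinit && compinit           # enable zsh tab completion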

MLOps in the Age of Foundation Models - Evolving Infrastructure for LLMs and Beyond

The field of machine learning has undergone a seismic shift with the rise of large-scale foundation models - from large language models (LLMs) like GPT-4 to image diffusion models like Stable Diffusion. As a result, the way we build and operate ML systems (MLOps) looks very different today than it did just a few years ago. In this post, we'll explore how ML infrastructure and MLOps practices have evolved, contrasting the "classic" era of MLOps with the modern paradigms emerging to support foundation models, and highlight the new patterns and workflows that have emerged.

Scaling Large Language Models - Practical Multi-GPU and Multi-Node Strategies for 2025

The race to build bigger, better language models continues at breakneck speed. Today's state-of-the-art models require massive computing resources that no single GPU can handle. Whether you're training a custom LLM or deploying one for inference, understanding how to distribute this workload is essential.

This guide walks through practical strategies for scaling LLMs across multiple GPUs and nodes, incorporating insights from Hugging Face's Ultra-Scale Playbook.
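For a flavor of what the guide covers, here is a hedged sketch of a multi-node launch with PyTorch's torchrun (the node counts, rank, and $HEAD_NODE_IP are placeholders; run the command once per node with the matching --node_rank):

torchrun --nnodes=2 --nproc_per_node=8 \
         --node_rank=0 \
         --rdzv_backend=c10d --rdzv_endpoint=$HEAD_NODE_IP:29500 \
         train.py               # repeat on node 1 with --node_rank=1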

Quick-Guide on managing Python like an AI Engineer on macOS with uv

TL;DR Bash Cheat-sheet

brew install uv        # install tool
uv python install 3.12 # grab interpreter

# New project workflow (modern)
uv init                # create new project with pyproject.toml
uv add pandas numpy    # add dependencies
uv run train.py        # run with correct interpreter

# Classical project workflow (requirements.txt)
uv venv                           # create .venv
uv pip install -r requirements.txt # install from requirements
uv run train.py                   # run script

brew upgrade uv         # update uv itself (Homebrew install)