Blog
For shorter decision guides, use Fast Answers. The archive below is for the long-form articles.
-
AI Agent Memory: Schema-Guided Typed State for Long-Running Systems
Agentic AI
Agents 101
Why production AI agents need schema-guided memory with temporal validity, provenance, and structured conflict handling instead of raw-text vector recall.
-
Evaluating AI Agents in Production: From Traces to Test Suites
Agentic AI
Agents 101
How to evaluate production AI agents by turning traces into versioned regression datasets, trajectory metrics, calibrated judges, and CI gates.
-
Long-Running AI Agent Runtime in 2026: Sessions, Sandboxes, Checkpoints, and Harnesses
Agents 101
How to operate long-running AI agents in production with sessions, harnesses, sandboxes, checkpoints, traces, and deployment patterns that recover cleanly.
-
Evaluating RAG: Metrics for Every Stage of a Production RAG System
AI Engineering
RAG evaluation metrics for parsing, retrieval, reranking, generation, citations, and production telemetry, with code and failure modes.
-
AI Agent Security in 2026: Guardrails, Permissions, Sandboxes, and MCP Threats
Agents 101
Why LLM guardrails are not enough for AI agent security, and where permissions, sandboxes, HITL, MCP scoping, and policy checks fit.
-
The Definitive Guide to NER in 2026: Encoders, LLMs, and the 3-Tier Production Architecture
AI Engineering
NER in 2026 means choosing between GLiNER, spaCy, Transformers, and LLM extraction for latency, accuracy, and schema control.
-
AI Agent Tool Use in 2026: MCP, CLI, Skills, and Code Execution
Agents 101
Compare JSON tool calling, MCP, Skills, CLI, and code execution for AI agents, with ACI design rules and the production trade-offs that matter.
-
LLM Engineering Guide: 45 Concepts for Inference, Training, Architecture, and Operations
AI Engineering
A practitioner's reference to 45 LLM engineering concepts for production systems, spanning inference, training, architecture, deployment, and operations.
-
The Definitive Guide to OCR in 2026: From Pipelines to VLMs
AI Engineering
OCR in 2026 means choosing between classical pipelines and VLMs for text, layout, tables, and document extraction.
-
Modern Data Processing Engines Compared: Polars, DataFusion, Daft, Ray Data, Pandas, and Spark
Infrastructure
Benchmarks comparing Polars, DataFusion, Daft, Ray Data, Pandas, and Spark on tabular and multimodal workloads, with code and decision rules.
-
AI Agent Memory Architecture in 2026: Checkpoints, Vector Stores, and File-Based Memory
Agents 101
How to design AI agent memory with checkpoints, PostgreSQL or Redis, vector stores such as Qdrant, and file-based memory for long-running systems.
-
Search Ranking Stack in 2026: BM25, Embeddings, Cross-Encoders, and LLM Reranking
Search and Recs
How to build a search ranking stack with BM25, dense embeddings, hybrid RRF, cross-encoder reranking, and LLM listwise reranking on Amazon ESCI.
-
AI Agent Reasoning Loops in 2026: ReAct vs ReWOO vs Plan-and-Execute
Agents 101
Compare ReAct, ReWOO, and Plan-and-Execute for AI agents. LangGraph examples show when each loop wins on cost, latency, and task shape.
-
Manifold-Constrained Hyper-Connections (mHC): DeepSeek Residual Scaling Explained
Paper Review
A technical walkthrough of DeepSeek's Manifold-Constrained Hyper-Connections (mHC), residual stream width scaling, Sinkhorn routing, and training stability.
-
Enterprise RAG Challenge 3 (ERC3): Winning AI Agent Architectures
Agentic AI
What won Enterprise RAG Challenge 3: multi-agent pipelines, evolutionary prompt engineering, guardrails, context strategy, and autonomous AI agent design.
-
LLM Fine-Tuning Guide: LoRA, QLoRA, DoRA, Unsloth, Axolotl, and Deployment
AI Engineering
When to fine-tune LLMs, when to use RAG or prompting, and how LoRA, QLoRA, DoRA, Unsloth, Axolotl, datasets, evals, and deployment fit.
-
Schema-Guided Reasoning on vLLM: Structured Outputs with xgrammar and Pydantic
Agentic AI
How Schema-Guided Reasoning uses vLLM, xgrammar, Pydantic schemas, and constrained decoding to make LLM outputs structured and reliable.
-
LoRAX Serving Guide: Thousands of LoRA Adapters on Kubernetes
AI Engineering
How to serve thousands of LoRA adapters with LoRAX on Kubernetes: dynamic adapter loading, multi-adapter batching, memory tiers, Helm, and APIs.
-
Domain-Driven Design for AI Agents: Bounded Contexts, Tools, and Business Rules
Agentic AI
How domain-driven design helps AI agents model business rules with ubiquitous language, bounded contexts, entities, tools, repositories, and orchestration.
-
Context Engineering for AI Agents: Context Windows, Memory, Tools, and Guardrails
Agentic AI
How to design context engineering for AI agents: context windows, instruction hierarchy, retrieval, memory, tool definitions, guardrails, and compression.
-
MCP Server Tutorial with uv and FastMCP: Build a FeatureStoreLite Server
Tooling
Build a custom MCP server with uv and FastMCP, expose ML feature-store tools, test them locally, and connect the server to Claude Desktop.
-
Open-Source LLM Variants and File Formats: Instruct, GGUF, GPTQ, AWQ, and MoE
AI Engineering
How to choose open-source LLM variants and file formats: Base vs Instruct vs Distill, GGUF vs GPTQ vs AWQ, quantization, MoE, and hardware fit.
-
Local LLMs on macOS: Ollama, LM Studio, llama.cpp, MLX, and Apple Silicon
AI Engineering
Compare local LLM tools on macOS: Ollama, LM Studio, llama.cpp, MLX, and Apple Silicon trade-offs for private, low-latency inference.
-
Stable Diffusion on macOS: Draw Things, DiffusionBee, ComfyUI, A1111, and Fooocus
Tooling
Compare local Stable Diffusion tools on macOS: Draw Things, DiffusionBee, ComfyUI, AUTOMATIC1111, and Fooocus on Apple Silicon.
-
`pyproject.toml` Guide: Python Packaging, Dependencies, and Tool Configuration
Python
How `pyproject.toml` works for Python packaging, build systems, project metadata, dependencies, CLI entry points, and tool configuration.
-
Zsh Startup Files: `~/.zprofile` vs `~/.zshrc` on macOS and Linux
Tooling
Zsh startup files explained: when `~/.zprofile` and `~/.zshrc` load on macOS and Linux, and what belongs in each file.
-
MLOps vs LLMOps: Infrastructure for Foundation Models and LLM Systems
Infrastructure
How MLOps changed after foundation models: model serving, RAG pipelines, vector databases, fine-tuning, evaluation, monitoring, and LLMOps infrastructure.
-
Scaling Large Language Models: Multi-GPU and Multi-Node Strategies That Hold Up in Practice
AI Engineering
How to scale large language models across multiple GPUs and nodes with data parallelism, FSDP, tensor parallelism, pipeline parallelism, and context parallelism.
-
MacBook Setup for AI Engineering: macOS Tools, Python, Docker, and Terminal Workflow
Tooling
A MacBook setup checklist for AI engineering: Xcode tools, Homebrew, Python with uv and pyenv, Docker, terminal setup, and VS Code.
-
uv on macOS: Managing Python Versions, Projects, and Tools
Python
How to use uv on macOS for Python installs, virtual environments, project dependencies, lock files, and one-off CLI tools.