AI Systems Engineering

The full stack of production AI

136 topics across inference, optimization, agents, evaluation, and governance — written for engineers building and operating AI systems.

Tokenization, attention, KV cache, batching, speculative decoding

Prompting techniques, sampling parameters, structured output

vLLM, TGI, TensorRT-LLM, parallelism strategies, serving metrics

Quantization, LoRA/QLoRA, fine-tuning, RLHF, DPO, distillation

RAG, vector databases, chunking, hybrid search, GraphRAG

Agents, MCP, LangGraph, multi-agent systems, tool calling

Guardrails, PII redaction, red-teaming, EU AI Act, NIST RMF

RAGAS, LLM-as-judge, golden datasets, CI/CD eval gates

LLMOps, tracing, drift detection, cost tracking, Langfuse

AI gateways, routing, streaming, cloud platforms, hybrid deployment

Architecture overview

How all layers connect in a production AI system
