AI Systems Engineering
The full stack of production AI
136 topics across inference, optimization, agents, evaluation, and governance — written for engineers building and operating AI systems.
01 Model Inference Core
Tokenization, attention, KV cache, batching, speculative decoding
17 topics
02 Prompting & Control
Prompting techniques, sampling parameters, structured output
9 topics
03 Serving Infrastructure
vLLM, TGI, TensorRT-LLM, parallelism strategies, serving metrics
15 topics
04 Model Optimization
Quantization, LoRA/QLoRA, fine-tuning, RLHF, DPO, distillation
11 topics
05 Retrieval & Memory
RAG, vector databases, chunking, hybrid search, GraphRAG
12 topics
06 Agents & Orchestration
Agents, MCP, LangGraph, multi-agent systems, tool calling
15 topics
07 Safety & Governance
Guardrails, PII redaction, red-teaming, EU AI Act, NIST RMF
15 topics
08 Evaluation & Quality
RAGAS, LLM-as-judge, golden datasets, CI/CD eval gates
13 topics
09 Observability & Ops
LLMOps, tracing, drift detection, cost tracking, Langfuse
14 topics
10 Integration & Cloud
AI gateways, routing, streaming, cloud platforms, hybrid deployment
15 topics
Architecture overview
How all layers connect in a production AI system
