
04 · Model Optimization & Formats

Making models smaller and faster, and adapting them to new tasks. Study section 01 first: you need to understand what a model is before learning how to shrink or adapt it.

| Step | Topic | One-liner | Status |
|---|---|---|---|
| 1 | Quantization | Reducing weight precision (e.g. from FP32 or FP16 to INT8/INT4) to shrink model size and often speed up inference | 🔴 |
| 2 | FP8 / INT8 / INT4 | The numeric formats used in quantization and what each trades off | 🔴 |
| 3 | GPTQ, AWQ & GGUF | GPTQ and AWQ, two practical post-training quantization algorithms, and GGUF, the llama.cpp file format commonly used to store quantized models | 🔴 |
| 4 | Pruning & Sparsity | Removing weights that contribute little to model output | 🔴 |
| 5 | Knowledge Distillation | Training a small model to mimic the output distribution of a large one | 🔴 |
| 6 | LoRA & QLoRA | Low-rank adaptation: fine-tuning small low-rank update matrices instead of the full weights; QLoRA does this on top of a 4-bit quantized base model | 🔴 |
| 7 | Adapter layers | Small trainable modules inserted into a frozen base model | 🔴 |
| 8 | Fine-tuning & SFT | Supervised fine-tuning on task-specific data | 🔴 |
| 9 | RLHF | Reinforcement learning from human feedback: how alignment training works | 🔴 |
| 10 | DPO & GRPO | Direct preference optimization trains on preference pairs without a separate reward model; group relative policy optimization still uses a reward signal but drops PPO's value (critic) model | 🔴 |
| 11 | Model merging | Combining weights from multiple fine-tuned models into one | 🔴 |
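As a rough illustration of the Quantization row, the sketch below applies symmetric per-tensor INT8 quantization to a handful of FP32 values in plain Python. It assumes a single scale for the whole tensor and an integer range of [-127, 127]; real toolkits usually quantize per channel or per group and pack the integers into compact storage, so treat `quantize_int8` and `dequantize_int8` as hypothetical helpers rather than any library's API.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: w ~= scale * q with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale maps the largest magnitude onto 127; guard against an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the INT8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Map the integers back to approximate floating-point weights."""
    return [scale * v for v in q]


weights = [0.42, -1.27, 0.0, 0.63, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Rounding to the nearest integer bounds the per-weight error by scale / 2.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_err)

# Storage: 4 bytes per FP32 weight vs. 1 byte per INT8 weight plus one FP32 scale.
print(f"FP32: {4 * len(weights)} bytes, INT8: {len(q) + 4} bytes")
```

For a tensor this small the scale's overhead is noticeable, but for a large weight matrix the storage approaches one quarter of the FP32 size.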
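The LoRA row's "fraction of the parameters" can be checked with a little arithmetic. As LoRA is commonly described, a d×k weight W stays frozen and two small matrices are learned, B (d×r) and A (r×k), so the layer computes W x + (alpha / r) · B A x and only r · (d + k) values are trained. The sketch below is plain Python with hypothetical helper names; the 4096×4096 layer and rank 8 are illustrative assumptions, not figures from this course.

```python
def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Trainable parameters: full fine-tuning of a d x k matrix vs. a rank-r LoRA update."""
    return d * k, r * (d + k)


def matvec(m: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def lora_forward(w, a, b, x, alpha: float, r: int) -> list[float]:
    """Frozen base output W x plus the scaled low-rank update (alpha / r) * B (A x)."""
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [h + (alpha / r) * dv for h, dv in zip(base, delta)]


full, lora = lora_param_counts(4096, 4096, 8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
# full: 16,777,216  lora: 65,536  ratio: 0.39%

# Tiny example with d = k = 2 and r = 1. B starts at zero (the usual LoRA init),
# so the adapted layer initially reproduces the frozen layer exactly.
w = [[1.0, 0.0], [0.0, 1.0]]   # frozen d x k weight
a = [[0.5, 0.5]]               # r x k
b_init = [[0.0], [0.0]]        # d x r
x = [2.0, 4.0]
print(lora_forward(w, a, b_init, x, alpha=1.0, r=1))
# [2.0, 4.0]

b_trained = [[1.0], [0.0]]     # pretend B has been updated by training
print(lora_forward(w, a, b_trained, x, alpha=1.0, r=1))
# [5.0, 4.0]
```

Computing B (A x) instead of forming the d×k product B A keeps the extra cost proportional to r, which is why small ranks are cheap at training time.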
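For the DPO part of the DPO & GRPO row, the core of the objective is a logistic loss on how much more the policy prefers the chosen response over the rejected one, measured relative to a frozen reference model. The sketch below computes that loss for a single preference pair from summed sequence log-probabilities; the function and argument names and the default `beta = 0.1` are illustrative assumptions, and a real trainer would obtain the log-probabilities from the models and average the loss over a batch.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """-log sigmoid(beta * (chosen log-ratio - rejected log-ratio)) for one pair."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log sigmoid(z) == log(1 + exp(-z)) == softplus(-z)
    return softplus(-logits)


# While the policy still matches the reference, the loss is log 2.
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))  # ≈ 0.6931
# Moving probability toward the chosen response lowers the loss.
print(dpo_loss(-10.0, -17.0, -12.0, -15.0))
```

No reward model appears anywhere in the computation: the reference log-probabilities are fixed inputs, so during training only the policy terms would receive gradients.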

← Previous section: 03 · Serving Infrastructure | Next section → 05 · Retrieval & Memory