04 · Model Optimization & Formats

Making models smaller, faster, and easier to adapt. Study section 01 first: you need to understand what a model is before learning how to shrink or adapt it.

Step	Topic	One-liner	Status
1	Quantization	Reducing weight precision from FP32 to INT8/INT4 to shrink model size and speed up inference	🔴
2	FP8 / INT8 / INT4	The numeric formats used in quantization and what each trades off	🔴
3	GPTQ, AWQ & GGUF	Practical post-training quantization algorithms (GPTQ, AWQ) and GGUF, the model file format used by llama.cpp	🔴
4	Pruning & Sparsity	Removing weights that contribute little to model output	🔴
5	Knowledge Distillation	Training a small model to mimic the output distribution of a large one	🔴
6	LoRA & QLoRA	Low-rank adaptation: fine-tuning by training small low-rank matrices while the base weights stay frozen (QLoRA does this on a 4-bit quantized base)	🔴
7	Adapter layers	Small trainable modules inserted into a frozen base model	🔴
8	Fine-tuning & SFT	Supervised fine-tuning on task-specific data	🔴
9	RLHF	Reinforcement learning from human feedback — how alignment training works	🔴
10	DPO & GRPO	Direct preference optimization (no separate reward model) and group relative policy optimization (no value/critic model): lighter alternatives to PPO-based RLHF	🔴
11	Model merging	Combining weights from multiple fine-tuned models into one	🔴
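
As a rough preview of step 1, the sketch below shows one simple form of quantization: symmetric, per-tensor rounding of float weights to INT8. It is a minimal illustration under stated assumptions, not how any particular library does it. The function names `quantize_int8` and `dequantize` and the sample weights are made up for this example, and real toolchains typically quantize per channel or per group and operate on tensors rather than Python lists.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 if every weight is zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the INT8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only for illustration.
weights = [0.02, -1.27, 0.635, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Worst-case rounding error is about half of one quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("quantized:", q)
print("scale:", scale)
print("max error:", max_error)
```

Each weight is now stored as one byte instead of four, at the cost of a rounding error of at most about `scale / 2` per weight. Steps 2 and 3 cover the formats and algorithms that reduce this error in practice.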

← Previous section: 03 · Serving Infrastructure | Next section → 05 · Retrieval & Memory