sLLM (small Large Language Model) or SLM (Small Language Model)
0. LLM & SLM
LLM: parameter count above ~8B
- Transformer (vanilla Transformer: encoder + decoder)
- encoder-only
- decoder-only
SLM: parameter count below ~8B
1. Main Foundation Models
- Llama3 8B - Meta
- Mistral 7B - Mistral AI
- Gemma - Google
- Phi-3 - Microsoft
2. Techniques
- Pruning: remove low-magnitude / low-importance weights
- Quantization: FP32 -> FP16 -> INT8 -> 4-bit ...
- Knowledge Distillation: Teacher Model -> Student Model
- Lighter model structure: multi-query attention (MQA), MoE (Mixture of Experts)
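The pruning idea above can be sketched as unstructured magnitude pruning: zero out the smallest-magnitude fraction of weights. This is a minimal illustration (the function name and flat weight list are my own; real pruning operates on full tensors, often with structured patterns and retraining):

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # threshold = magnitude of the k-th smallest weight
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

pruned = magnitude_prune([0.1, -0.8, 0.05, 0.6], sparsity=0.5)
# small-magnitude weights (0.1, 0.05) are zeroed; large ones survive
```

The zeroed weights can then be stored in a sparse format or skipped at inference time.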
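The FP32 -> INT8 step in the quantization bullet can be sketched as symmetric per-tensor quantization: pick a scale so the largest-magnitude weight maps to 127, then round. (Function names and values here are illustrative; production schemes also handle zero-points, per-channel scales, and calibration.)

```python
def quantize_int8(weights):
    """Symmetric INT8 quantization: map [-max|w|, +max|w|] onto [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP values from INT8 codes."""
    return [v * scale for v in q]

q, scale = quantize_int8([0.5, -1.27, 0.03, 1.0])
restored = dequantize(q, scale)  # close to the original weights
```

Each weight now needs 1 byte instead of 4; the rounding error is the accuracy cost that lower bit-widths (4-bit and below) must manage with more careful schemes.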
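The teacher/student idea in the distillation bullet boils down to training the student on the teacher's temperature-softened output distribution. A minimal sketch of that loss (names and logits are illustrative; real training mixes this KL term with the hard-label cross-entropy):

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits, softened by a temperature > 1."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """KL divergence from softened student to softened teacher, scaled by T^2."""
    p = softmax(teacher_logits, T)  # teacher's soft targets
    q = softmax(student_logits, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)) * T * T

loss = distillation_loss([3.0, 1.0, 0.2], [2.5, 1.2, 0.3])
# loss is 0 only when the student matches the teacher exactly
```

The temperature exposes the teacher's "dark knowledge" (relative probabilities of wrong classes), which is what lets a small student learn more than hard labels alone would teach it.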
3. HW
- CPU
- GPU
- NPU
- Memory