Saturday, August 3, 2024

On-Device AI - sLLM (or SLM)

sLLM (small Large Language Model) or SLM (Small Language Model)

0. LLM & SLM

LLM: parameter count above 8B

  •  Transformer (vanilla Transformer model: encoder + decoder)
  •  encoder only
  •  decoder only

SLM: parameter count below 8B


1. Main Foundation Models

2. Techniques
  • Pruning
  • Quantization: FP32 -> FP16 -> INT8 -> 4-bit ...
  • Knowledge Distillation: Teacher Model & Student Model
  • Lighter model structure: multi-query attention, MoE (Mixture of Experts)
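
To make the quantization step in the list above concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. The function names (quantize_int8, dequantize) and the sample weights are hypothetical and chosen only for illustration; real toolchains usually quantize per channel or per group and use calibration data, so treat this as one simple variant rather than how any particular framework does it.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 if every weight is zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the INT8 codes."""
    return [v * scale for v in q]


# Hypothetical sample weights, for illustration only.
weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in 8 bits instead of 32, at the cost of a rounding error of at most half of scale per weight. The small value 0.003 rounds to zero here, which shows why a single per-tensor scale can hurt accuracy when weight magnitudes vary widely.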

3. HW (Hardware)
  • CPU
  • GPU
  • NPU
  • Memory
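
Memory is often the first constraint on device, and a rough estimate ties it back to the quantization formats listed under Techniques. The sketch below computes weight storage only (parameter count times bits per parameter); it ignores activations, the KV cache, and runtime overhead, so actual usage will be higher. The helper name weight_memory_gib is hypothetical, and the 8B figure is simply the LLM/SLM boundary used in these notes.

```python
# Bits per parameter for the formats mentioned in the Techniques section.
BITS_PER_PARAM = {"FP32": 32, "FP16": 16, "INT8": 8, "INT4": 4}


def weight_memory_gib(num_params, fmt):
    """Approximate weight storage in GiB (weights only, no runtime overhead)."""
    total_bytes = num_params * BITS_PER_PARAM[fmt] / 8
    return total_bytes / 2**30


# Estimate for a model at the 8B-parameter boundary.
for fmt in BITS_PER_PARAM:
    print(f"{fmt}: {weight_memory_gib(8e9, fmt):.1f} GiB")
```

Each halving of the bit width halves the weight footprint, which is why 4-bit formats are common for running models of this size on phones and laptops.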
