Shin, Jae Wook: On Device AI - sLLM (or SLM)

Saturday, August 3, 2024

sLLM (small Large Language Model) or SLM (Small Language Model)

0. LLM & SLM

LLM: parameter count above roughly 8B

Transformer (vanilla Transformer model: encoder + decoder)
encoder only
decoder only

SLM: parameter count of roughly 8B or fewer

1. Main Foundation Models

Llama 3 8B - Meta
Mistral 7B - Mistral AI
Gemma - Google
Phi-3 - Microsoft

2. Techniques

Pruning
Quantization: FP32 -> FP16 -> INT8 -> 4-bit ...
Knowledge Distillation: Teacher Model & Student Model
Lightweight model structure: multi-query attention, MoE (Mixture of Experts)
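
As a rough illustration of the first three techniques, here is a minimal sketch in plain Python. All function names (magnitude_prune, quantize_int8, dequantize_int8, distillation_loss) and the toy weight list are hypothetical, chosen only for this example. It assumes unstructured magnitude pruning, symmetric per-tensor INT8 quantization, and a KL-divergence distillation loss with a softmax temperature. Real toolchains use more elaborate variants of each.

```python
import math

def magnitude_prune(weights, sparsity):
    """Unstructured pruning: zero out the smallest-magnitude fraction of weights."""
    k = int(len(weights) * sparsity)  # how many weights to drop
    if k == 0:
        return list(weights)
    # Indices of the k weights with the smallest absolute value
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> INT8 codes plus one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_int8(codes, scale):
    """Map INT8 codes back to approximate float values."""
    return [c * scale for c in codes]

def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened output distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy example (made-up values, not taken from any real model)
weights = [0.02, -1.27, 0.5, -0.03, 0.9, 0.0]
pruned = magnitude_prune(weights, 0.5)          # half of the weights become 0.0
codes, scale = quantize_int8(weights)           # integer codes in [-127, 127]
restored = dequantize_int8(codes, scale)        # close to the original weights
loss = distillation_loss([2.0, 1.0, 0.1], [1.5, 1.2, 0.3])
```

With this rounding scheme, the quantization error per weight is bounded by about half of the scale. The distillation loss is zero when the student logits match the teacher logits and positive otherwise.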

3. Hardware (HW)

CPU
GPU
NPU
Memory
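
To see why memory matters on device, here is a back-of-the-envelope sketch of how much space the weights alone take at each precision from the quantization list. The helper name weight_memory_gb is hypothetical, and the 8B parameter count is just the threshold used in this post. The estimate covers weights only and ignores activations, the KV cache, and runtime overhead, so real usage is higher.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9

# An 8B-parameter model at different precisions
for name, bits in [("FP32", 32), ("FP16", 16), ("INT8", 8), ("4-bit", 4)]:
    print(f"{name}: {weight_memory_gb(8e9, bits):.0f} GB")
```

Under these assumptions, an 8B model needs about 32 GB for FP32 weights, 16 GB for FP16, 8 GB for INT8, and 4 GB for 4-bit weights.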
