The Mamba architecture, developed by Albert Gu and Tri Dao, is a state space model (SSM) architecture.
Mamba replaces the attention layers with a state space model, which scales better to long contexts.
As of this writing (16/08/2024), Mamba has two major versions: Mamba1 and Mamba2.
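The core idea behind the SSM layer can be illustrated with a toy linear state-space scan. This is a simplified, non-selective sketch (the function name, shapes, and scalar input channel here are illustrative assumptions, not Mamba's actual selective-scan kernel):

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Toy linear state-space recurrence: h_t = A * h_{t-1} + B * x_t, y_t = C . h_t.

    x: (T,) scalar input sequence; A, B, C: (N,) per-state parameters.
    Illustrative stand-in for Mamba's selective scan, not the real kernel.
    """
    h = np.zeros_like(A)        # recurrent state, fixed size N
    y = np.empty(x.shape[0])
    for t in range(x.shape[0]):
        h = A * h + B * x[t]    # state update: O(N) work per step
        y[t] = C @ h            # readout from the state
    return y

# With A = B = C = [1.0], the state simply accumulates the input,
# so a constant input of 1.0 yields the running sum 1, 2, 3, ...
print(ssm_scan(np.ones(3), np.array([1.0]), np.array([1.0]), np.array([1.0])))
```

Because the state `h` has fixed size, each generation step costs the same regardless of sequence length, unlike attention, whose cost grows with the context. In Mamba proper, the parameters corresponding to `A`, `B`, and `C` are input-dependent (the "selection" mechanism).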
The quantization techniques supported for Mamba models are listed below:
| Quantization Technique | KV cache | Support | Updated |
|---|---|---|---|
| FP16 | FP16 | Yes | Yes |
| FP8 | None | No | No |
| int8_weight_only | None | No | No |
| int4_weight_only | None | No | No |
| w4a16_awq | None | No | No |
| w4a8_awq | None | No | No |
| Model version | Model | Context size |
|---|---|---|
| Mamba1 | Mamba-130m | 2,048 |
| Mamba1 | Mamba-370m | 2,048 |
| Mamba1 | Mamba-790m | 2,048 |
| Mamba1 | Mamba-1.4b | 2,048 |
| Mamba1 | Mamba-2.8b | 2,048 |
| Mamba2 | Mamba2-130m | 2,048 |
| Mamba2 | Mamba2-370m | 2,048 |
| Mamba2 | Mamba2-790m | 2,048 |
| Mamba2 | Mamba2-1.4b | 2,048 |
| Mamba2 | Mamba2-2.8b | 2,048 |