The Mamba architecture, developed by Albert Gu and Tri Dao, is a state space model (SSM) architecture.
Mamba replaces the attention layers with a state space model, which scales better to long contexts.
As of this writing (16/08/2024), Mamba has two major versions: Mamba1 and Mamba2.
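The core idea behind the SSM layer can be illustrated with a toy linear state-space scan. This is a simplified, non-selective sketch (the function name, shapes, and scalar input channel here are illustrative assumptions, not Mamba's actual selective-scan kernel):

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Toy linear state-space recurrence: h_t = A * h_{t-1} + B * x_t, y_t = C . h_t.

    x: (T,) scalar input sequence; A, B, C: (N,) per-state parameters.
    Illustrative stand-in for Mamba's selective scan, not the real kernel.
    """
    h = np.zeros_like(A)        # recurrent state, fixed size N
    y = np.empty(x.shape[0])
    for t in range(x.shape[0]):
        h = A * h + B * x[t]    # state update: O(N) work per step
        y[t] = C @ h            # readout from the state
    return y

# With A = B = C = [1.0], the state simply accumulates the input,
# so a constant input of 1.0 yields the running sum 1, 2, 3, ...
print(ssm_scan(np.ones(3), np.array([1.0]), np.array([1.0]), np.array([1.0])))
```

Because the state `h` has fixed size, each generation step costs the same regardless of sequence length, unlike attention, whose cost grows with the context. In Mamba proper, the parameters corresponding to `A`, `B`, and `C` are input-dependent (the "selection" mechanism).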
The quantization techniques supported for Mamba models are listed below:
| Quantization Technique | KV cache | Support | Updated |
|---|---|---|---|
| FP16 | FP16 | Yes | Yes |
| FP8 | None | No | No |
| int8_weight_only | None | No | No |
| int4_weight_only | None | No | No |
| w4a16_awq | None | No | No |
| w4a8_awq | None | No | No |
| Model version | Model | Context size |
|---|---|---|
| Mamba1 | Mamba-130m | 2,048 |
| Mamba1 | Mamba-370m | 2,048 |
| Mamba1 | Mamba-790m | 2,048 |
| Mamba1 | Mamba-1.4b | 2,048 |
| Mamba1 | Mamba-2.8b | 2,048 |
| Mamba2 | Mamba2-130m | 2,048 |
| Mamba2 | Mamba2-370m | 2,048 |
| Mamba2 | Mamba2-790m | 2,048 |
| Mamba2 | Mamba2-1.4b | 2,048 |
| Mamba2 | Mamba2-2.8b | 2,048 |