Mamba


Model overview

Mamba is a state space model (SSM) architecture developed by Tri Dao and Albert Gu.

Mamba replaces the attention layer with a state space model, which scales better to long contexts.
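
To illustrate why a state space layer scales well with sequence length, the sketch below runs a simplified linear recurrence over a sequence. It is a toy under stated assumptions, not Mamba's actual layer: the parameters a, b, and c are fixed, hypothetical values with a diagonal state matrix, whereas Mamba makes its parameters depend on the input. The point it shows is that each token updates a fixed-size state, so the work per token does not grow with the number of earlier tokens.

```python
def ssm_scan(xs, a, b, c):
    """Run a diagonal linear state space recurrence over a scalar sequence.

    h_t = a * h_{t-1} + b * x_t   (elementwise over the state)
    y_t = sum(c * h_t)
    """
    h = [0.0] * len(a)  # fixed-size hidden state, independent of sequence length
    ys = []
    for x_t in xs:
        # Update every state component from the previous state and the new input.
        h = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, h, b)]
        # Read out one output value from the current state.
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


# Hypothetical toy parameters with a state of size 2.
a = [0.5, 0.9]
b = [1.0, 1.0]
c = [1.0, -1.0]
ys = ssm_scan([1.0, 0.0, 0.0], a, b, c)
```

The loop performs one constant-cost update per token, so total cost grows linearly with sequence length, while an attention layer compares each token with all earlier ones.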

As of 16 August 2024, Mamba has two major versions: Mamba1 and Mamba2.

Model details

Hugging Face model

Supported quantization

The table below lists quantization support for Mamba models. Currently only FP16 (with an FP16 KV cache) is marked as supported.

Quantization Technique   KV cache   Support   Updated
FP16                     FP16       Yes       Yes
FP8                      None       No        No
int8_weight_only         None       No        No
int4_weight_only         None       No        No
w4a16_awq                None       No        No
w4a8_awq                 None       No        No
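
As a minimal sketch, the support table above could be encoded as a lookup so that a build script rejects unsupported techniques early. The names QUANT_SUPPORT and is_supported are hypothetical and not part of any library. The values are transcribed from the table as of the date given in this document.

```python
# Technique -> (KV cache type, supported), transcribed from the table above.
QUANT_SUPPORT = {
    "FP16": ("FP16", True),
    "FP8": (None, False),
    "int8_weight_only": (None, False),
    "int4_weight_only": (None, False),
    "w4a16_awq": (None, False),
    "w4a8_awq": (None, False),
}


def is_supported(technique: str) -> bool:
    """Return True if the table marks the technique as supported for Mamba."""
    if technique not in QUANT_SUPPORT:
        raise KeyError(f"unknown quantization technique: {technique}")
    return QUANT_SUPPORT[technique][1]


# Collect the techniques currently marked as supported.
supported = [name for name in QUANT_SUPPORT if is_supported(name)]
```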

Model versions

Model version   Model         Context size
Mamba           Mamba-130m    2,048
Mamba           Mamba-370m    2,048
Mamba           Mamba-790m    2,048
Mamba           Mamba-1.4b    2,048
Mamba           Mamba-2.8b    2,048
Mamba2          Mamba2-130m   2,048
Mamba2          Mamba2-370m   2,048
Mamba2          Mamba2-790m   2,048
Mamba2          Mamba2-1.4b   2,048
Mamba2          Mamba2-2.8b   2,048
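
The version table can be used to check a prompt length against the listed context size before sending a request. The sketch below is illustrative only: CONTEXT_SIZE and fits_context are hypothetical names, and the model names and the 2,048 limit are copied from the table above.

```python
# Model name -> context size in tokens, copied from the table above.
SIZES = ("130m", "370m", "790m", "1.4b", "2.8b")
CONTEXT_SIZE = {
    f"{family}-{size}": 2048
    for family in ("Mamba", "Mamba2")
    for size in SIZES
}


def fits_context(model: str, num_tokens: int) -> bool:
    """Return True if num_tokens fits within the model's listed context size."""
    if model not in CONTEXT_SIZE:
        raise KeyError(f"unknown model: {model}")
    return 0 <= num_tokens <= CONTEXT_SIZE[model]
```

For example, fits_context("Mamba2-1.4b", 2048) is true, while a 2,049-token prompt would be rejected.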