Gemma


Model overview

Gemma, developed by Google, is a dense model architecture.

As of 16/08/2024, Gemma has two major versions: Gemma1 and Gemma2.

Model details

Hugging Face model

Supported quantization

Gemma models support quantization.

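| Quantization technique | KV cache | Supported | Updated |
|---|---|---|---|
| FP16 | FP16 | Yes | Yes |
| FP8 | FP16 | Yes | No |
| FP8 | FP8 | Yes | No |
| int8_weight_only | FP16 | Yes | Yes |
| int4_weight_only | FP16 | Yes | Yes |
| w4a16_awq | FP16 | No | No |
| w4a8_awq | FP16 | No | No |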
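
The support matrix above can be encoded as a small lookup so that a deployment script rejects an unsupported combination before a build starts. This is a minimal sketch, not part of any official API: the names `QuantEntry`, `GEMMA_QUANT_SUPPORT`, and `is_supported` are hypothetical, and the "Updated" column is copied from the table without interpretation.

```python
from typing import NamedTuple


class QuantEntry(NamedTuple):
    """One row of the support table (hypothetical helper type)."""
    supported: bool
    updated: bool  # value of the "Updated" column, copied as-is


# Keys are (quantization technique, KV cache precision), taken from the table.
GEMMA_QUANT_SUPPORT = {
    ("FP16", "FP16"): QuantEntry(True, True),
    ("FP8", "FP16"): QuantEntry(True, False),
    ("FP8", "FP8"): QuantEntry(True, False),
    ("int8_weight_only", "FP16"): QuantEntry(True, True),
    ("int4_weight_only", "FP16"): QuantEntry(True, True),
    ("w4a16_awq", "FP16"): QuantEntry(False, False),
    ("w4a8_awq", "FP16"): QuantEntry(False, False),
}


def is_supported(technique: str, kv_cache: str = "FP16") -> bool:
    """Return True only for combinations the table lists as supported.

    Combinations that do not appear in the table are treated as unsupported.
    """
    entry = GEMMA_QUANT_SUPPORT.get((technique, kv_cache))
    return entry is not None and entry.supported
```

For example, `is_supported("FP8", "FP8")` returns `True`, while `is_supported("w4a16_awq")` returns `False`. Treating unlisted combinations as unsupported is a conservative design choice made for this sketch.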

Model versions

| Model version | Model | Context size |
|---|---|---|
| Gemma | gemma-2b | 8,192 |
| Gemma | gemma-7b | 8,192 |
| Gemma2 | gemma2-2b | 8,192 |
| Gemma2 | gemma2-9b | 8,192 |
| Gemma2 | gemma2-27b | 8,192 |
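
The context sizes listed above can be used to check a request before it is sent. The following is a rough sketch under two assumptions: that the context size bounds the prompt tokens plus the generated tokens, and that token counts are already known. The names `GEMMA_CONTEXT_SIZE` and `fits_context` are hypothetical.

```python
# Context sizes in tokens, copied from the model versions table.
GEMMA_CONTEXT_SIZE = {
    "gemma-2b": 8192,
    "gemma-7b": 8192,
    "gemma2-2b": 8192,
    "gemma2-9b": 8192,
    "gemma2-27b": 8192,
}


def fits_context(model: str, prompt_tokens: int, max_new_tokens: int) -> bool:
    """Check that prompt plus requested output fits the model's context size.

    Assumes the context size covers input and output tokens together.
    Raises KeyError for a model name that is not in the table.
    """
    return prompt_tokens + max_new_tokens <= GEMMA_CONTEXT_SIZE[model]
```

With these values, `fits_context("gemma2-9b", 8000, 192)` holds exactly at the limit, and requesting one more output token does not fit.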