Gemma

Model overview

The Gemma architecture, developed by Google, is a dense model architecture.

At the moment (16/08/2024), Gemma has two major versions: Gemma1 and Gemma2.

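Gemma models support quantization. The table below lists each combination of quantization technique and KV cache precision and whether it is supported.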

Quantization Technique	KV cache	Support	Updated
FP16	FP16	Yes	Yes
FP8	FP16	Yes	No
FP8	FP8	Yes	No
int8_weight_only	FP16	Yes	Yes
int4_weight_only	FP16	Yes	Yes
w4a16_awq	FP16	No	No
w4a8_awq	FP16	No	No
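
For scripts that need to check a configuration before building, the table above can be encoded as a small lookup. This is only a sketch: `QuantSupport`, `GEMMA_QUANT_SUPPORT`, and `is_supported` are hypothetical names chosen for illustration and are not part of any library. The values are copied directly from the table.

```python
from typing import NamedTuple


class QuantSupport(NamedTuple):
    """One row of the support table (hypothetical helper type)."""
    supported: bool  # "Support" column
    updated: bool    # "Updated" column


# Keys are (quantization technique, KV cache precision), copied from the table.
GEMMA_QUANT_SUPPORT = {
    ("FP16", "FP16"): QuantSupport(True, True),
    ("FP8", "FP16"): QuantSupport(True, False),
    ("FP8", "FP8"): QuantSupport(True, False),
    ("int8_weight_only", "FP16"): QuantSupport(True, True),
    ("int4_weight_only", "FP16"): QuantSupport(True, True),
    ("w4a16_awq", "FP16"): QuantSupport(False, False),
    ("w4a8_awq", "FP16"): QuantSupport(False, False),
}


def is_supported(technique: str, kv_cache: str = "FP16") -> bool:
    """Return True only if the combination is listed with Support = Yes.

    Combinations that do not appear in the table are treated as unsupported.
    """
    entry = GEMMA_QUANT_SUPPORT.get((technique, kv_cache))
    return entry is not None and entry.supported
```

For example, `is_supported("FP8", "FP8")` returns `True`, while `is_supported("w4a16_awq")` returns `False`.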