The RecurrentGemma architecture, developed by Google, is a hybrid model architecture: it combines attention with state-space-style recurrent blocks. At the moment (16/08/2024), RecurrentGemma has one major version, named RecurrentGemma.
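The sketch below illustrates the general idea of interleaving a recurrent (state-space-style) mixing block with an attention block. It is a toy example under our own assumptions about layer structure, not the actual RecurrentGemma implementation.

```python
# Toy sketch of a hybrid layer: a gated linear recurrence followed by
# self-attention, with residual connections. Sizes and structure are
# illustrative assumptions only.
import torch
import torch.nn as nn

class ToyRecurrentBlock(nn.Module):
    """Simple gated linear recurrence over the sequence dimension."""
    def __init__(self, dim):
        super().__init__()
        self.in_proj = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (batch, seq, dim)
        h = torch.zeros_like(x[:, 0])
        decay = torch.sigmoid(self.gate(x))    # per-step decay in (0, 1)
        u = self.in_proj(x)
        outs = []
        for t in range(x.size(1)):             # h_t = a_t * h_{t-1} + (1 - a_t) * u_t
            h = decay[:, t] * h + (1 - decay[:, t]) * u[:, t]
            outs.append(h)
        return self.out_proj(torch.stack(outs, dim=1))

class ToyHybridLayer(nn.Module):
    """One recurrent block followed by one attention block, with residuals."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.recurrent = ToyRecurrentBlock(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        x = x + self.recurrent(self.norm1(x))
        y = self.norm2(x)
        a, _ = self.attn(y, y, y, need_weights=False)
        return x + a

x = torch.randn(2, 16, 64)                     # (batch, seq, dim)
print(ToyHybridLayer(64)(x).shape)             # torch.Size([2, 16, 64])
```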
RecurrentGemma models support quantization; the supported combinations are listed below.
| Quantization Technique | KV Cache | Support | Updated |
|---|---|---|---|
| FP16 | FP16 | Yes | Yes |
| FP8 | FP16 | No | No |
| FP8 | FP8 | No | No |
| int8_weight_only | FP16 | No | No |
| int4_weight_only | FP16 | No | No |
| w4a16_awq | FP16 | No | No |
| w4a8_awq | FP16 | No | No |
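Since FP16 weights with an FP16 KV cache is the supported combination in the table above, a minimal way to load a checkpoint in that precision is shown below. This example uses the standard Hugging Face Transformers API and assumes you have access to the gated `google/recurrentgemma-2b` checkpoint; it is illustrative and not tied to any specific serving stack.

```python
# Minimal FP16 loading sketch with Hugging Face Transformers (assumed setup:
# transformers + accelerate installed, gated checkpoint access granted).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/recurrentgemma-2b"          # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                 # FP16 weights, matching the table
    device_map="auto",
)

inputs = tokenizer("RecurrentGemma is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```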
| Model Version | Model | Context Size |
|---|---|---|
| RecurrentGemma | recurrentgemma-2b | 4,096 |
| RecurrentGemma | recurrentgemma-9b | 4,096 |
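Both checkpoints share a 4,096-token context window, so long prompts should be truncated before generation. The snippet below is a sketch using the standard Transformers tokenizer API; the limit value comes from the table above, while the headroom for generated tokens is an assumption.

```python
# Truncating a prompt to fit the 4,096-token context window (illustrative).
from transformers import AutoTokenizer

MAX_CONTEXT = 4096
tokenizer = AutoTokenizer.from_pretrained("google/recurrentgemma-2b")

long_prompt = "some very long document ... " * 2000
inputs = tokenizer(
    long_prompt,
    truncation=True,
    max_length=MAX_CONTEXT - 64,   # leave room for 64 generated tokens (assumed budget)
    return_tensors="pt",
)
print(inputs["input_ids"].shape)   # at most (1, 4032) token IDs
```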