The RecurrentGemma architecture, developed by Google, is a hybrid model architecture: it combines attention with state-space-style recurrent blocks. At the moment (16/08/2024), RecurrentGemma has one major version, named RecurrentGemma.
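The sketch below illustrates the general idea of interleaving a recurrent (state-space-style) mixing block with an attention block. It is a toy example under our own assumptions about layer structure, not the actual RecurrentGemma implementation.

```python
# Toy sketch of a hybrid layer: a gated linear recurrence followed by
# self-attention, with residual connections. Sizes and structure are
# illustrative assumptions only.
import torch
import torch.nn as nn

class ToyRecurrentBlock(nn.Module):
    """Simple gated linear recurrence over the sequence dimension."""
    def __init__(self, dim):
        super().__init__()
        self.in_proj = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (batch, seq, dim)
        h = torch.zeros_like(x[:, 0])
        decay = torch.sigmoid(self.gate(x))    # per-step decay in (0, 1)
        u = self.in_proj(x)
        outs = []
        for t in range(x.size(1)):             # h_t = a_t * h_{t-1} + (1 - a_t) * u_t
            h = decay[:, t] * h + (1 - decay[:, t]) * u[:, t]
            outs.append(h)
        return self.out_proj(torch.stack(outs, dim=1))

class ToyHybridLayer(nn.Module):
    """One recurrent block followed by one attention block, with residuals."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.recurrent = ToyRecurrentBlock(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        x = x + self.recurrent(self.norm1(x))
        y = self.norm2(x)
        a, _ = self.attn(y, y, y, need_weights=False)
        return x + a

x = torch.randn(2, 16, 64)                     # (batch, seq, dim)
print(ToyHybridLayer(64)(x).shape)             # torch.Size([2, 16, 64])
```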
RecurrentGemma models support quantization; the supported combinations are listed below.
| Quantization Technique | KV Cache | Support | Updated |
|---|---|---|---|
| FP16 | FP16 | Yes | Yes |
| FP8 | FP16 | No | No |
| FP8 | FP8 | No | No |
| int8_weight_only | FP16 | No | No |
| int4_weight_only | FP16 | No | No |
| w4a16_awq | FP16 | No | No |
| w4a8_awq | FP16 | No | No |
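Since FP16 weights with an FP16 KV cache is the supported combination in the table above, a minimal way to load a checkpoint in that precision is shown below. This example uses the standard Hugging Face Transformers API and assumes you have access to the gated `google/recurrentgemma-2b` checkpoint; it is illustrative and not tied to any specific serving stack.

```python
# Minimal FP16 loading sketch with Hugging Face Transformers (assumed setup:
# transformers + accelerate installed, gated checkpoint access granted).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/recurrentgemma-2b"          # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                 # FP16 weights, matching the table
    device_map="auto",
)

inputs = tokenizer("RecurrentGemma is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```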
| Model Version | Model | Context Size |
|---|---|---|
| RecurrentGemma | recurrentgemma-2b | 4,096 |
| RecurrentGemma | recurrentgemma-9b | 4,096 |
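Both checkpoints share a 4,096-token context window, so long prompts should be truncated before generation. The snippet below is a sketch using the standard Transformers tokenizer API; the limit value comes from the table above, while the headroom for generated tokens is an assumption.

```python
# Truncating a prompt to fit the 4,096-token context window (illustrative).
from transformers import AutoTokenizer

MAX_CONTEXT = 4096
tokenizer = AutoTokenizer.from_pretrained("google/recurrentgemma-2b")

long_prompt = "some very long document ... " * 2000
inputs = tokenizer(
    long_prompt,
    truncation=True,
    max_length=MAX_CONTEXT - 64,   # leave room for 64 generated tokens (assumed budget)
    return_tensors="pt",
)
print(inputs["input_ids"].shape)   # at most (1, 4032) token IDs
```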