RecurrentGemma, developed by Google, is a hybrid model architecture: it leverages both attention layers and recurrent, state-space-style layers.
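To make the hybrid idea concrete, here is a minimal, illustrative PyTorch sketch. It is not the actual RecurrentGemma/Griffin implementation: `RecurrentBlock`, `HybridStack`, and the layer pattern are hypothetical stand-ins that only show how fixed-state recurrent layers can be interleaved with attention layers.

```python
import torch
import torch.nn as nn


class RecurrentBlock(nn.Module):
    """Toy gated linear recurrence (a stand-in for an RG-LRU-style layer)."""

    def __init__(self, dim: int):
        super().__init__()
        self.in_proj = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)
        # Per-channel decay in (0, 1), applied at every time step.
        self.log_decay = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, time, dim)
        decay = torch.sigmoid(self.log_decay)
        u = self.in_proj(x) * torch.sigmoid(self.gate(x))
        state = torch.zeros(x.shape[0], x.shape[2], device=x.device)
        outputs = []
        for t in range(x.shape[1]):
            # Constant-size state: per-token cost does not grow with sequence length.
            state = decay * state + (1.0 - decay) * u[:, t]
            outputs.append(state)
        return torch.stack(outputs, dim=1)


class HybridStack(nn.Module):
    """Alternates recurrent blocks with attention blocks (illustrative pattern only)."""

    def __init__(self, dim: int, n_layers: int, n_heads: int = 4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            if i % 3 == 2:  # e.g. every third layer uses attention
                self.layers.append(nn.MultiheadAttention(dim, n_heads, batch_first=True))
            else:
                self.layers.append(RecurrentBlock(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            if isinstance(layer, nn.MultiheadAttention):
                attn_out, _ = layer(x, x, x, need_weights=False)
                x = x + attn_out
            else:
                x = x + layer(x)
        return x


if __name__ == "__main__":
    model = HybridStack(dim=64, n_layers=6)
    tokens = torch.randn(2, 16, 64)  # (batch, sequence, hidden)
    print(model(tokens).shape)  # torch.Size([2, 16, 64])
```

The property this sketch highlights is that the recurrent layers carry a constant-size state, so their memory footprint does not grow with sequence length the way a full-attention KV cache does.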
At the moment (16/08/2024), RecurrentGemma has a single major version: RecurrentGemma.
The quantization support matrix for RecurrentGemma models is shown below; at present only FP16 weights with an FP16 KV cache are supported (see the loading sketch after the table).
| Quantization Technique | KV cache | Support | Updated |
|---|---|---|---|
| FP16 | FP16 | Yes | Yes |
| FP8 | FP16 | No | No |
| FP8 | FP8 | No | No |
| int8_weight_only | FP16 | No | No |
| int4_weight_only | FP16 | No | No |
| w4a16_awq | FP16 | No | No |
| w4a8_awq | FP16 | No | No |
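As a usage illustration of the FP16 row above, the following sketch loads a checkpoint in half precision with the Hugging Face transformers API. This tooling choice is an assumption (the table above may refer to a different runtime), and the model ID and prompt are only examples.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face hub checkpoint name for the 2B model.
model_id = "google/recurrentgemma-2b"

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32  # FP16 kernels assume a GPU

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype).to(device)

inputs = tokenizer("The hybrid architecture of RecurrentGemma", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```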
The available model checkpoints and their context sizes are:

| Model version | Model | Context size |
|---|---|---|
| RecurrentGemma | recurrentgemma-2b | 4,096 |
| RecurrentGemma | recurrentgemma-9b | 4,096 |
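If a prompt risks exceeding the 4,096-token context listed above, it can be truncated at tokenization time. A minimal sketch follows; the checkpoint name is again an assumed Hugging Face hub ID.

```python
from transformers import AutoTokenizer

# Assumed checkpoint name; truncate prompts to the 4,096-token context size.
tokenizer = AutoTokenizer.from_pretrained("google/recurrentgemma-9b")

long_prompt = "word " * 10_000  # placeholder for an overly long input
inputs = tokenizer(long_prompt, truncation=True, max_length=4096, return_tensors="pt")
print(inputs["input_ids"].shape)  # at most (1, 4096)
```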