The Gemma architecture, developed by Google, is a dense model architecture.
As of 16/08/2024, Gemma has two major versions: Gemma 1 and Gemma 2.
Gemma models support quantization. The table below lists the supported quantization techniques; a usage sketch follows the table.
| Quantization Technique | KV cache | Support | Updated |
|---|---|---|---|
| FP16 | FP16 | Yes | Yes |
| FP8 | FP16 | Yes | No |
| FP8 | FP8 | Yes | No |
| int8_weight_only | FP16 | Yes | Yes |
| int4_weight_only | FP16 | Yes | Yes |
| w4a16_awq | FP16 | No | No |
| w4a8_awq | FP16 | No | No |
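
For context, here is a minimal sketch of weight-only quantization with an FP16 KV cache. It assumes the Hugging Face Transformers and bitsandbytes libraries and the `google/gemma-2b` checkpoint; these are illustrative assumptions, not the specific toolchain behind the support matrix above.

```python
# Illustrative sketch (assumption): int8 weight-only quantization of a Gemma
# checkpoint via Hugging Face Transformers + bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2b"  # any Gemma checkpoint; access may require accepting the license

# int8 weight-only quantization; activations and the KV cache remain FP16.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    torch_dtype=torch.float16,  # non-quantized tensors kept in FP16
    device_map="auto",
)

inputs = tokenizer("Gemma is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```

Switching `load_in_8bit=True` to `load_in_4bit=True` gives the int4 weight-only variant from the table.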
| Model version | Model | Context size (tokens) |
|---|---|---|
| Gemma | gemma-2b | 8,192 |
| Gemma | gemma-7b | 8,192 |
| Gemma2 | gemma2-2b | 8,192 |
| Gemma2 | gemma2-9b | 8,192 |
| Gemma2 | gemma2-27b | 8,192 |
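
As a quick way to confirm the context sizes listed above, the sketch below reads `max_position_embeddings` from each model's configuration. It assumes Hugging Face Transformers and the listed Hub checkpoint IDs (illustrative assumptions); downloading the configs may require accepting the Gemma license on the Hub.

```python
# Illustrative sketch (assumption): read the configured context window of
# Gemma checkpoints from their Hugging Face configuration files.
from transformers import AutoConfig

for model_id in ["google/gemma-2b", "google/gemma-7b", "google/gemma-2-9b"]:
    config = AutoConfig.from_pretrained(model_id)
    # For Gemma configs, max_position_embeddings holds the context size (8,192).
    print(model_id, config.max_position_embeddings)
```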