
Commit e8e0c76

Add activation sparsity reference in gemma3n doc (#39160)
Add activation sparsity reference in the description of gemma3n
1 parent 8e87adc commit e8e0c76

File tree

1 file changed (+2, -1 lines)


docs/source/en/model_doc/gemma3n.md (2 additions, 1 deletion)

@@ -29,7 +29,7 @@ rendered properly in your Markdown viewer.
 Gemma3n is a multimodal model with pretrained and instruction-tuned variants, available in E4B and E2B sizes. While
 large portions of the language model architecture are shared with prior Gemma releases, there are many new additions in
 this model, including [Alternating Updates][altup] (AltUp), [Learned Augmented Residual Layer][laurel] (LAuReL),
-[MatFormer][matformer], Per-Layer Embeddings (PLE), activation sparsity, and KV cache sharing. The language model uses
+[MatFormer][matformer], Per-Layer Embeddings (PLE), [Activation Sparsity with Statistical Top-k][spark-transformer], and KV cache sharing. The language model uses
 a similar attention pattern to [Gemma 3](./gemma3.md) with alternating 4 local sliding window self-attention layers for
 every global self-attention layer with a maximum context length of 32k tokens. Gemma 3n introduces
 [MobileNet v5][mobilenetv5] as the vision encoder, using a default resolution of 768x768 pixels, and adds a newly
@@ -201,4 +201,5 @@ echo -e "Plants create energy through a process known as" | transformers run --t
 [gemma3n-collection]: https://huggingface.co/collections/google/gemma-3n
 [laurel]: https://arxiv.org/abs/2411.07501
 [matformer]: https://arxiv.org/abs/2310.07707
+[spark-transformer]: https://arxiv.org/abs/2506.06644
 [usm]: https://arxiv.org/abs/2303.01037
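For readers unfamiliar with the technique the new reference points to, the sketch below illustrates the general idea of statistical top-k activation sparsity: instead of sorting to find the exact k largest activations, a per-row cutoff is estimated from the activation statistics and everything below it is zeroed. This is only an illustrative approximation of the idea behind the linked paper, not the implementation in transformers; the function name and the Gaussian-threshold assumption are ours.

```python
# Illustrative sketch only (not the transformers implementation) of the idea behind
# statistical top-k activation sparsity: estimate a per-row cutoff from activation
# statistics instead of sorting, and zero out everything below it.
import math

import torch


def statistical_top_k(x: torch.Tensor, k: int) -> torch.Tensor:
    """Keep roughly the k largest entries per row of `x` (shape [batch, hidden]), zero the rest."""
    n = x.shape[-1]
    # Standard-normal quantile at level (1 - k/n), assuming roughly Gaussian activations.
    p = 1.0 - k / n
    z = math.sqrt(2.0) * torch.erfinv(torch.tensor(2.0 * p - 1.0, dtype=x.dtype, device=x.device))
    # Per-row threshold from mean and standard deviation; no sort or top-k kernel needed.
    mean = x.mean(dim=-1, keepdim=True)
    std = x.std(dim=-1, keepdim=True)
    threshold = mean + z * std
    return torch.where(x >= threshold, x, torch.zeros_like(x))


# Example: keep roughly 256 of 2048 hidden activations per token.
h = torch.randn(4, 2048)
h_sparse = statistical_top_k(h, k=256)
print((h_sparse != 0).float().mean().item())  # ~0.125 on average
```

Because the cutoff comes from cheap per-row statistics rather than an exact sort, the sparsity level is only approximate, which is the trade-off the statistical top-k approach makes for speed.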
