
Google's Gemma 4 12B drops separate encoders and nears 26B-class performance in 16GB of RAM
Google shipped Gemma 4 12B, an open-weight model that routes vision and audio directly into the LLM backbone instead of through separate encoders. It comes close to 26B MoE models on benchmarks while fitting in 16GB of RAM. The practical consequence is that a multimodal model capable of reasoning over text, images, and audio runs locally on a consumer laptop today, under Apache 2.0 with no commercial restrictions.
Source: blog.google
Small enough to run locally on consumer laptops with 16GB of RAM, it unlocks powerful multimodal and agentic experiences right on your machine.
Google
Why this matters
- → Multimodal reasoning now runs on 16GB consumer laptops without separate encoders.
- → Encoder-free architecture reduces latency and memory use while approaching 26B-class performance.
- → Apache 2.0 license removes commercial restrictions on local AI deployment.
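
To put the 16GB figure in context, here is a rough back-of-the-envelope sketch of how much memory 12B parameters occupy at a few common storage precisions. The precisions and the helper names (`weight_gib`, `fits_in_budget`) are illustrative assumptions. The source does not say which precision the 16GB claim refers to, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
# Rough weight-memory estimate for a 12B-parameter model.
# Assumption: only the weights are counted; activations, KV cache,
# and runtime overhead are ignored, so real usage will be higher.

PARAMS = 12e9  # parameter count taken from the article's "12B"

# Bytes per parameter for common storage precisions (illustrative, not from the source).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gib(params: float, bytes_per_param: float) -> float:
    """Return the weight storage size in GiB (2**30 bytes)."""
    return params * bytes_per_param / 2**30


def fits_in_budget(params: float, bytes_per_param: float, budget_gib: float = 16.0) -> bool:
    """Return True if the weights alone fit within the memory budget."""
    return weight_gib(params, bytes_per_param) < budget_gib


for name, nbytes in BYTES_PER_PARAM.items():
    size = weight_gib(PARAMS, nbytes)
    verdict = "fits" if fits_in_budget(PARAMS, nbytes) else "does not fit"
    print(f"{name}: {size:.1f} GiB of weights, {verdict} in 16 GiB")
```

Under these assumptions, 16-bit weights alone would exceed 16GB, while 8-bit or 4-bit weights would leave headroom, which suggests the laptop claim likely depends on some form of reduced-precision weights.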
Multimodal on your laptop