Google's Gemma 4 12B drops separate encoders and hits near-26B performance at 16GB

Google shipped Gemma 4 12B, a 12B-parameter open-weight model that routes vision and audio directly into the LLM backbone instead of through separate encoders. It comes close to 26B MoE models on benchmarks while fitting in 16GB of RAM. The practical consequence is that a multimodal model capable of reasoning over text, images, and audio runs locally on a consumer laptop today, under the Apache 2.0 license, which permits commercial use.

Source: blog.google

Small enough to run locally on consumer laptops with 16GB of RAM, it unlocks powerful multimodal and agentic experiences right on your machine.

Google

Why this matters

→ Multimodal reasoning now runs on 16GB consumer laptops without separate encoders.
→ Encoder-free architecture reduces latency and memory use while approaching 26B performance.
→ Apache 2.0 license permits commercial use, including local AI deployment.
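
To put the 16GB figure in context, here is a rough back-of-the-envelope sketch of how much memory the weights of a 12B-parameter model need at different numeric precisions. The source does not say which precision the 16GB claim assumes, so the precisions listed, the exact parameter count of 12e9, and the helper name weight_memory_gb are all illustrative assumptions. The estimate covers weights only and ignores activations, the KV cache, and runtime overhead.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


# Assumption: "12B" is treated as exactly 12e9 parameters for illustration.
N_PARAMS = 12e9
RAM_GB = 16  # laptop RAM figure quoted in the article

# Assumption: these precisions are common choices, not ones the source names.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    gb = weight_memory_gb(N_PARAMS, bits)
    verdict = "under" if gb < RAM_GB else "over"
    print(f"{label}: ~{gb:.0f} GB of weights ({verdict} {RAM_GB} GB)")
```

Under these assumptions, 16-bit weights alone come to about 24 GB, so a 16GB machine would plausibly need weights stored at 8 bits or fewer per parameter, with the remaining memory left for everything else the runtime needs.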

Multimodal on your laptop

Also in this edition

SoftBank pledges €75B to build Europe's largest AI data center cluster in France
Japan commits $500M to the US Genesis Mission as its first international partner
MiniMax M3 scores 59% on SWE-Bench Pro, surpassing GPT-5.5, as the first open-weight model with 1M-token context