415.tech
AI & tech, from the frontlines of Silicon Valley
Google's Gemma 4 12B drops separate encoders and hits near-26B performance at 16GB

Google shipped Gemma 4 12B, a 12-billion-parameter open-weight model that routes vision and audio directly into the LLM backbone instead of through separate encoders. It comes close to the benchmark results of a 26B MoE model while fitting in 16GB of RAM. The practical consequence is that a multimodal model capable of reasoning over text, images, and audio now runs locally on a consumer laptop, under Apache 2.0 with no commercial restrictions.

Source: blog.google

Small enough to run locally on consumer laptops with 16GB of RAM, it unlocks powerful multimodal and agentic experiences right on your machine.

Google

Why this matters

  • → Multimodal reasoning now runs on 16GB consumer laptops without separate encoders.
  • → Encoder-free architecture reduces latency and memory while nearly matching 26B MoE performance.
  • → Apache 2.0 license removes commercial restrictions on local AI deployment.
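
To make the 16GB figure concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at a few common precisions. It assumes a nominal 12 billion parameters and counts only weight storage; activations, KV cache, and runtime overhead are ignored. The source does not say which precision the 16GB claim refers to, so the precisions listed are illustrative assumptions, and the helper names are hypothetical.

```python
# Rough weight-only memory estimate for a 12B-parameter model.
# Assumptions (not from the source): nominal 12e9 parameters, and the
# precisions below. Activations, KV cache, and runtime overhead are ignored.

BYTES_PER_PARAM = {
    "bf16": 2.0,   # 16-bit weights
    "int8": 1.0,   # 8-bit quantized weights
    "int4": 0.5,   # 4-bit quantized weights
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits_in_ram(num_params: float, precision: str, ram_gb: float = 16.0) -> bool:
    """True if the weights alone are smaller than the given RAM budget."""
    return weight_memory_gb(num_params, precision) < ram_gb


if __name__ == "__main__":
    params = 12e9  # nominal parameter count, taken from the "12B" name
    for precision in BYTES_PER_PARAM:
        size = weight_memory_gb(params, precision)
        verdict = "fits" if fits_in_ram(params, precision) else "does not fit"
        print(f"{precision}: ~{size:.0f} GB of weights, {verdict} in 16 GB")
```

Under these assumptions, 16-bit weights alone would exceed 16GB, so running on a 16GB laptop would likely depend on some form of quantization, with the remaining headroom shared by the operating system and the model's working memory.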
Multimodal on your laptop