MiniMax M3 scores 59% on SWE-Bench Pro, surpassing GPT-5.5, and is the first open-weight model with a 1M-token context

MiniMax has released M3, an open-weight model that scores 59.0% on SWE-Bench Pro, above GPT-5.5, and combines a 1-million-token context window with native multimodality in a single sparse-attention architecture. The weights are due on Hugging Face within ten days, giving teams that cannot route proprietary code through external APIs a self-hostable model competitive with today's closed frontier.

Source: marktechpost.com

MSA uses a "KV outer gather Q" approach. KV blocks serve as the outer loop to aggregate the queries that hit them. Each block is read only once and memory access is contiguous.

MiniMax team
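
The loop order described in the quote can be illustrated with a small sketch. This is not MiniMax's kernel. It is a pure-Python toy that assumes block-sparse attention where each query has already selected a few KV blocks, and it assumes per-query partial results are merged with a running (online) softmax, a detail the quote does not state. The names `query_outer`, `kv_outer`, and `selected` are hypothetical. The sketch only counts block reads and does not model memory contiguity.

```python
import math
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def query_outer(queries, k_blocks, v_blocks, selected):
    """Baseline loop order: each query walks its own selected blocks.

    Assumes every query selects at least one block.
    """
    dim = len(queries[0])
    outputs, block_reads = [], 0
    for q, blocks in zip(queries, selected):
        scores, values = [], []
        for b in blocks:
            block_reads += 1  # a block is re-read by every query that picked it
            for k, v in zip(k_blocks[b], v_blocks[b]):
                scores.append(dot(q, k) / math.sqrt(dim))
                values.append(v)
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))
        ])
    return outputs, block_reads


def kv_outer(queries, k_blocks, v_blocks, selected):
    """KV blocks as the outer loop, gathering the queries that hit each block."""
    dim = len(queries[0])
    v_dim = len(v_blocks[0][0])

    # Invert the selection: block index -> queries that selected it.
    hits = defaultdict(list)
    for q_idx, blocks in enumerate(selected):
        for b in blocks:
            hits[b].append(q_idx)

    # Running softmax state per query (assumed merge strategy).
    run_max = [-math.inf] * len(queries)
    denom = [0.0] * len(queries)
    acc = [[0.0] * v_dim for _ in queries]

    block_reads = 0
    for b in sorted(hits):  # visit blocks in index order, once each
        keys, values = k_blocks[b], v_blocks[b]
        block_reads += 1
        for q_idx in hits[b]:
            q = queries[q_idx]
            for k, v in zip(keys, values):
                score = dot(q, k) / math.sqrt(dim)
                new_max = max(run_max[q_idx], score)
                rescale = math.exp(run_max[q_idx] - new_max)
                weight = math.exp(score - new_max)
                denom[q_idx] = denom[q_idx] * rescale + weight
                acc[q_idx] = [a * rescale + weight * x
                              for a, x in zip(acc[q_idx], v)]
                run_max[q_idx] = new_max

    outputs = [[a / d for a in row] for row, d in zip(acc, denom)]
    return outputs, block_reads


# Toy data: 3 queries, 3 KV blocks of 2 tokens each, 2 blocks selected per query.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k_blocks = [[[1.0, 0.0], [0.5, 0.5]],
            [[0.0, 1.0], [1.0, 1.0]],
            [[-1.0, 0.5], [0.2, -0.3]]]
v_blocks = [[[1.0, 2.0], [3.0, 4.0]],
            [[5.0, 6.0], [7.0, 8.0]],
            [[9.0, 10.0], [11.0, 12.0]]]
selected = [[0, 2], [0, 1], [1, 2]]

ref_out, ref_reads = query_outer(queries, k_blocks, v_blocks, selected)
kv_out, kv_reads = kv_outer(queries, k_blocks, v_blocks, selected)
```

On this toy input both loop orders produce the same attention outputs, but the query-outer version performs six block reads (one per query-block pair) while the KV-outer version performs three (one per block). The gap grows with the number of queries that share a block.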

Why this matters

→ Open-weight model matches closed frontier on coding benchmarks without proprietary API dependency
→ 1M-token context with 15× speedup enables long-context autonomous tasks
→ Self-hostable alternative for teams protecting proprietary code

Open weights catch frontier
