In a nutshell: Google has released the Gemma 4 open-weight AI model, designed to run locally on smartphones and other consumer devices. Built on Gemini 3, Gemma 4 comes in four versions optimized for different use cases, giving users and developers the flexibility to choose the model that best fits their needs.

The two largest Gemma 4 models – 26B Mixture of Experts and 31B Dense – require an 80GB Nvidia H100 GPU to run unquantized in bfloat16 format. Google claims these models deliver "frontier intelligence on personal computers" for students, researchers, and developers, providing advanced reasoning capabilities for IDEs, coding assistants, and agentic workflows.
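
The H100 figure follows from simple arithmetic: bfloat16 stores each parameter in two bytes, so the weights alone take roughly twice the parameter count in bytes. The sketch below is a back-of-the-envelope estimate, not an official sizing guide. It assumes the nominal parameter counts implied by the model names and ignores the KV cache, activations, and runtime overhead, all of which add to the real footprint.

```python
# Rough weight-only memory estimate for unquantized bfloat16 models.
# Assumption: parameter counts are the nominal figures from the model names.
# Not included: KV cache, activations, framework overhead.

BYTES_PER_PARAM_BF16 = 2  # bfloat16 uses 16 bits per parameter
H100_MEMORY_GB = 80       # memory of the GPU cited in the article


def weight_memory_gb(num_params: float,
                     bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Return the memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


models = {
    "26B Mixture of Experts": 26e9,
    "31B Dense": 31e9,
}

for name, params in models.items():
    needed = weight_memory_gb(params)
    fits = needed < H100_MEMORY_GB
    print(f"{name}: ~{needed:.0f} GB of weights, fits in {H100_MEMORY_GB} GB: {fits}")
```

On these assumptions the weights come to about 52GB and 62GB, which leaves some headroom on an 80GB card for everything the estimate leaves out.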

The 26B model activates only 3.8 billion of its 26 billion parameters during inference, which yields higher tokens-per-second throughput than similar models and significantly lower latency. In contrast, the 31B model focuses on "maximizing raw quality" and allows developers to fine-tune it for specific use cases.
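
To see why sparse activation helps speed, compare the compute each model spends per generated token. The sketch below uses a common rule of thumb, roughly two floating-point operations per active parameter per token for a forward pass. That is an approximation rather than a figure from Google, and it ignores attention costs, memory bandwidth, and expert-routing overhead. The function names are illustrative only.

```python
# Rough per-token compute comparison between the sparse and dense variants.
# Assumption: a forward pass costs about 2 FLOPs per *active* parameter per token.
# Parameter counts are the figures quoted in the article.

MOE_TOTAL_PARAMS = 26e9    # all experts combined
MOE_ACTIVE_PARAMS = 3.8e9  # parameters actually used per token
DENSE_PARAMS = 31e9        # a dense model uses every parameter for every token


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the model's parameters that participate in each token."""
    return active_params / total_params


def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per token (rule-of-thumb estimate)."""
    return 2 * active_params


moe_share = active_fraction(MOE_ACTIVE_PARAMS, MOE_TOTAL_PARAMS)
moe_flops = forward_flops_per_token(MOE_ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(DENSE_PARAMS)

print(f"26B MoE active share: {moe_share:.1%}")
print(f"26B MoE:   ~{moe_flops / 1e9:.1f} GFLOPs per token")
print(f"31B Dense: ~{dense_flops / 1e9:.1f} GFLOPs per token")
print(f"Dense-to-MoE compute ratio: ~{dense_flops / moe_flops:.1f}x")
```

By this estimate the 26B model touches roughly 15 percent of its parameters per token and needs about an eighth of the dense model's compute. All 26 billion parameters still have to sit in memory, though, so the saving is in speed rather than footprint.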

The variants most relevant to end-users are Effective 2B and Effective 4B. These models can run entirely offline and use minimal memory during inference, with effective parameter counts of just 2 billion and 4 billion, respectively. Google says that keeping the number of active parameters this low enables the models to run on mobile and IoT devices, including smartphones, Raspberry Pi, and Jetson Nano.

Google claims that its Gemma 4 models are not only significantly faster than Gemma 3 but also the most capable AI models ever designed to run on local hardware. Independent testing appears to support this claim: the 31B model currently ranks #3 on the Arena AI leaderboard for open models, behind GLM-5 and Kimi 2.5, while the 26B sits at #6.

Gemma 4 has been released under an Apache 2.0 license, allowing developers to integrate it into their apps and services without usage restrictions. By comparison, Gemma 3 is governed by a custom Google license with strict usage policies and numerous limitations, making it less attractive for developers.

It is worth noting that, despite the Apache 2.0 license, Gemma 4 is "open-weight" rather than fully open-source. Under the Open Source Initiative's Open Source AI Definition, an AI model can only be considered open-source if its weights are released together with the complete code used to train and run it and with information about the training data detailed enough for a skilled person to build a substantially equivalent system.

Google, however, is only releasing the model parameters, not the full, reproducible training pipeline, which prevents others from recreating the model from scratch. For most developers, this limitation is unlikely to matter, as the Apache 2.0 license still permits all forms of commercial use, modification, redistribution, and deployment, provided the license text and attribution notices are retained.