Google has launched Gemma 4 12B, a 12-billion-parameter open AI model designed to run locally on laptops without fully relying on cloud infrastructure. The model is small enough to run on systems with 16GB of VRAM or unified memory.
Gemma 4 12B is part of Google’s Gemma family of open models, which are built from the same research and technology used in Gemini models. The model is released under the Apache 2.0 license, allowing developers and organizations to use, modify, and deploy it with fewer restrictions.
The new model is designed to bring advanced AI capabilities directly to personal computers. It aims to give developers, researchers, and businesses access to local AI workflows without depending solely on remote data centers.
The model delivers performance close to larger AI systems while using less memory. Its smaller footprint makes it more practical for laptops and workstations used by developers and researchers.
Gemma 4 12B can process text, images, and audio. This allows it to support tasks beyond standard text-based prompts, including visual understanding, audio input processing, and advanced reasoning.
These capabilities make the model suitable for software development, content creation, research, automation, and agentic workflows.
One of the main technical changes in Gemma 4 12B is its unified architecture. Instead of using separate multimodal encoders for text, image, and audio inputs, the model sends these inputs directly into the LLM backbone.
This design is meant to improve efficiency while reducing memory requirements and computational overhead. It also helps the model keep multimodal capabilities while remaining small enough for local use on modern hardware.
Google is also offering developer support through the Google AI Edge stack, allowing users to build and test local agentic workflows on consumer hardware.
The developer guide notes that Gemma 4 12B supports local inference on laptops with a dedicated GPU and 16GB of VRAM, or with 16GB of unified memory. Google is also releasing a dedicated multi-token prediction model to improve local inference speeds.
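To put the 16GB figure in context, the rough sketch below estimates how much memory the weights of a 12-billion-parameter model would occupy at a few common numeric precisions. The precision labels and byte widths are illustrative assumptions; the article does not say which format Google ships, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
# Back-of-the-envelope estimate of weight memory for a 12B-parameter model.
# Assumptions (not from the article): the precision formats listed below,
# and decimal gigabytes (1 GB = 1e9 bytes). KV cache and activations are ignored.

PARAMS = 12_000_000_000  # 12 billion parameters, as stated in the article
BUDGET_GB = 16           # memory figure quoted in the article

# Hypothetical storage formats and their size in bytes per parameter.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: int, bytes_per_param: float) -> float:
    """Return the memory needed for the weights alone, in decimal GB."""
    return params * bytes_per_param / 1e9


def fits_in_budget(params: int, bytes_per_param: float, budget_gb: float) -> bool:
    """True if the weights alone fit under the given memory budget."""
    return weight_memory_gb(params, bytes_per_param) < budget_gb


if __name__ == "__main__":
    for name, width in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(PARAMS, width)
        ok = fits_in_budget(PARAMS, width, BUDGET_GB)
        print(f"{name}: {gb:.1f} GB of weights, fits in {BUDGET_GB} GB: {ok}")
```

Under these assumptions, 2 bytes per parameter works out to 24 GB for the weights alone, which exceeds a 16GB budget, while 1 byte per parameter gives 12 GB and half a byte gives 6 GB. That suggests some form of reduced-precision weights would be needed on such hardware, though the actual release format may differ.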