Google has launched Gemma 4 12B, a 12-billion-parameter open AI model designed to run locally on laptops without fully relying on cloud infrastructure. The model is small enough to run on systems with 16GB of VRAM or unified memory. Gemma 4 12B is part of Google’s Gemma family of open models, which are built from the same research and technology used in Gemini models. The model is released under the Apache 2.0 license, allowing developers and organizations to use, modify, and deploy it with fewer restrictions.

The new model is designed to bring advanced AI capabilities directly to personal computers. It aims to give developers, researchers, and businesses access to local AI workflows without relying only on remote data centers.

The model delivers performance close to larger AI systems while using less memory. Its smaller footprint makes it more practical for laptops and workstations used by developers and researchers.
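The 16GB figure is easier to picture with some back-of-the-envelope arithmetic. The sketch below estimates how much memory the weights of a 12-billion-parameter model would need at different numeric precisions. The precisions shown (16-bit, 8-bit, 4-bit) are illustrative assumptions, not formats confirmed by Google for this release, and the estimate covers weights only, leaving out the extra memory used for context and activations during inference.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 12e9    # 12 billion parameters, as stated in the article
BUDGET_GB = 16   # memory figure quoted in the article

# Bit widths below are assumed for illustration only.
for bits in (16, 8, 4):
    gb = weight_memory_gb(PARAMS, bits)
    verdict = "fits" if gb < BUDGET_GB else "does not fit"
    print(f"{bits}-bit weights: {gb:.0f} GB ({verdict} in {BUDGET_GB} GB)")
```

Under these assumptions, 16-bit weights alone would take about 24 GB, while 8-bit and 4-bit weights would take about 12 GB and 6 GB, which suggests that some form of reduced-precision weights is what makes a 16GB machine workable.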

Gemma 4 12B can process text, images, and audio. This allows it to support tasks beyond standard text-based prompts, including visual understanding, audio input processing, and advanced reasoning.

These capabilities make the model suitable for software development, content creation, research, automation, and agentic workflows.

One of the main technical changes in Gemma 4 12B is its unified architecture. Instead of using separate multimodal encoders for text, image, and audio inputs, the model sends these inputs directly into the LLM backbone.

This design is meant to improve efficiency while reducing memory requirements and computational overhead. It also helps the model keep multimodal capabilities while remaining small enough for local use on modern hardware.
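As a rough illustration of the idea, the toy sketch below places text, image, and audio inputs into a single token sequence without a separate encoder network for each modality. It is not Google's implementation: the embedding width, the use of a single linear projection per modality, the input sizes, and every function name here are hypothetical and chosen only to keep the example small.

```python
import random

D_MODEL = 8  # hypothetical embedding width, far smaller than a real model


def make_matrix(rows, cols, seed):
    """Build a small random matrix (rows x cols) as nested lists."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def project(vectors, weights):
    """Map each raw input vector to D_MODEL values with one linear layer."""
    return [
        [sum(x * w for x, w in zip(vec, col)) for col in zip(*weights)]
        for vec in vectors
    ]


def build_sequence(text_ids, image_patches, audio_frames, embed, w_img, w_aud):
    """Return one flat sequence of D_MODEL-wide tokens for the backbone."""
    text_tokens = [embed[i] for i in text_ids]      # table lookup
    image_tokens = project(image_patches, w_img)    # no separate image encoder
    audio_tokens = project(audio_frames, w_aud)     # no separate audio encoder
    return image_tokens + audio_tokens + text_tokens


# Made-up sizes: 10-word vocabulary, 12 values per patch, 5 per audio frame.
embed = make_matrix(10, D_MODEL, seed=0)
w_img = make_matrix(12, D_MODEL, seed=1)
w_aud = make_matrix(5, D_MODEL, seed=2)

patches = make_matrix(2, 12, seed=3)   # 2 image patches
frames = make_matrix(3, 5, seed=4)     # 3 audio frames
sequence = build_sequence([1, 4, 7, 2], patches, frames, embed, w_img, w_aud)
print(len(sequence), "tokens, each of width", len(sequence[0]))
```

The point of the sketch is that once every input has the same width, the backbone can treat the whole sequence uniformly, which is where the memory and compute savings described above would come from.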

Google is also offering developer support through the Google AI Edge stack, allowing users to build and test local agentic workflows on consumer hardware.

The developer guide notes that Gemma 4 12B supports local inference on laptops with a dedicated GPU and 16GB of VRAM, or with 16GB of unified memory. Google is also releasing a dedicated multi-token prediction model to improve local inference speeds.
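The article does not say how the multi-token prediction model is used. One common way a small drafting model speeds up generation is draft-and-verify decoding (often called speculative decoding), and the toy sketch below shows that general technique, not a confirmed description of Gemma 4 12B. Both "models" here are hypothetical stand-in functions over integer tokens.

```python
def target_next(tokens):
    """Hypothetical stand-in for the main model's next-token choice."""
    return (tokens[-1] * 3 + 1) % 7


def draft_next_k(tokens, k):
    """Hypothetical drafter: guesses k tokens ahead, sometimes wrongly."""
    guesses, ctx = [], list(tokens)
    for _ in range(k):
        guess = target_next(ctx)
        if len(ctx) % 5 == 0:          # inject an occasional wrong guess
            guess = (guess + 1) % 7
        guesses.append(guess)
        ctx.append(guess)
    return guesses


def generate(prompt, n_new, k=4):
    """Draft k tokens, keep the verified prefix, fix the first mistake."""
    tokens, passes = list(prompt), 0
    while len(tokens) - len(prompt) < n_new:
        draft = draft_next_k(tokens, k)
        passes += 1                    # stands for one batched main-model pass
        ctx = list(tokens)
        for guess in draft:
            correct = target_next(ctx)
            ctx.append(correct)        # always keep the main model's token
            if guess != correct:
                break                  # discard the rest of the draft
        tokens = ctx
    return tokens[: len(prompt) + n_new], passes


def greedy(prompt, n_new):
    """Baseline: one main-model pass per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_next(tokens))
    return tokens


output, passes = generate([1, 2], n_new=20)
print("same output as baseline:", output == greedy([1, 2], 20))
print("main-model passes:", passes, "instead of 20")
```

In this setup the output is identical to what the main model would produce on its own, because every drafted token is checked before it is kept. The speedup comes from needing fewer main-model passes when the drafter guesses well.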
