Chinese AI company Z.ai has released GLM-5.2, a 753-billion-parameter open-weight model designed for autonomous coding and long-running engineering tasks.

The model supports a stable one-million-token context window and is available through Hugging Face, Z.ai’s API and more than 20 third-party coding tools.

Z.ai has released the model weights under the MIT licence, allowing companies to download, modify, fine-tune and deploy the model on their own infrastructure.

GLM-5.2 introduces an architecture called IndexShare, which reuses one indexer across every four sparse-attention layers.

Z.ai says this cuts per-token compute by a factor of 2.9 when the model operates at its full one-million-token context length.
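The sharing pattern can be illustrated with a minimal sketch. The article says only that one indexer serves every four sparse-attention layers, so the consecutive grouping, the layer count and the function names below are assumptions for illustration, not Z.ai's implementation. The sketch also does not derive the 2.9 times figure, which is Z.ai's own claim.

```python
import math

GROUP_SIZE = 4  # from the article: one indexer per four sparse-attention layers


def indexer_for_layer(layer: int, group_size: int = GROUP_SIZE) -> int:
    """Return the index of the shared indexer used by a sparse-attention layer.

    Assumes consecutive layers are grouped together (hypothetical layout).
    """
    return layer // group_size


def indexer_count(num_layers: int, group_size: int = GROUP_SIZE) -> int:
    """Number of indexers needed when each one is shared by `group_size` layers."""
    return math.ceil(num_layers / group_size)


NUM_SPARSE_LAYERS = 12  # hypothetical; the article does not give a layer count

assignment = [indexer_for_layer(i) for i in range(NUM_SPARSE_LAYERS)]
print(assignment)
print(indexer_count(NUM_SPARSE_LAYERS), "indexers instead of", NUM_SPARSE_LAYERS)
```

Under these assumptions, the indexer runs once per group rather than once per layer, which is where a saving of this kind would come from.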

The company has also upgraded the Multi-Token Prediction layer used for speculative decoding, increasing the accepted token length by up to 20%.

Users can select between Max and High thinking modes. Max prioritises performance and uses more output tokens, while High reduces token usage and latency with a small decline in benchmark performance.

GLM-5.2 scored 62.1 on SWE-bench Pro, ahead of GPT-5.5 at 58.6 and GLM-5.1 at 58.4.

On FrontierSWE, it reached 74.4%, compared with 72.6% for GPT-5.5 and 75.1% for Claude Opus 4.8.

The model scored 76.8 on the public MCP-Atlas tool-use benchmark. GPT-5.5 scored 75.3, while Claude Opus 4.8 reached 77.8.

With access to external tools, GLM-5.2 scored 54.7 on Humanity’s Last Exam. GPT-5.5 scored 52.2, while Claude Opus 4.8 reached 57.9.

GLM-5.2 scored 34.3% on PostTrainBench, compared with 28.4% for GPT-5.5.

It also achieved 13% on SWE-Marathon, slightly ahead of GPT-5.5 at 12%, although Claude Opus 4.8 remained higher at 26%.

On Terminal-Bench 2.1, GLM-5.2 scored 81.0. Claude Opus 4.8 and GPT-5.5 scored 85 and 84, respectively, while Gemini 3.1 Pro scored 74.

Z.ai also said the model took first place on the crowdsourced Design Arena benchmark with an Elo score of 1,360.

The MIT licence allows businesses to use, modify and commercialise GLM-5.2 without royalties or regional restrictions.

Companies can deploy the model through supported frameworks including vLLM, SGLang, Transformers, KTransformers and Unsloth.

This gives businesses the option to run the model on private infrastructure instead of relying entirely on an external API.

However, running a model with 753 billion parameters locally would still require substantial computing hardware.
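A rough sense of scale comes from multiplying the parameter count by the storage size of each parameter. The sketch below is an estimate of weight memory only; the numeric formats listed are assumptions, since the article does not say which precisions are published, and real deployments also need memory for the key-value cache and activations.

```python
PARAMS = 753e9  # parameter count reported in the article

# Assumed bytes per parameter for common formats (not confirmed for GLM-5.2).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


footprints = {
    name: weight_memory_gb(PARAMS, width) for name, width in BYTES_PER_PARAM.items()
}

for name, gigabytes in footprints.items():
    print(f"{name}: about {gigabytes:,.1f} GB for weights alone")
```

Even at one byte per parameter, the weights alone would exceed the memory of any single accelerator currently on the market, so a multi-GPU setup is the realistic baseline.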

Z.ai has also introduced GLM Coding Plan subscriptions for developers using agentic coding tools.

The model works with tools including Claude Code, OpenClaw, Cline, Kilo Code, Crush and Factory.

The Lite plan costs the equivalent of $12.60 per month when billed annually, which works out to $151.20 per year from the second year.

The Pro plan costs $50.40 per month and offers five times the Lite usage allowance.

The Max plan costs $112 per month and provides 20 times the Lite allowance, along with dedicated resources during peak hours.
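The three tiers can be compared by dividing each price by its usage multiple. The sketch below uses only the prices and multiples quoted above; the annual totals assume the listed monthly price applies for twelve months, which the article confirms only for Lite, and the helper names are illustrative.

```python
# Monthly prices (USD) and usage multiples relative to Lite, as listed in the article.
PLANS = {
    "Lite": {"monthly_usd": 12.60, "lite_multiple": 1},
    "Pro": {"monthly_usd": 50.40, "lite_multiple": 5},
    "Max": {"monthly_usd": 112.00, "lite_multiple": 20},
}


def annual_usd(monthly_usd: float) -> float:
    """Twelve months at the listed monthly price (assumed to hold all year)."""
    return round(monthly_usd * 12, 2)


def usd_per_lite_unit(monthly_usd: float, lite_multiple: int) -> float:
    """Monthly price per one Lite-sized usage allowance."""
    return round(monthly_usd / lite_multiple, 2)


summary = {
    name: (
        annual_usd(plan["monthly_usd"]),
        usd_per_lite_unit(plan["monthly_usd"], plan["lite_multiple"]),
    )
    for name, plan in PLANS.items()
}

for name, (yearly, per_unit) in summary.items():
    print(f"{name}: ${yearly:.2f} per year, ${per_unit:.2f} per Lite-sized allowance")
```

On these figures, the higher tiers cost less per unit of usage, with Max at less than half the Lite rate.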

GLM-5.2 costs $1.40 per million input tokens and $4.40 per million output tokens through Z.ai’s API.

Cached input costs $0.26 per million tokens, while cached-input storage is temporarily free.
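To see what these rates mean for a single request, the sketch below prices a call from its token counts. It assumes cached input tokens are billed at the cached rate in place of the standard input rate, which the article does not spell out, and it ignores cache storage because that is currently free. The function name is illustrative and is not part of Z.ai's API.

```python
# Rates in USD per million tokens, as quoted in the article.
INPUT_PER_M = 1.40
CACHED_INPUT_PER_M = 0.26
OUTPUT_PER_M = 4.40


def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    Assumption: `cached_tokens` is the part of `input_tokens` served from cache
    and is charged at the cached rate instead of the standard input rate.
    """
    if cached_tokens < 0 or cached_tokens > input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_PER_M
        + cached_tokens * CACHED_INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000


# A full one-million-token prompt with a 10,000-token reply.
uncached = request_cost(1_000_000, 10_000)
mostly_cached = request_cost(1_000_000, 10_000, cached_tokens=800_000)
print(f"no cache: ${uncached:.3f}, 80% cached: ${mostly_cached:.3f}")
```

For long agentic sessions that resend much of the same context on every turn, the cached rate is likely to dominate the bill.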

The pricing remains unchanged from GLM-5.1.

Coding tools have started adding support for GLM-5.2 following its release.

Kilo Code confirmed support for the model’s one-million-token context window and Max thinking mode.

Cline also added GLM-5.2, while other coding environments are testing it on longer agentic workflows.

The model is available now through Hugging Face and Z.ai’s developer platform.
