Chinese delivery app company Meituan officially open-sourced LongCat-2.0 a few hours ago on GitHub, Hugging Face, and its own platform.
The release confirmed that LongCat-2.0 powered Owl Alpha, an anonymous model that spent the previous two months ranking among the most widely used developer models on OpenRouter.
Meituan designed the model to challenge the dominance of closed-source systems in autonomous software engineering. LongCat-2.0 uses a 1.6-trillion-parameter Mixture-of-Experts architecture and supports a native context window of 1 million tokens.
The company released the model under the MIT License, allowing commercial use, modification, and integration into proprietary products.
Meituan is offering LongCat-2.0 through conventional pay-as-you-go API billing and limited Token Packs.
Under standard pricing, uncached input costs $0.75 per million tokens, while output costs $2.95 per million tokens. Context-cache hits are processed free of charge.
A limited-time promotion reduces the price to $0.30 per million uncached input tokens and $1.20 per million output tokens.
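To make the rates concrete, the short Python sketch below estimates the cost of a single workload under both price lists. The rates come from the figures above; the function name, the `Rates` class, and the token counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    input_per_million: float   # USD per million uncached input tokens
    output_per_million: float  # USD per million output tokens


STANDARD = Rates(input_per_million=0.75, output_per_million=2.95)
PROMO = Rates(input_per_million=0.30, output_per_million=1.20)


def request_cost(rates, uncached_input, cached_input, output):
    """Estimate the USD cost of a workload.

    cached_input is accepted for bookkeeping only: context-cache hits
    are billed at zero under the pricing described in the article.
    """
    billable = (uncached_input * rates.input_per_million
                + output * rates.output_per_million)
    return billable / 1_000_000


# Hypothetical workload: 2M uncached input, 8M cached input, 0.5M output tokens.
cost_std = request_cost(STANDARD, 2_000_000, 8_000_000, 500_000)
cost_promo = request_cost(PROMO, 2_000_000, 8_000_000, 500_000)
print(f"standard: ${cost_std:.3f}  promotional: ${cost_promo:.3f}")
```

In this example the 8 million cached tokens add nothing to either total, so the bill depends only on the uncached input and the generated output.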
The following table compares LongCat-2.0 with other models:
Meituan trained LongCat-2.0 entirely on a cluster containing more than 50,000 domestically produced Chinese Application-Specific Integrated Circuits (ASICs).
The deployment shows that near-frontier AI models can be trained at scale without relying on the Nvidia GPUs that have powered much of the global generative AI industry.
The use of alternative chips could create pressure for Nvidia if Chinese companies continue developing trillion-parameter models using domestic ASICs instead of general-purpose GPUs.
LongCat-2.0 uses a Mixture-of-Experts design with 1.6 trillion total parameters, but it activates only about 48 billion of them on average for each token, or roughly 3% of the total.
The number of active parameters changes according to the complexity of the request, ranging from 33 billion to 56 billion.
The model uses a Zero-Compute Experts framework that directs routine tasks through lighter subnetworks. This avoids the overhead of a dense model, which runs every parameter for every token regardless of difficulty.
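The article does not describe how Zero-Compute Experts work internally. The toy Python sketch below shows one way such routing could behave, assuming a zero-compute expert simply passes its input through unchanged. All names, sizes, and router scores are hypothetical, and this is not Meituan's implementation.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def ffn_expert(scale):
    """Stand-in for a real feed-forward expert that spends compute."""
    def run(x):
        return [scale * v for v in x]
    return run


def zero_compute_expert(x):
    """Assumed behaviour: return the input unchanged, with no matrix math."""
    return x


EXPERTS = [ffn_expert(0.5), ffn_expert(2.0), zero_compute_expert, zero_compute_expert]
IS_REAL = [True, True, False, False]


def route(x, router_logits, top_k=2):
    """Mix the top_k experts and report how many of them cost real compute."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm
        out = [o + weight * v for o, v in zip(out, EXPERTS[i](x))]
    real_experts_used = sum(IS_REAL[i] for i in chosen)
    return out, real_experts_used


# An "easy" token whose router scores favour the zero-compute experts.
easy_out, easy_cost = route([1.0, 2.0], [0.0, 0.0, 5.0, 5.0])
# A "hard" token whose router scores favour the real experts.
hard_out, hard_cost = route([1.0, 2.0], [5.0, 5.0, 0.0, 0.0])
print(easy_cost, hard_cost)
```

Because the router can send easy tokens to experts that do no work, the number of parameters actually exercised varies from token to token, which is consistent with the 33-billion-to-56-billion range reported above.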
LongCat-2.0 focuses on multi-step engineering work, tool use and automated repository changes rather than mainly serving as a conversational model.
These tasks are commonly described as agentic work because the model must plan, use tools and complete multiple connected actions.
LongCat-2.0 scored 59.5 on SWE-bench Pro, compared with GPT-5.5’s score of 58.6.
It also scored 70.8 on Terminal-Bench 2.1, 77.3 on SWE-bench Multilingual and 73.2 on the FORTE corporate workflow simulator.
Meituan presented these results in benchmark comparison bar charts credited to Meituan LongCat.
Although LongCat-2.0 generally trails premium models such as Claude Opus 4.8 on broad agent benchmarks, including FORTE and BrowseComp, it performs strongly in software engineering.
Its SWE-bench Pro score edges out GPT-5.5’s by 0.9 points, even though the model activates only a fraction of its parameters for each token.
LongCat-2.0 can support software engineering, system operations and the analysis of large documents and datasets.
Its open-weight design, MIT License, and 1-million-token context window allow organizations to host it themselves rather than depend entirely on proprietary external APIs.
This can help companies reduce recurring API costs and keep private data within their own systems.
Development teams could use the model’s Agent Experts to manage automated codebase migrations.
Engineers could place a complete enterprise software repository and updated software development kit documentation inside the model’s context window.
LongCat-2.0 could then identify dependencies, make repository-wide changes, compile the updated code, detect compilation and execution problems in local sandbox environments, and create a final pull request.
The MOPD gate-routing system, which combines the model’s Agent, Reasoning and Interaction Experts at runtime, could also support companies operating under strict compliance requirements.
Financial institutions and healthcare companies could route operational requests through separate expert groups, allowing complex logical and mathematical work while limiting hallucinations and maintaining safety controls.
The Interaction Experts act as a guardrail layer by reducing errors and enforcing instructions without reducing the processing ability of the Reasoning Experts.
Combined with free context caching, this could allow companies to operate automated software systems that repeatedly inspect corporate data, maintain internal systems, and optimize infrastructure at lower operating costs.
During its anonymous deployment on OpenRouter under the Owl Alpha name, the model processed approximately 10.1 trillion tokens per month.
Daily usage averaged 559 billion tokens, representing a 242% month-over-month increase that moved the model into OpenRouter’s global top three.
Before Meituan disclosed its identity, the model had already ranked first in the Hermes Agent workspace, second in Claude Code deployments, and third across international OpenClaw environments.
Meituan developed LongCat Sparse Attention to support the model’s 1-million-token context window without creating major hardware bottlenecks.
The system builds on DeepSeek Sparse Attention and addresses quadratic scoring costs and memory fragmentation through three methods.
Streaming-aware Indexing combines hardware-aligned, continuous data reads with dynamic random selection.
The system converts fragmented memory access into predictable sequential blocks. This improves High Bandwidth Memory use and increases effective bandwidth.
Cross-Layer Indexing uses the tendency of attention patterns to remain stable across nearby hidden layers.
A single indexing calculation can guide several consecutive layers during inference. Meituan reinforced this capability through cross-layer distillation during training.
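The idea behind Cross-Layer Indexing can be illustrated with a minimal Python sketch in which the token selection is computed once and reused by the next few layers. The function names, the `share_span` parameter, and the scores are hypothetical; the article does not say how many layers share one index.

```python
def top_k_indices(scores, k):
    """Return the positions of the k highest scores, in ascending order."""
    best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(best)


def run_layers(num_layers, share_span, score_fn, k):
    """Pick attended tokens per layer, recomputing the index every share_span layers."""
    selections = []
    index_calls = 0
    cached = None
    for layer in range(num_layers):
        if layer % share_span == 0:
            # Only these layers pay for an indexing pass.
            cached = top_k_indices(score_fn(layer), k)
            index_calls += 1
        # The remaining layers reuse the most recent selection.
        selections.append(cached)
    return selections, index_calls


def demo_scores(layer):
    # Hypothetical relevance scores that stay the same across layers.
    return [0.1, 0.9, 0.3, 0.8]


selections, index_calls = run_layers(num_layers=6, share_span=3,
                                     score_fn=demo_scores, k=2)
print(index_calls, selections[0])
```

In this toy run, six layers need only two indexing passes. The saving depends on attention patterns really being stable across neighbouring layers, which is the property the article says Meituan reinforced through distillation.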
Hierarchical Indexing uses a two-stage, coarse-to-fine scoring process.
The indexer first performs a fast block-level search to identify possible candidates. It then conducts detailed token selection only on the remaining candidates.
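The coarse-to-fine process described above can be sketched in a few lines of Python. This is an illustration under assumptions, not Meituan's indexer: the block summaries here are simply the maximum token score in each block, whereas a real system would use a cheaper summary, and all names and numbers are hypothetical.

```python
def hierarchical_select(block_scores, token_score, block_size, keep_blocks, keep_tokens):
    """Two-stage selection: rank blocks first, then score tokens only in kept blocks."""
    # Stage 1: fast block-level search for candidate regions.
    ranked = sorted(range(len(block_scores)),
                    key=lambda b: block_scores[b], reverse=True)
    kept = sorted(ranked[:keep_blocks])

    # Stage 2: detailed token scoring restricted to the surviving blocks.
    candidates = [t for b in kept
                  for t in range(b * block_size, (b + 1) * block_size)]
    scored = sorted(((token_score(t), t) for t in candidates), reverse=True)
    selected = sorted(t for _, t in scored[:keep_tokens])
    return selected, len(candidates)


# Hypothetical per-token relevance scores for 12 tokens in 3 blocks of 4.
token_scores = [0.1, 0.2, 0.9, 0.3,
                0.0, 0.1, 0.0, 0.2,
                0.8, 0.7, 0.1, 0.6]
block_size = 4
block_scores = [max(token_scores[i:i + block_size])
                for i in range(0, len(token_scores), block_size)]

selected, tokens_scored = hierarchical_select(
    block_scores, lambda t: token_scores[t],
    block_size=block_size, keep_blocks=2, keep_tokens=3)
print(selected, tokens_scored)
```

Here the fine-grained pass scores 8 of the 12 tokens because the weakest block is discarded early. At a 1-million-token context the skipped share would presumably be far larger.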
Meituan also added an N-gram Embedding module taken from its smaller model lines.
The module adds 135 billion parameters built around five-token combinations, or 5-grams. These parameters sit in sparse dimensions separate from the Mixture-of-Experts structure.
This approach expands the central embedding space by approximately 100 times. It allows the model to capture local token relationships while improving large-batch inference by reducing memory I/O bottlenecks.
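One common way to build an n-gram embedding is to hash each window of recent tokens into a row of a large lookup table and add that row to the ordinary token embedding. The Python sketch below assumes that design. The table sizes, hash function, and names are hypothetical, and the article does not confirm that LongCat-2.0 works this way.

```python
import random

N = 5              # window length, matching the article's five-token combinations
VOCAB = 50         # hypothetical tiny vocabulary
TABLE_SIZE = 1000  # hypothetical number of n-gram rows
DIM = 4            # hypothetical embedding width

rng = random.Random(0)
TOKEN_TABLE = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]
NGRAM_TABLE = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(TABLE_SIZE)]


def ngram_slot(tokens, pos):
    """Hash the window of up to N tokens ending at pos into a table row."""
    window = tokens[max(0, pos - N + 1): pos + 1]
    h = 0
    for t in window:
        h = (h * 1_000_003 + t + 1) % TABLE_SIZE
    return h


def embed(tokens):
    """Return one vector per token: base embedding plus its n-gram embedding."""
    vectors = []
    for pos, tok in enumerate(tokens):
        base = TOKEN_TABLE[tok]
        local = NGRAM_TABLE[ngram_slot(tokens, pos)]
        vectors.append([b + e for b, e in zip(base, local)])
    return vectors


tokens = [3, 1, 4, 1, 5, 9, 2, 6]
vectors = embed(tokens)
print(len(vectors), len(vectors[0]))
```

Each position needs only one extra table lookup rather than a matrix multiplication, which is one reason a lookup-style module can add many parameters without adding much compute per token.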
The launch comes as Washington pressures leading American AI companies to restrict access to their newest models.
Following a request from the US government, OpenAI limited access to its GPT-5.6 models. US authorities also directed Anthropic to restrict access to Claude Fable 5 and Claude Mythos 5, prompting the company to take the models offline entirely.
Technologists, activists, and industry experts have warned that these restrictions may have produced the opposite of their intended effect.
By limiting access to Western closed-source systems and increasing API costs, the restrictions have created an opening for developers seeking affordable, high-performance alternatives, including Chinese open-source models such as LongCat-2.0.
Meituan developed a post-training system called Multi-Teacher Optimization via Mixture of Specialized Experts, or MOPD.
Instead of combining human feedback into one reward function, MOPD separates post-training into three specialized groups: Agent Experts, Reasoning Experts and Interaction Experts.
Agent Experts focus on structured execution, accurate tool calls, multi-turn API parameter handling and self-correcting loops designed to prevent stalled tasks.
Reasoning Experts focus on multi-step logic, engineering problems, mathematics, and other advanced science and technology tasks.
Interaction Experts focus on instruction following, factual accuracy, reducing hallucinations, and maintaining safety controls without reducing the model’s usefulness.
Separating these functions during post-training helps prevent performance losses in one area from affecting another.
At runtime, a dynamic gate-routing system combines the three expert groups. This allows the model to use reasoning, tool execution, and controlled user interaction at the same time.
Meituan divides commercial access between standard real-time API billing and fixed Token Packs.
Standard top-up accounts deduct funds according to the number of input and output tokens processed.
Token Packs provide fixed, one-time token allocations that remain valid for 30 days. The packages operate alongside an organization’s existing API account.
Meituan releases the high-volume packages through limited flash sales four times per day. The sales begin at 10:00, 16:00, 21:00, and 23:00 Beijing Time and operate on a first-come, first-served basis.
The company uses this system to manage demand across its ASIC computing clusters.
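For teams outside China, the sale schedule is easier to follow once converted from Beijing Time, which is UTC+8 year-round. The Python sketch below finds the next sale start from any given moment. The function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # China Standard Time, no daylight saving
SALE_TIMES = [time(10, 0), time(16, 0), time(21, 0), time(23, 0)]


def next_sale(now):
    """Return the next flash-sale start after `now`, as a Beijing Time datetime.

    `now` must be timezone-aware.
    """
    local = now.astimezone(BEIJING)
    for day_offset in (0, 1):
        day = (local + timedelta(days=day_offset)).date()
        for start in SALE_TIMES:
            candidate = datetime.combine(day, start, tzinfo=BEIJING)
            if candidate > local:
                return candidate
    raise AssertionError("a sale always starts within the next day")


# Hypothetical moment: 14:30 UTC, which is 22:30 in Beijing.
sample_now = datetime(2025, 1, 1, 14, 30, tzinfo=timezone.utc)
upcoming = next_sale(sample_now)
print(upcoming.astimezone(timezone.utc).isoformat())
```

In UTC, the four daily sales begin at 02:00, 08:00, 13:00, and 15:00.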
The main difference from typical API pricing is that Meituan does not charge for context-cache hits.
Autonomous coding agents often need to repeatedly read and modify the same large software repository during an extended session. Conventional API pricing elsewhere can charge developers each time the model processes the same input context.
Under Meituan’s system, only uncached inputs and generated output tokens reduce the Token Pack balance.
This allows coding agents to repeatedly inspect the same large codebase without creating additional charges for cached content.
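The effect on a long agent session can be estimated with a rough Python simulation. It assumes that the repository context is cached after the first read, that every later read is a cache hit, and that a Token Pack is debited one token per billable token. None of these details is confirmed by Meituan, and all the numbers are hypothetical.

```python
def tokens_deducted(repo_tokens, steps, new_input_per_step, output_per_step,
                    cache_hits_free=True):
    """Total tokens debited for an agent that re-reads the same repository each step."""
    total = 0
    for step in range(steps):
        repo_is_cached = step > 0  # assumption: cached after the first read
        if repo_is_cached and cache_hits_free:
            repo_charge = 0
        else:
            repo_charge = repo_tokens
        total += repo_charge + new_input_per_step + output_per_step
    return total


# Hypothetical session: an 800,000-token repository inspected over 20 steps.
with_free_cache = tokens_deducted(800_000, 20, 5_000, 2_000, cache_hits_free=True)
without_free_cache = tokens_deducted(800_000, 20, 5_000, 2_000, cache_hits_free=False)
print(with_free_cache, without_free_cache)  # 940000 16140000
```

Under these assumptions the same session consumes about 17 times fewer tokens when cache hits are free, because the repository is paid for once instead of on every step.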
Meituan released LongCat-2.0 under the open-source MIT License, giving companies wide flexibility to use and modify the model.
Unlike copyleft licenses such as the GNU General Public License, the MIT License does not require developers to release the source code of derivative products or internal systems that use the software.
Companies can modify and integrate LongCat-2.0 into closed-source applications, proprietary development tools and internal automation systems.
They can also create their own versions of the repository, adjust LongCat Sparse Attention for private databases and sell products built on the model without disclosing their private code or modifications.