Visit our Hugging Face organization (click the links above), search for models and datasets starting with "LIMI", and you will find everything you need. Enjoy!
To learn more about LIMI, feel free to explore our documentation and resources. Our release consists of the following sections:
- Model Zoo & Quick Start: Basic usage and demonstrations with Transformers, vLLM, and SGLang for LIMI and LIMI-Air;
- Training: Instructions for fine-tuning and post-training with slime framework and distributed training scripts;
- Evaluation: Comprehensive evaluation suite with metrics for agentic capabilities assessment;
- Framework Integration: Usage of LIMI with frameworks for agentic applications, tool use, and reasoning tasks.
- 2025.09.23: 🚀 LIMI paper is now available on arXiv! Check out our paper for detailed methodology and experimental results.
- 2025.09.23: 🤗 Released LIMI models on Hugging Face! Both LIMI (355B) and LIMI-Air (106B) are now available.
- 2025.09.23: 📊 Released the LIMI training dataset with 78 carefully curated samples on Hugging Face.
LIMI establishes the Agency Efficiency Principle: machine autonomy emerges not from data abundance but from strategic curation of high-quality agentic demonstrations. This discovery fundamentally reshapes how we develop autonomous AI systems, suggesting that mastering agency requires understanding its essence, not scaling training data. As industries transition from thinking AI to working AI, LIMI provides a paradigm for the sustainable cultivation of truly agentic intelligence.
- A New Data Paradigm: We challenge the traditional "more is better" data philosophy by achieving superior AI agency with only 78 high-quality samples, demonstrating that data quality far outweighs quantity.
- Resource Efficiency: By focusing on core capabilities instead of massive datasets, we significantly reduce the computational resources required for training while effectively boosting the model's performance on complex tasks.
- Focus on Productive Workers: Our approach cultivates AI's essential ability to act as a "worker" that autonomously identifies problems, plans, and executes tasks, rather than merely "thinking" and "generating."
- Outperforming Leading Models: On AgencyBench, LIMI significantly surpasses multiple large-scale models, achieving a performance boost of up to 53.7% with only 1/128th of the sample size.
Our models achieve state-of-the-art performance across multiple agentic evaluation tasks:
| Model | FTFC (↑) | RC@3 (↑) | SR@3 (↑) | Avg. |
|---|---|---|---|---|
| GLM-4.5-Air | 15.0 | 16.1 | 20.0 | 17.0 |
| GLM-4.5 | 37.8 | 50.0 | 47.4 | 45.1 |
| GLM-4.5-CodeAgent | 48.0 | 48.0 | 47.5 | 47.8 |
| LIMI-Air | 35.4 | 34.3 | 33.1 | 34.3 |
| LIMI | 71.7 | 74.2 | 74.6 | 73.5 |
For detailed benchmark results, experimental setup, and comprehensive comparisons, please refer to our paper.
Our LIMI models are available on Hugging Face 🤗:
| Model | Backbone | Size | Link |
|---|---|---|---|
| LIMI | GLM-4.5 | 355B | 🤗 |
| LIMI-Air | GLM-4.5-Air | 106B | 🤗 |
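If you prefer to fetch the weights ahead of time (for example, on a machine with better bandwidth), the sketch below uses `huggingface_hub`. The `GAIR/LIMI` repo id is taken from the quick-start code further down; the `GAIR/LIMI-Air` id is our assumption for the Air variant.

```python
# Optional: pre-download model weights with huggingface_hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("GAIR/LIMI")        # LIMI (355B); id from the quick start
# local_dir = snapshot_download("GAIR/LIMI-Air")  # assumed id for LIMI-Air (106B)
print(local_dir)  # local path to the downloaded snapshot
```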
We release our datasets through Hugging Face 🤗:
| Dataset | Description | Link |
|---|---|---|
| LIMI | Updated training set for the paper (78 samples) | 🤗 |
Our models are fine-tuned from GLM-4.5 and are compatible with most mainstream frameworks, such as HF Transformers, SGLang, Megatron, and slime.
Start with HF Transformers
```bash
# Install required packages (accelerate is needed for device_map="auto")
pip install transformers accelerate
```
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Initialize model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "GAIR/LIMI",
    torch_dtype="auto",
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("GAIR/LIMI", trust_remote_code=True)

# Prepare input messages (we use this template and system prompt during training and inference)
messages = [
    {"role": "system", "content": "You are a helpful assistant tasked with discovering mathematical function structures for scientific systems."},
    {"role": "user", "content": "Modify the equation.py function, considering the physical meaning and relationships of the inputs."}
]

# Format input using the chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize input
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(
    **inputs,
    max_new_tokens=128000,
    temperature=0.6,
    top_p=0.95,
    do_sample=True
)

# Decode and print only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
Start with vLLM
```bash
# Install required packages
pip install vllm
```
```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

# Initialize the model
llm = LLM(
    model="GAIR/LIMI",
    tensor_parallel_size=4,  # adjust based on available GPUs
    trust_remote_code=True,
    swap_space=60,
    gpu_memory_utilization=0.96,
)

# Prepare input messages (we use this template and system prompt during training and inference)
messages = [
    {"role": "system", "content": "You are a helpful assistant tasked with discovering mathematical function structures for scientific systems."},
    {"role": "user", "content": "Modify the equation.py function, considering the physical meaning and relationships of the inputs."}
]

# Set up the tokenizer and apply the chat template
tokenizer = AutoTokenizer.from_pretrained("GAIR/LIMI", trust_remote_code=True)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Configure generation parameters
sampling_params = SamplingParams(
    temperature=0.6,
    max_tokens=128000,
    top_p=0.95,
)

# Generate and print the response
output = llm.generate(text, sampling_params)
print(output[0].outputs[0].text)
```
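Start with SGLang

The quick-start sections above cover Transformers and vLLM; SGLang (also listed under Model Zoo & Quick Start) can serve the model as well. Below is a minimal sketch using SGLang's standard server CLI; adjust `--tp` to your available GPUs, and treat the exact flags and port as a starting point rather than a verified configuration.

```bash
# Install SGLang (see the SGLang docs for the recommended extras)
pip install "sglang[all]"

# Launch an OpenAI-compatible server for LIMI
python -m sglang.launch_server \
    --model-path GAIR/LIMI \
    --tp 4 \
    --trust-remote-code \
    --port 30000
```

Once the server is running, requests can be sent to its OpenAI-compatible `/v1/chat/completions` endpoint on the chosen port, using the same system prompt and chat template shown above.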
We use the slime framework for training, which provides a convenient and efficient training pipeline.
1. Environment Setup
   - Set up slime following its official documentation.
   - Ensure all dependencies are properly installed and configured.
2. Data Preparation
   - Obtain the LIMI dataset from 🤗 Hugging Face (a download sketch follows this list).
3. Configuration
   - Use our provided training script.
   - The script file contains all necessary hyperparameters and training settings.
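As referenced in the Data Preparation step, here is a minimal sketch for pulling the training set with the `datasets` library. The dataset id `GAIR/LIMI` and the `train` split name are assumptions based on the model repo naming; check the dataset card linked above for the actual id.

```python
# Download the 78-sample LIMI training set (dataset id assumed).
from datasets import load_dataset

dataset = load_dataset("GAIR/LIMI", split="train")
print(len(dataset))   # expected: 78 curated samples
print(dataset[0])     # inspect one training example
```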
To support the rigorous assessment of agentic capabilities described in this work, we release a comprehensive evaluation suite. The framework is designed to benchmark the agency of large language models (LLMs) on the held-out evaluation subset.
The evaluation module implements three key metrics: First-Turn Functional Completeness (FTFC), Success Rate (SR@R), and Remaining Chances (RC@R), with a computational budget of R = 3 rounds. For the detailed benchmark tasks, please refer to AgencyBench.
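For intuition only, the sketch below shows how per-task results might be aggregated into FTFC, SR@3, and RC@3. The record fields and the RC normalization are hypothetical placeholders; the authoritative metric definitions are in the paper and the released evaluation suite.

```python
# Hypothetical aggregation sketch for AgencyBench-style metrics.
# Field names and the RC normalization are placeholders, not the
# official definitions (see the paper / evaluation suite for those).
from dataclasses import dataclass
from typing import Optional

R = 3  # computational budget: rounds per task

@dataclass
class TaskRecord:
    first_turn_complete: float    # fraction of requirements met in turn 1
    success_round: Optional[int]  # 1-based round of success, or None

def aggregate(records: list[TaskRecord]) -> dict[str, float]:
    n = len(records)
    ftfc = sum(r.first_turn_complete for r in records) / n
    sr = sum(1 for r in records
             if r.success_round is not None and r.success_round <= R) / n
    # Remaining chances: fraction of the budget left after success,
    # counted as 0.0 for failures (placeholder normalization).
    rc = sum((R - r.success_round) / R
             for r in records
             if r.success_round is not None and r.success_round <= R) / n
    return {"FTFC": ftfc, f"SR@{R}": sr, f"RC@{R}": rc,
            "Avg.": (ftfc + sr + rc) / 3}
```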
This project is licensed under the MIT License - see the LICENSE file for details.
```bibtex
@misc{xiao2025limiagency,
      title={LIMI: Less is More for Agency},
      author={Yang Xiao and Mohan Jiang and Jie Sun and Keyu Li and Jifan Lin and Yumin Zhuang and Ji Zeng and Shijie Xia and Qishuo Hua and Xuefeng Li and Xiaojie Cai and Tongyu Wang and Yue Zhang and Liming Liu and Xia Wu and Jinlong Hou and Yuan Cheng and Wenjie Li and Xiang Wang and Dequan Wang and Pengfei Liu},
      year={2025},
      eprint={2509.17567},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2509.17567},
}
```