Merge pull request #167 from rllm-org/dev-sijun

jeffreysijuntan · web-flow · commit 5758a0a52cc3 · 2025-07-30T14:43:55.000-07:00
Update docs: point GitHub links at rllm-org/rllm instead of agentica-project/rllm
diff --git a/README.md b/README.md
@@ -15,7 +15,7 @@
 [![Discord](https://img.shields.io/badge/Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/BDH46HT9en)
 [![Website](https://img.shields.io/badge/Site-%23000000.svg?style=for-the-badge&logo=semanticweb&logoColor=white)](https://www.agentica-project.com) 
 [![Twitter/X](https://img.shields.io/badge/Agentica-white?style=for-the-badge&logo=X&logoColor=000&color=000&labelColor=white)](https://x.com/Agentica_)
-[![Github](https://img.shields.io/badge/RLLM-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)](https://github.com/agentica-project/rllm)
+[![Github](https://img.shields.io/badge/RLLM-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)](https://github.com/rllm-org/rllm)
 [![Hugging Face Collection](https://img.shields.io/badge/Agentica-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor)](https://huggingface.co/agentica-org)
 
 </div>
@@ -32,15 +32,15 @@ rLLM is an open-source framework for post-training language agents via reinforce
 - 🍽️ An In-Depth Blog Post on our [SWE Agents and RL Training Recipes](https://pretty-radio-b75.notion.site/DeepSWE-Training-a-Fully-Open-sourced-State-of-the-Art[…]-by-Scaling-RL-22281902c1468193aabbe9a8c59bbe33?pvs=73)
 - 🤗 HF Model [`DeepSWE-Preview`](https://huggingface.co/agentica-org/DeepSWE-Preview)
 - 🤗 HF Dataset [`R2E-Gym-Subset`](https://huggingface.co/datasets/R2E-Gym/R2E-Gym-Subset)
-- 📄 [Training Scripts](https://github.com/agentica-project/rllm/tree/main/examples/swe)
+- 📄 [Training Scripts](https://github.com/rllm-org/rllm/tree/main/examples/swe)
 - 📈 [Wandb Training Logs](https://wandb.ai/mluo/deepswe)—All training runs and ablations.
 - 🔎 [Evaluation Logs](https://drive.google.com/file/d/10LIwpJeaFuiX6Y-qEG2a4a335PEuQJeS/view?usp=sharing)—16 passes over SWE-Bench-Verified.
 
 <strong>[2025/04/08]</strong> We release [`DeepCoder-14B-Preview`](https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51), a 14B coding model that achieves an impressive **60.6%** Pass@1 accuracy on LiveCodeBench (+8% improvement), matching the performance of `o3-mini-2025-01-31 (Low)` and `o1-2024-12-17`. 
 - ⬆️ An In-Depth Blog Post on our [Training Recipe and Insights](https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51)
 - 🤗 HF Model [`DeepCoder-14B-Preview`](https://huggingface.co/agentica-org/DeepCoder-14B-Preview), [`DeepCoder-1.5B-Preview`](https://huggingface.co/agentica-org/DeepCoder-1.5B-Preview)
 - 🤗 HF Dataset [`DeepCoder-Preview-Dataset`](https://huggingface.co/datasets/agentica-org/DeepCoder-Preview-Dataset)
-- 📄 [Training Scripts](https://github.com/agentica-project/rllm/tree/main/scripts/deepcoder/train)—Exact hyperparameters we used to achieve `o3-mini` performance.
+- 📄 [Training Scripts](https://github.com/rllm-org/rllm/tree/main/scripts/deepcoder/train)—Exact hyperparameters we used to achieve `o3-mini` performance.
 - 📈 [Wandb Training Logs](https://wandb.ai/mluo/deepcoder)—All training runs and ablations.
 - 🔎 [Evaluation Logs](https://drive.google.com/file/d/1tr_xXvCJnjU0tLO7DNtFL85GIr3aGYln/view?usp=sharing)—LiveCodeBench and Codeforces logs for DeepCoder.
 
@@ -59,7 +59,7 @@ rLLM is an open-source framework for post-training language agents via reinforce
 
 ```bash
 # Clone the repository
-git clone --recurse-submodules https://github.com/agentica-project/rllm.git
+git clone --recurse-submodules https://github.com/rllm-org/rllm.git
 cd rllm
 
 # create a conda environment
diff --git a/docs/examples/deepcoder.md b/docs/examples/deepcoder.md
@@ -81,4 +81,4 @@ DeepCoder training configuration:
 --8<-- "examples/deepcoder/train_deepcoder.py"
 ```
 
-For detailed setup instructions, see the [README](https://github.com/agentica-project/rllm/blob/main/examples/deepcoder/README.md) in the deepcoder example directory.
+For detailed setup instructions, see the [README](https://github.com/rllm-org/rllm/blob/main/examples/deepcoder/README.md) in the deepcoder example directory.
diff --git a/docs/examples/deepscaler.md b/docs/examples/deepscaler.md
@@ -83,4 +83,4 @@ DeepScaler training configuration:
 --8<-- "examples/deepscaler/train_deepscaler.py"
 ```
 
-For detailed setup instructions, see the [README](https://github.com/agentica-project/rllm/blob/main/examples/deepscaler/README.md) in the deepscaler example directory.
+For detailed setup instructions, see the [README](https://github.com/rllm-org/rllm/blob/main/examples/deepscaler/README.md) in the deepscaler example directory.
diff --git a/docs/examples/frozenlake.md b/docs/examples/frozenlake.md
@@ -48,4 +48,4 @@ Agent training implementation:
 --8<-- "examples/frozenlake/train_frozenlake_agent.py"
 ```
 
-For more details, see the [FrozenLake README](https://github.com/agentica-project/rllm/blob/main/examples/frozenlake/README.md). 
+For more details, see the [FrozenLake README](https://github.com/rllm-org/rllm/blob/main/examples/frozenlake/README.md). 
diff --git a/docs/examples/search.md b/docs/examples/search.md
@@ -53,4 +53,4 @@ Search agent training configuration:
 --8<-- "examples/search/train_search_agent.py"
 ```
 
-For detailed setup instructions, see the [README](https://github.com/agentica-project/rllm/blob/main/examples/search/README.md) in the search example directory. 
+For detailed setup instructions, see the [README](https://github.com/rllm-org/rllm/blob/main/examples/search/README.md) in the search example directory. 
diff --git a/docs/examples/sft.md b/docs/examples/sft.md
@@ -127,4 +127,4 @@ Script for evaluating SFT model performance:
 --8<-- "examples/sft/run_sft_model.py"
 ```
 
-For detailed setup instructions, see the [README](https://github.com/agentica-project/rllm-internal/blob/v0.1/examples/sft/README.md) in the sft example directory.
+For detailed setup instructions, see the [README](https://github.com/rllm-org/rllm/blob/main/examples/sft/README.md) in the sft example directory.
diff --git a/docs/examples/swe.md b/docs/examples/swe.md
@@ -75,4 +75,4 @@ DeepSWE training configuration:
 --8<-- "examples/swe/train_deepswe_agent.py"
 ```
 
-For detailed setup instructions, see the [README](https://github.com/agentica-project/rllm/blob/main/examples/swe/README.md) in the deepswe example directory.
+For detailed setup instructions, see the [README](https://github.com/rllm-org/rllm/blob/main/examples/swe/README.md) in the deepswe example directory.
diff --git a/docs/getting-started/installation.md b/docs/getting-started/installation.md
@@ -24,7 +24,7 @@ rLLM uses [verl](https://github.com/volcengine/verl) as its training backend. Fo
 
 ```bash
 # Clone the repository
-git clone --recurse-submodules https://github.com/agentica-project/rllm.git
+git clone --recurse-submodules https://github.com/rllm-org/rllm.git
 cd rllm
 
 # create a conda environment
@@ -38,4 +38,4 @@ pip install -e .
 
 This will install rLLM and all its dependencies in development mode.
 
-For more help, refer to the [GitHub issues page](https://github.com/agentica-project/rllm/issues). 
+For more help, refer to the [GitHub issues page](https://github.com/rllm-org/rllm/issues). 
diff --git a/docs/index.md b/docs/index.md
@@ -26,7 +26,7 @@ rLLM currently supports a variety of built-in agents:
 - **Frozenlake Agent**: Train agents to navigate a text-based grid world (useful for testing/debugging RL algorithms).
 
 ## 🛠️ Train Your Own Agents & Environments
-rLLM is designed to be extensible. You can easily build and train your own custom agents and environments using our modular API and training engine. Walk through our [core concepts](./core-concepts/overview.md) and [examples](https://github.com/agentica-project/rllm/tree/main/examples) to understand the fundamentals of rLLM and build your own custom agents and environments tailored to your specific use cases.
+rLLM is designed to be extensible. You can easily build and train your own custom agents and environments using our modular API and training engine. Walk through our [core concepts](./core-concepts/overview.md) and [examples](https://github.com/rllm-org/rllm/tree/main/examples) to understand the fundamentals of rLLM and build your own custom agents and environments tailored to your specific use cases.
 
 ## 🚀 Future Roadmap
 
diff --git a/examples/agents/swe/README.md b/examples/agents/swe/README.md
@@ -13,12 +13,12 @@
 •
 <a href="https://agentica-project.com/" > 🌐 Project Page</a>
 •
-<a href="https://github.com/agentica-project/rllm" > 🧑‍💻 Code</a>
+<a href="https://github.com/rllm-org/rllm" > 🧑‍💻 Code</a>
 </p>
 
 <div align="center">
 
-[![Github](https://img.shields.io/badge/RLLM-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)](https://github.com/agentica-project/rllm)
+[![Github](https://img.shields.io/badge/RLLM-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)](https://github.com/rllm-org/rllm)
 [![Website](https://img.shields.io/badge/Site-%23000000.svg?style=for-the-badge&logo=semanticweb&logoColor=white)](https://www.agentica-project.com) 
 [![Twitter](https://img.shields.io/badge/Agentica-white?style=for-the-badge&logo=X&logoColor=000&color=000&labelColor=white)](https://x.com/Agentica_)
 [![Hugging Face Collection](https://img.shields.io/badge/Agentica-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor)](https://huggingface.co/agentica-org)
diff --git a/examples/swe/README.md b/examples/swe/README.md
@@ -15,12 +15,12 @@
 •
 <a href="https://agentica-project.com/" > 🌐 Project Page</a>
 •
-<a href="https://github.com/agentica-project/rllm" > 🧑‍💻 Code</a>
+<a href="https://github.com/rllm-org/rllm" > 🧑‍💻 Code</a>
 </p>
 
 <div align="center">
 
-[![Github](https://img.shields.io/badge/RLLM-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)](https://github.com/agentica-project/rllm)
+[![Github](https://img.shields.io/badge/RLLM-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)](https://github.com/rllm-org/rllm)
 [![Website](https://img.shields.io/badge/Site-%23000000.svg?style=for-the-badge&logo=semanticweb&logoColor=white)](https://www.agentica-project.com) 
 [![Twitter](https://img.shields.io/badge/Agentica-white?style=for-the-badge&logo=X&logoColor=000&color=000&labelColor=white)](https://x.com/Agentica_)
 [![Hugging Face Collection](https://img.shields.io/badge/Agentica-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor)](https://huggingface.co/agentica-org)
@@ -29,7 +29,7 @@
 
 We introduce DeepSWE-Preview, a reasoning-enabled coding agent trained from scratch from Qwen3-32B with only reinforcement learning (RL). It achieves 59.2% on SWE-Bench-Verified with test-time scaling, reaching SOTA for open-weight coding agents (42.2% Pass@1, 71.0% Pass@16).
 
-DeepSWE is trained using [**rLLM**](https://github.com/agentica-project/rllm), our framework for post-training language agents using high-quality SWE environments from [**R2E-Gym**](https://github.com/R2E-Gym/R2E-Gym). We’ve open-sourced everything—our dataset, code, training, and evaluation logs, for everyone to progress on scaling and improving agents with RL.
+DeepSWE is trained using [**rLLM**](https://github.com/rllm-org/rllm), our framework for post-training language agents using high-quality SWE environments from [**R2E-Gym**](https://github.com/R2E-Gym/R2E-Gym). We’ve open-sourced everything—our dataset, code, training, and evaluation logs, for everyone to progress on scaling and improving agents with RL.
 
 ## Quick Start 🎯