</div>
@@ -32,15 +32,15 @@ rLLM is an open-source framework for post-training language agents via reinforce
- 🍽️ An In-Depth Blog Post on our [SWE Agents and RL Training Recipes](https://pretty-radio-b75.notion.site/DeepSWE-Training-a-Fully-Open-sourced-State-of-the-Art[…]-by-Scaling-RL-22281902c1468193aabbe9a8c59bbe33?pvs=73)
- 🤗 HF Model [`DeepSWE-Preview`](https://huggingface.co/agentica-org/DeepSWE-Preview)
- 📈 [Wandb Training Logs](https://wandb.ai/mluo/deepswe)—All training runs and ablations.
- 🔎 [Evaluation Logs](https://drive.google.com/file/d/10LIwpJeaFuiX6Y-qEG2a4a335PEuQJeS/view?usp=sharing)—16 passes over SWE-Bench-Verified.
<strong>[2025/04/08]</strong> We release [`DeepCoder-14B-Preview`](https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51), a 14B coding model that achieves an impressive **60.6%** Pass@1 accuracy on LiveCodeBench (+8% improvement), matching the performance of `o3-mini-2025-01-31 (Low)` and `o1-2024-12-17`.
- ⬆️ An In-Depth Blog Post on our [Training Recipe and Insights](https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51)
- 🤗 HF Model [`DeepCoder-14B-Preview`](https://huggingface.co/agentica-org/DeepCoder-14B-Preview), [`DeepCoder-1.5B-Preview`](https://huggingface.co/agentica-org/DeepCoder-1.5B-Preview)
-- 📄 [Training Scripts](https://github.com/agentica-project/rllm/tree/main/scripts/deepcoder/train)—Exact hyperparameters we used to achieve `o3-mini` performance.
+- 📄 [Training Scripts](https://github.com/rllm-org/rllm/tree/main/scripts/deepcoder/train)—Exact hyperparameters we used to achieve `o3-mini` performance.
- 📈 [Wandb Training Logs](https://wandb.ai/mluo/deepcoder)—All training runs and ablations.
- 🔎 [Evaluation Logs](https://drive.google.com/file/d/1tr_xXvCJnjU0tLO7DNtFL85GIr3aGYln/view?usp=sharing)—LiveCodeBench and Codeforces logs for DeepCoder.
@@ -59,7 +59,7 @@ rLLM is an open-source framework for post-training language agents via reinforce
docs/examples/deepcoder.md (1 addition, 1 deletion)
@@ -81,4 +81,4 @@ DeepCoder training configuration:
--8<--"examples/deepcoder/train_deepcoder.py"
```
-For detailed setup instructions, see the [README](https://github.com/agentica-project/rllm/blob/main/examples/deepcoder/README.md) in the deepcoder example directory.
+For detailed setup instructions, see the [README](https://github.com/rllm-org/rllm/blob/main/examples/deepcoder/README.md) in the deepcoder example directory.
docs/examples/deepscaler.md (1 addition, 1 deletion)
@@ -83,4 +83,4 @@ DeepScaler training configuration:
--8<--"examples/deepscaler/train_deepscaler.py"
```
-For detailed setup instructions, see the [README](https://github.com/agentica-project/rllm/blob/main/examples/deepscaler/README.md) in the deepscaler example directory.
+For detailed setup instructions, see the [README](https://github.com/rllm-org/rllm/blob/main/examples/deepscaler/README.md) in the deepscaler example directory.
docs/examples/search.md (1 addition, 1 deletion)
@@ -53,4 +53,4 @@ Search agent training configuration:
--8<--"examples/search/train_search_agent.py"
```
-For detailed setup instructions, see the [README](https://github.com/agentica-project/rllm/blob/main/examples/search/README.md) in the search example directory.
+For detailed setup instructions, see the [README](https://github.com/rllm-org/rllm/blob/main/examples/search/README.md) in the search example directory.
docs/examples/sft.md (1 addition, 1 deletion)
@@ -127,4 +127,4 @@ Script for evaluating SFT model performance:
--8<--"examples/sft/run_sft_model.py"
```
-For detailed setup instructions, see the [README](https://github.com/agentica-project/rllm-internal/blob/v0.1/examples/sft/README.md) in the sft example directory.
+For detailed setup instructions, see the [README](https://github.com/rllm-org/rllm/blob/main/examples/sft/README.md) in the sft example directory.
docs/examples/swe.md (1 addition, 1 deletion)
@@ -75,4 +75,4 @@ DeepSWE training configuration:
--8<--"examples/swe/train_deepswe_agent.py"
```
-For detailed setup instructions, see the [README](https://github.com/agentica-project/rllm/blob/main/examples/swe/README.md) in the deepswe example directory.
+For detailed setup instructions, see the [README](https://github.com/rllm-org/rllm/blob/main/examples/swe/README.md) in the deepswe example directory.
docs/index.md (1 addition, 1 deletion)
@@ -26,7 +26,7 @@ rLLM currently supports a variety of built-in agents:
- **Frozenlake Agent**: Train agents to navigate a text-based grid world (useful for testing and debugging RL algorithms).
## 🛠️ Train Your Own Agents & Environments
-rLLM is designed to be extensible. You can easily build and train your own custom agents and environments using our modular API and training engine. Walk through our [core concepts](./core-concepts/overview.md) and [examples](https://github.com/agentica-project/rllm/tree/main/examples) to understand the fundamentals of rLLM and build your own custom agents and environments tailored to your specific use cases.
+rLLM is designed to be extensible. You can easily build and train your own custom agents and environments using our modular API and training engine. Walk through our [core concepts](./core-concepts/overview.md) and [examples](https://github.com/rllm-org/rllm/tree/main/examples) to understand the fundamentals of rLLM and build your own custom agents and environments tailored to your specific use cases.