@@ -100,116 +100,90 @@ Whether you're interested in AI, automation, or contributing to cutting-edge age
 ## 🛠️ Installation & Setup
-> **Note**: Our agent returns `pyautogui` code and is intended for a single monitor screen.

-> ❗**Warning**❗: If you are on a Linux machine, creating a `conda` environment will interfere with `pyatspi`. As of now, there's no clean solution for this issue. Proceed through the installation without using `conda` or any virtual environment.
+### Prerequisites
+- **Single Monitor**: Our agent is designed for single monitor screens
+- **Linux Users**: Avoid `conda` environments as they interfere with `pyatspi`
+- **Security**: The agent runs Python code to control your computer - use with care

-> ⚠️**Disclaimer**⚠️: To leverage the full potential of Agent S2, we utilize [UI-TARS](https://github.com/bytedance/UI-TARS) as a grounding model (7B-DPO or 72B-DPO for better performance). They can be hosted locally, or on Hugging Face Inference Endpoints. Our code supports Hugging Face Inference Endpoints. Check out [Hugging Face Inference Endpoints](https://huggingface.co/learn/cookbook/en/enterprise_dedicated_endpoints) for more information on how to set up and query this endpoint. However, running Agent S2 does not require this model, and you can use alternative API based models for visual grounding, such as Claude.
-
-Install the package:
-```
+### Installation
+```bash
 pip install gui-agents
 ```

-Set your LLM API Keys and other environment variables. You can do this by adding the following line to your .bashrc (Linux), or .zshrc (MacOS) file.
+### API Configuration

-```
+#### Option 1: Environment Variables
+Add to your `.bashrc` (Linux) or `.zshrc` (MacOS):
+```bash
 export OPENAI_API_KEY=<YOUR_API_KEY>
 export ANTHROPIC_API_KEY=<YOUR_ANTHROPIC_API_KEY>
 export HF_TOKEN=<YOUR_HF_TOKEN>
 ```

-Alternatively, you can set the environment variable in your Python script:
-
-```
+#### Option 2: Python Script
+```python
 import os
 os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"
 ```

-We also support Azure OpenAI, Anthropic, Gemini, Open Router, and vLLM inference. For more information refer to [models.md](models.md).
+### Supported Models
+We support Azure OpenAI, Anthropic, Gemini, Open Router, and vLLM inference. See [models.md](models.md) for details.

-> ❗**Warning**❗: The agent will directly run python code to control your computer. Please use with care.
+### Grounding Models (Required)
+For optimal performance, we recommend [UI-TARS-1.5-7B](https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B) hosted on Hugging Face Inference Endpoints or another provider. See [Hugging Face Inference Endpoints](https://huggingface.co/learn/cookbook/en/enterprise_dedicated_endpoints) for setup instructions.

 ## 🚀 Usage

-> **Note**: Our best configuration uses o3 and UI-TARS-1.5-7B.
-
-### CLI
+> ⚡️ **Recommended Setup:**
+> For the best configuration, we recommend using **OpenAI o3-2025-04-16** as the main model, paired with **UI-TARS-1.5-7B** for grounding.

-Run Agent S2 with a specific model (default is `gpt-4o`):

-```sh
-agent_s2 \
-  --provider "anthropic" \
-  --model "claude-3-7-sonnet-20250219" \
-  --grounding_model_provider "anthropic" \
-  --grounding_model "claude-3-7-sonnet-20250219" \
-```
+### CLI

-Or use a custom endpoint:
+Run Agent S2.5 with the required parameters:

 ```bash
-agent_s2 \
-  --provider "anthropic" \
-  --model "claude-3-7-sonnet-20250219" \
-  --endpoint_provider "huggingface" \
-  --endpoint_url "<endpoint_url>/v1/"
+agent_s \
+  --provider openai \
+  --model o3-2025-04-16 \
+  --ground_provider huggingface \
+  --ground_url http://localhost:8080 \
+  --ground_model ui-tars-1.5-7b \
+  --grounding_width 1920 \
+  --grounding_height 1080
 ```

-#### Main Model Settings
-- **`--provider`**, **`--model`**
-  - Purpose: Specifies the main generation model
-  - Supports: all model providers in [models.md](models.md)
-  - Purpose: Some API providers automatically rescale images. Therefore, the generated (x, y) will be relative to the rescaled image dimensions, instead of the original image dimensions.
-  - Note: This is optional. If not specified, `gui-agents` will default to your environment variables for the API key.
-  - Default: None
-
-> **Note**: Configuration 2 takes precedence over Configuration 1.
-
-This will show a user query prompt where you can enter your query and interact with Agent S2. You can use any model from the list of supported models in [models.md](models.md).
+#### Required Parameters
+- **`--provider`**: Main generation model provider (e.g., openai, anthropic, etc.) - Default: "openai"
+- **`--model`**: Main generation model name (e.g., o3-2025-04-16) - Default: "o3-2025-04-16"
+- **`--ground_provider`**: The provider for the grounding model - **Required**
+- **`--ground_url`**: The URL of the grounding model - **Required**
+- **`--ground_model`**: The model name for the grounding model - **Required**
+- **`--grounding_width`**: Width of the output coordinate resolution from the grounding model - **Required**
+- **`--grounding_height`**: Height of the output coordinate resolution from the grounding model - **Required**
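
As a rough illustration of how the flags listed above fit together, the following Python sketch assembles the `agent_s` command line from a dictionary. The helper name `build_agent_s_command` is hypothetical and not part of `gui-agents`; the flag names and values are the ones shown in the CLI example.

```python
import shlex


def build_agent_s_command(options):
    """Assemble an `agent_s` command string from a flag-name -> value mapping.

    Hypothetical helper for illustration; not part of the gui-agents package.
    """
    args = ["agent_s"]
    for flag, value in options.items():
        args.extend([f"--{flag}", str(value)])
    # shlex.join quotes any argument that needs quoting for a POSIX shell.
    return shlex.join(args)


command = build_agent_s_command({
    "provider": "openai",
    "model": "o3-2025-04-16",
    "ground_provider": "huggingface",
    "ground_url": "http://localhost:8080",
    "ground_model": "ui-tars-1.5-7b",
    "grounding_width": 1920,
    "grounding_height": 1080,
})
print(command)
```

Building the command from a mapping keeps the five required grounding flags in one place, which may make it easier to spot a missing one before launching the agent.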
+
+#### Grounding Model Dimensions
+The grounding width and height should match the output coordinate resolution of your grounding model:
+- **UI-TARS-1.5-7B**: Use `--grounding_width 1920 --grounding_height 1080`
+- **UI-TARS-72B**: Use `--grounding_width 1000 --grounding_height 1000`
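
The grounding model reports coordinates in its own output resolution, so they generally have to be mapped back to the real screen size before they can be used. A minimal sketch of that mapping, assuming simple proportional scaling; the function name `rescale_point` is hypothetical, and the actual handling inside `gui-agents` may differ:

```python
def rescale_point(x, y, grounding_width, grounding_height, screen_width, screen_height):
    """Map a point from grounding-model coordinates to screen coordinates.

    Assumes plain proportional scaling on each axis (an illustration only).
    """
    scaled_x = round(x * screen_width / grounding_width)
    scaled_y = round(y * screen_height / grounding_height)
    return scaled_x, scaled_y


# The centre of a 1000x1000 grounding output, mapped onto a 1920x1080 screen.
print(rescale_point(500, 500, 1000, 1000, 1920, 1080))
```

When the grounding resolution already equals the screen resolution, the mapping leaves the point unchanged.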
+
+#### Optional Parameters
+- **`--model_url`**: Custom API URL for main generation model - Default: ""
+- **`--model_api_key`**: API key for main generation model - Default: ""
+- **`--ground_api_key`**: API key for grounding model endpoint - Default: ""
+- **`--max_trajectory_length`**: Maximum number of image turns to keep in trajectory - Default: 8
+- **`--enable_reflection`**: Enable reflection agent to assist the worker agent - Default: True
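
To make `--max_trajectory_length` concrete, one way to keep only the most recent image turns is a bounded queue. This is an illustrative sketch, not the agent's actual trajectory handling; the class `TrajectoryBuffer` and the string placeholders standing in for screenshots are hypothetical.

```python
from collections import deque


class TrajectoryBuffer:
    """Keep only the most recent image turns (hypothetical illustration)."""

    def __init__(self, max_trajectory_length=8):
        # A deque with maxlen silently drops the oldest item once it is full.
        self._turns = deque(maxlen=max_trajectory_length)

    def add(self, screenshot):
        self._turns.append(screenshot)

    def turns(self):
        return list(self._turns)


buffer = TrajectoryBuffer(max_trajectory_length=8)
for step in range(10):
    buffer.add(f"screenshot_{step}")

# After ten steps only the last eight turns remain.
print(buffer.turns()[0], len(buffer.turns()))
```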

 ### `gui_agents` SDK

-First, we import the necessary modules. `AgentS2` is the main agent class for Agent S2. `OSWorldACI` is our grounding agent that translates agent actions into executable python code.
+First, we import the necessary modules. `AgentS2_5` is the main agent class for Agent S2.5. `OSWorldACI` is our grounding agent that translates agent actions into executable python code.
 ```python
 import pyautogui
 import io
-from gui_agents.s2.agents.agent_s import AgentS2
-from gui_agents.s2.agents.grounding import OSWorldACI
+from gui_agents.s2_5.agents.agent_s import AgentS2_5
+from gui_agents.s2_5.agents.grounding import OSWorldACI

 # Load in your API keys.
 from dotenv import load_dotenv
@@ -218,7 +192,7 @@ load_dotenv()
 current_platform = "linux"  # "darwin", "windows"
 ```

-Next, we define our engine parameters. `engine_params` is used for the main agent, and `engine_params_for_grounding` is for grounding. For `engine_params_for_grounding`, we support the Claude, GPT series, and Hugging Face Inference Endpoints.
+Next, we define our engine parameters. `engine_params` is used for the main agent, and `engine_params_for_grounding` is for grounding. For `engine_params_for_grounding`, we support custom endpoints like HuggingFace TGI, vLLM, and Open Router.

 ```python
 engine_params = {
@@ -228,50 +202,45 @@ engine_params = {
     "api_key": model_api_key,  # Optional
 }

-# Grounding Configuration 1: Load the grounding engine from an API based model