Commit 4a96688

auto apply krr recommendations (#429)
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Introduced Kubernetes mutating webhook server enforcing resource recommendations on pods with Prometheus metrics and workload querying endpoints.
  - Added flexible enforcement modes configurable via environment variables and pod annotations.
  - Implemented background management of ReplicaSet owners and workload recommendations with automatic periodic updates.
  - Added utilities for patching container resources based on recommendations with validation and significant difference checks.
  - Provided centralized environment variable configuration for the enforcer component.
  - Added support for custom CA certificates for secure connections.
  - Included Helm chart with automated TLS certificate generation, webhook configuration, service account, roles, and ServiceMonitor for Prometheus.
- **Documentation**
  - Added detailed README covering functionality, deployment, configuration, API usage, metrics, and troubleshooting.
- **Chores**
  - Added Dockerfile and Python dependencies for containerization.
  - Added Helm packaging scripts and `.helmignore` for streamlined chart management.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
1 parent 96f44df commit 4a96688

23 files changed, with 2015 additions and 0 deletions

enforcer/Dockerfile

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
1+
# Use the official Python 3.12 slim image as the base image
2+
FROM python:3.12-slim
3+
ENV LANG=C.UTF-8
4+
ENV PYTHONDONTWRITEBYTECODE=1
5+
ENV PYTHONUNBUFFERED=1
6+
ENV PATH="/app/venv/bin:$PATH"
7+
8+
# Install libexpat1 explicitly so the package is upgraded to a version that fixes three high-severity CVEs: CVE-2024-45490, CVE-2024-45491, CVE-2024-45492
9+
RUN apt-get update \
10+
&& apt-get install -y --no-install-recommends libexpat1 \
11+
&& rm -rf /var/lib/apt/lists/*
12+
13+
# Set the working directory
14+
WORKDIR /app/enforcer
15+
16+
COPY ./*.py .
17+
COPY ./dal/ dal/
18+
COPY ./resources/ resources/
19+
COPY ./requirements.txt requirements.txt
20+
21+
22+
RUN pip install --no-cache-dir --upgrade pip
23+
# Install the project dependencies
24+
RUN python -m ensurepip --upgrade
25+
RUN pip install --no-cache-dir -r requirements.txt
26+
27+
CMD ["python", "enforcer_main.py"]

enforcer/README.md

Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
1+
# KRR Enforcer - Kubernetes Resource Recommendation Mutation Webhook
2+
3+
A mutating webhook server that automatically enforces [KRR (Kubernetes Resource Recommender)](https://github.com/robusta-dev/krr) recommendations by patching pod resource requests and limits in real-time.
4+
5+
## Features
6+
7+
- **Automatic Resource Enforcement**: Applies KRR recommendations to pods during pod creation
8+
- **Flexible Enforcement Modes**: Support for enforce/ignore modes per workload
9+
- **REST API**: Query recommendations via HTTP endpoints
10+
11+
## Enforcement Modes
12+
13+
Enforcement can be configured globally or on a per-workload basis.
14+
15+
### Global Enforcement Mode
16+
The global default mode is configured via the `KRR_MUTATION_MODE_DEFAULT` environment variable:
17+
- `enforce` - Apply recommendations to all pods by default
18+
- `ignore` - Skip enforcement for all pods by default
19+
20+
### Per-Workload Mode
21+
You can override the default mode for specific workloads using the annotation:
22+
23+
```yaml
24+
apiVersion: apps/v1
25+
kind: Deployment
26+
metadata:
27+
name: my-app
28+
spec:
29+
template:
30+
metadata:
31+
annotations:
32+
admission.robusta.dev/krr-mutation-mode: enforce # or "ignore"
33+
```
34+
35+
**Mode Priority**: Pod annotation > Global default
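
The priority rule above can be sketched in a few lines of Python. This is only an illustration, not the enforcer's actual code: the annotation key and the `KRR_MUTATION_MODE_DEFAULT` variable come from this README, while `resolve_mode`, `VALID_MODES`, and the fallback to `enforce` when the variable is unset are assumptions made for the example.

```python
import os

# Annotation key and env var name are taken from the README above.
MODE_ANNOTATION = "admission.robusta.dev/krr-mutation-mode"
# Assumption: only these two values are recognized.
VALID_MODES = {"enforce", "ignore"}


def resolve_mode(pod_annotations, global_default):
    """Hypothetical helper: return the effective mutation mode for one pod."""
    requested = (pod_annotations or {}).get(MODE_ANNOTATION, "").strip().lower()
    if requested in VALID_MODES:
        return requested  # the pod annotation takes priority
    return global_default  # missing or unrecognized annotation falls back


# Assumption of this sketch: default to "enforce" when the variable is unset.
default_mode = os.environ.get("KRR_MUTATION_MODE_DEFAULT", "enforce")
print(resolve_mode({MODE_ANNOTATION: "ignore"}, default_mode))
```

Treating an unrecognized annotation value as "not set" is a design choice of this sketch; the real webhook may log or reject such values instead.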
36+
37+
## Webhook Failure Mode
38+
39+
The webhook uses `failurePolicy: Ignore` by default, meaning if the webhook fails, pods are created without resource optimization rather than being blocked.
40+
41+
42+
## Installation with Helm
43+
44+
### Prerequisites
45+
- Helm 3.x
46+
- Prometheus Operator (optional, for metrics collection)
47+
- Robusta UI account - used to store KRR scan results
48+
49+
### Certificate
50+
51+
- On each helm install/upgrade, a new certificate is created and deployed for the admission webhook.
52+
- <b>The certificate is set to expire after 1 year.</b>
53+
- To avoid certificate expiration, you must upgrade the enforcer helm release <b>at least once a year</b>.
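
Because the certificate lifetime is fixed, it can help to check how many days remain before the next required upgrade. The following is a minimal sketch, assuming you have already obtained the certificate's `notAfter` value (for example from `openssl x509 -enddate -noout -in tls.crt`); `days_until_expiry` is a hypothetical helper and not part of the enforcer.

```python
import ssl
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now_epoch):
    """Hypothetical helper: whole days from now_epoch until the notAfter time.

    not_after uses the OpenSSL text format, e.g. "Jan  1 00:00:00 2026 GMT".
    The result is negative once the certificate has expired.
    """
    expiry_epoch = ssl.cert_time_to_seconds(not_after)
    return int((expiry_epoch - now_epoch) // SECONDS_PER_DAY)


# Example date only; substitute the notAfter value of your own certificate.
remaining = days_until_expiry("Jan  1 00:00:00 2026 GMT", time.time())
print(f"days until expiry: {remaining}")
```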
54+
55+
### Quick Start
56+
57+
1. **Add the helm repository** (if available):
58+
```bash
59+
helm repo add robusta https://robusta-charts.storage.googleapis.com && helm repo update
60+
```
61+
62+
2. **Add cluster configuration**:
63+
64+
If the enforcer is installed in the same namespace as Robusta, it will automatically detect the Robusta account settings.
65+
66+
If your Robusta UI sink token is pulled from a secret (as described [here](https://docs.robusta.dev/master/setup-robusta/configuration-secrets.html#pulling-values-from-kubernetes-secrets)), you should add the same environment variable to the `Enforcer` pod as well.
67+
68+
If the `Enforcer` is installed in a different namespace, you can provide your Robusta account credentials using environment variables.
69+
70+
Add your Robusta credentials and cluster name to `enforcer-values.yaml`:
71+
72+
```yaml
73+
additionalEnvVars:
74+
- name: CLUSTER_NAME
75+
value: my-cluster-name # should be the same as the robusta installation on this cluster
76+
- name: ROBUSTA_UI_TOKEN
77+
value: "MY ROBUSTA UI TOKEN"
78+
# - name: ROBUSTA_UI_TOKEN # or pulled from a secret
79+
# valueFrom:
80+
# secretKeyRef:
81+
# name: robusta-secrets
82+
# key: robustaSinkToken
83+
```
84+
85+
3. **Install with default settings**:
86+
```bash
87+
helm install krr-enforcer robusta/krr-enforcer -f enforcer-values.yaml
88+
```
89+
90+
### Helm values
91+
92+
| Parameter | Description | Default |
93+
|-----------|---------------------------------------------------------------------|---------|
94+
| `logLevel` | Log level (DEBUG, INFO, WARN, ERROR) | `INFO` |
95+
| `certificate` | Base64-encoded custom CA certificate, for self-signed certificates | `""` |
96+
| `serviceMonitor.enabled` | Enable Prometheus ServiceMonitor | `true` |
97+
| `resources.requests.cpu` | CPU request for the enforcer pod | `100m` |
98+
| `resources.requests.memory` | Memory request for the enforcer pod | `256Mi` |
99+
100+
101+
## Running Locally
102+
103+
### Prerequisites
104+
- Python 3.9+
105+
- Access to a Kubernetes cluster
106+
- KRR recommendations data from Robusta UI
107+
108+
### Setup
109+
110+
1. **Install dependencies**:
111+
```bash
112+
pip install -r requirements.txt
113+
```
114+
115+
2. **Set environment variables**:
116+
```bash
117+
export ENFORCER_SSL_KEY_FILE="path/to/tls.key"
118+
export ENFORCER_SSL_CERT_FILE="path/to/tls.crt"
119+
export LOG_LEVEL="DEBUG"
120+
export KRR_MUTATION_MODE_DEFAULT="enforce"
121+
```
122+
123+
3. **Generate TLS certificates**:
124+
```bash
125+
# Generate private key
126+
openssl genrsa -out tls.key 2048
127+
128+
# Generate certificate signing request
129+
openssl req -new -key tls.key -out tls.csr \
130+
-subj "/CN=krr-enforcer.krr-system.svc"
131+
132+
# Generate self-signed certificate
133+
openssl x509 -req -in tls.csr -signkey tls.key -out tls.crt -days 365
134+
```
135+
136+
4. **Run the server**:
137+
```bash
138+
python enforcer_main.py
139+
```
140+
141+
The server will start on `https://localhost:8443` with the following endpoints:
142+
143+
- `POST /mutate` - Webhook endpoint for Kubernetes admission control
144+
- `GET /health` - Health check endpoint
145+
- `GET /metrics` - Prometheus metrics
146+
- `GET /recommendations/{namespace}/{kind}/{name}` - Query recommendations
147+
148+
### Local Development Tips
149+
150+
- Use `LOG_LEVEL=DEBUG` for detailed request/response logging
151+
- Test webhook locally using tools like `curl` or `httpie`
152+
- Monitor metrics at `https://localhost:8443/metrics`
153+
- Query recommendations: `GET https://localhost:8443/recommendations/default/Deployment/my-app`
154+
155+
### Testing the Webhook
156+
157+
```bash
158+
# Test health endpoint
159+
curl -k https://localhost:8443/health
160+
161+
# Test metrics endpoint
162+
curl -k https://localhost:8443/metrics
163+
164+
# Test recommendations endpoint
165+
curl -k https://localhost:8443/recommendations/default/Deployment/my-app
166+
```
167+
168+
## Metrics
169+
170+
The enforcer exposes Prometheus metrics at `/metrics`:
171+
172+
- `krr_pod_admission_mutations_total` - Total pod mutations (with `mutated` label)
173+
- `krr_replicaset_admissions_total` - Total ReplicaSet admissions (with `operation` label)
174+
- `krr_rs_owners_map_size` - Current size of the ReplicaSet owners map
175+
- `krr_admission_duration_seconds` - Duration of admission operations (with `kind` label)
176+
177+
## API Endpoints
178+
179+
### GET /recommendations/{namespace}/{kind}/{name}
180+
181+
Retrieve recommendations for a specific workload:
182+
183+
```bash
184+
curl -k https://krr-enforcer.krr-system.svc.cluster.local/recommendations/default/Deployment/my-app
185+
```
186+
187+
Response:
188+
```json
189+
{
190+
"namespace": "default",
191+
"kind": "Deployment",
192+
"name": "my-app",
193+
"containers": {
194+
"web": {
195+
"cpu": {
196+
"request": "100m",
197+
"limit": "200m"
198+
},
199+
"memory": {
200+
"request": "128Mi",
201+
"limit": "256Mi"
202+
}
203+
}
204+
}
205+
}
206+
```
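
To illustrate how a recommendation of the shape shown above could end up as a pod mutation, here is a minimal sketch that converts it into JSON Patch operations and wraps them in a Kubernetes `AdmissionReview` response. The helper names `build_resource_patch` and `admission_response` are hypothetical, and the real enforcer may differ (the commit summary mentions validation and significant-difference checks, which this sketch omits).

```python
import base64
import json


def build_resource_patch(pod, recommendation):
    """Hypothetical helper: JSON Patch ops setting resources on matching containers."""
    ops = []
    for index, container in enumerate(pod["spec"]["containers"]):
        rec = recommendation["containers"].get(container["name"])
        if rec is None:
            continue  # no recommendation for this container, leave it untouched
        resources = {"requests": {}, "limits": {}}
        for resource in ("cpu", "memory"):
            values = rec.get(resource) or {}
            # Assumption: a missing or null value means "do not set".
            if values.get("request") is not None:
                resources["requests"][resource] = values["request"]
            if values.get("limit") is not None:
                resources["limits"][resource] = values["limit"]
        ops.append({
            "op": "add",  # "add" on an existing member replaces its value
            "path": f"/spec/containers/{index}/resources",
            "value": resources,
        })
    return ops


def admission_response(uid, ops):
    """Wrap patch ops in an admission.k8s.io/v1 AdmissionReview response."""
    response = {"uid": uid, "allowed": True}
    if ops:
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(ops).encode()).decode()
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}
```

Note that this sketch overwrites the whole `resources` object of each matched container; patching individual fields instead would preserve any resources the recommendation does not cover.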
207+
208+
## Troubleshooting
209+
210+
### Common Issues
211+
212+
1. **Certificate Errors**: Ensure TLS certificates are properly configured and valid
213+
2. **Permission Denied**: Verify the ServiceAccount has proper RBAC permissions
214+
3. **No Recommendations**: Check that KRR has generated recommendations and they're accessible
215+
4. **Webhook Timeout**: Increase `timeoutSeconds` in MutatingWebhookConfiguration
216+
217+
### Debug Mode
218+
219+
Enable debug logging to troubleshoot issues:
220+
221+
```bash
222+
helm upgrade krr-enforcer ./helm/krr-enforcer --set logLevel=DEBUG
223+
```
224+
225+
### Logs
226+
227+
Check enforcer logs:
228+
```bash
229+
kubectl logs -n krr-system deployment/krr-enforcer-krr-enforcer -f
230+
```

enforcer/dal/robusta_config.py

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
1+
from typing import List, Dict
2+
from pydantic import BaseModel
3+
4+
5+
class RobustaConfig(BaseModel):
6+
sinks_config: List[Dict[str, Dict]]
7+
global_config: dict
8+
9+
class RobustaToken(BaseModel):
10+
store_url: str
11+
api_key: str
12+
account_id: str
13+
email: str
14+
password: str
