Commit dd3061f

feat: Added global search api and necessary unit tests (#5532)
* feat: Added global search api and necessary unit tests
* Minor refactoring
* Addressed comments, create re-usable function to get all resources, removed unnecessary filtering logic for global search api
* Re-usable & modular dependency functions, removal of unnecessary fields, updated & added test cases, documentation update
* Minor refactoring & optimizations
* Minor reformatting, optimized unit tests to do more comprehensive tests
* Fixed minor linting error
* Optimized Code, better error handling, created reusable functions and combined test cases
* Minor formatting & type lints related changes
* Added exact similarity score in response, updated docs
* Minor reformatting & fixed lint error
* Added onDemandFeatureView test cases, minor function naming changes and increased fuzzy match threshold
* Minor code change after rebase

Signed-off-by: Aniket Paluskar <[email protected]>
1 parent 72de088 commit dd3061f

7 files changed: 3326 additions and 109 deletions

docs/reference/feature-servers/registry-server.md

Lines changed: 151 additions & 0 deletions
@@ -1153,6 +1153,157 @@ Please refer the [page](./../../../docs/getting-started/concepts/permission.md)
 
 **Note**: Recent visits are automatically logged when users access registry objects via the REST API. The logging behavior can be configured through the `feature_server.recent_visit_logging` section in `feature_store.yaml` (see configuration section below).
 
+
+### Search API
+
+#### Search Resources
+- **Endpoint**: `GET /api/v1/search`
+- **Description**: Search across all Feast resources including entities, feature views, features, feature services, data sources, and saved datasets. Supports cross-project search, fuzzy matching, relevance scoring, and advanced filtering.
+- **Parameters**:
+  - `query` (required): Search query string. Searches in resource names, descriptions, and tags. Empty string returns all resources.
+  - `projects` (optional): List of project names to search in. If not specified, searches all projects
+  - `allow_cache` (optional, default: `true`): Whether to allow cached data
+  - `tags` (optional): Filter results by tags in key:value format (e.g., `tags=environment:production&tags=team:ml`)
+  - `page` (optional, default: `1`): Page number for pagination (starts from 1)
+  - `limit` (optional, default: `50`, max: `100`): Number of items per page
+  - `sort_by` (optional, default: `match_score`): Field to sort by (`match_score`, `name`, or `type`)
+  - `sort_order` (optional, default: `desc`): Sort order ("asc" or "desc")
+- **Search Algorithm**:
+  - **Exact name match**: Highest priority (score: 100)
+  - **Description match**: High priority (score: 80)
+  - **Feature name match**: Medium-high priority (score: 50)
+  - **Tag match**: Medium priority (score: 60)
+  - **Fuzzy name match**: Lower priority (score: 40, similarity threshold: 50%)
+- **Examples**:
+  ```bash
+  # Basic search across all projects
+  curl -H "Authorization: Bearer <token>" \
+    "http://localhost:6572/api/v1/search?query=user"
+
+  # Search in specific projects
+  curl -H "Authorization: Bearer <token>" \
+    "http://localhost:6572/api/v1/search?query=driver&projects=ride_sharing&projects=analytics"
+
+  # Search with tag filtering
+  curl -H "Authorization: Bearer <token>" \
+    "http://localhost:6572/api/v1/search?query=features&tags=environment:production&tags=team:ml"
+
+  # Search with pagination and sorting
+  curl -H "Authorization: Bearer <token>" \
+    "http://localhost:6572/api/v1/search?query=conv_rate&page=1&limit=10&sort_by=name&sort_order=asc"
+
+  # Empty query to list all resources with filtering
+  curl -H "Authorization: Bearer <token>" \
+    "http://localhost:6572/api/v1/search?query=&projects=my_project&page=1&limit=20"
+  ```
+- **Response Example**:
+  ```json
+  {
+    "query": "user",
+    "projects_searched": ["project1", "project2"],
+    "results": [
+      {
+        "type": "entity",
+        "name": "user_id",
+        "description": "Primary identifier for users",
+        "project": "project1",
+        "match_score": 100
+      },
+      {
+        "type": "featureView",
+        "name": "user_features",
+        "description": "User demographic and behavioral features",
+        "project": "project1",
+        "match_score": 100
+      },
+      {
+        "type": "feature",
+        "name": "user_age",
+        "description": "Age of the user in years",
+        "project": "project1",
+        "match_score": 80
+      },
+      {
+        "type": "dataSource",
+        "name": "user_analytics",
+        "description": "Analytics data for user behavior tracking",
+        "project": "project2",
+        "match_score": 80
+      }
+    ],
+    "pagination": {
+      "page": 1,
+      "limit": 50,
+      "totalCount": 4,
+      "totalPages": 1,
+      "hasNext": false,
+      "hasPrevious": false
+    },
+    "errors": []
+  }
+  ```
+- **Project Handling**:
+  - **No projects specified**: Searches all available projects
+  - **Single project**: Searches only that project (includes warning if project doesn't exist)
+  - **Multiple projects**: Searches only existing projects, includes warnings about non-existent ones
+  - **Empty projects list**: Treated as search all projects
+- **Error Responses**:
+  ```json
+  // Invalid sort_by parameter (HTTP 400)
+  {
+    "detail": "Invalid sort_by parameter: 'invalid_field'. Valid options are: ['match_score', 'name', 'type']"
+  }
+
+  // Invalid sort_order parameter (HTTP 400)
+  {
+    "detail": "Invalid sort_order parameter: 'invalid_order'. Valid options are: ['asc', 'desc']"
+  }
+
+  // Invalid pagination limit above maximum (HTTP 400)
+  {
+    "detail": "Invalid limit parameter: '150'. Must be less than or equal to 100"
+  }
+
+  // Missing required query parameter (HTTP 422)
+  {
+    "detail": [
+      {
+        "type": "missing",
+        "loc": ["query_params", "query"],
+        "msg": "Field required"
+      }
+    ]
+  }
+
+  // Successful response with warnings
+  {
+    "query": "user",
+    "projects_searched": ["existing_project"],
+    "results": [],
+    "pagination": {
+      "page": 1,
+      "limit": 50,
+      "totalCount": 0,
+      "totalPages": 0
+    },
+    "errors": ["Following projects do not exist: nonexistent_project"]
+  }
+
+  // Successful response but empty results
+  {
+    "query": "user",
+    "projects_searched": ["existing_project"],
+    "results": [],
+    "pagination": {
+      "page": 1,
+      "limit": 50,
+      "totalCount": 0,
+      "totalPages": 0
+    },
+    "errors": []
+  }
+  ```
+---
 #### Get Popular Tags
 - **Endpoint**: `GET /api/v1/metrics/popular_tags`
 - **Description**: Discover Feature Views by popular tags. Returns the most popular tags (tags assigned to maximum number of feature views) with their associated feature views. If no project is specified, returns popular tags across all projects.
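
The scoring tiers and pagination fields documented in the Search API section can be sketched in a few lines of Python. This is only an illustration under assumptions, not the Feast implementation: `score_resource` and `paginate` are hypothetical names, matching is assumed to be case-insensitive, a resource is assumed to take the highest tier it qualifies for, and `difflib.SequenceMatcher` stands in for whatever similarity measure the server actually uses. Only the tier scores, the 50% threshold, and the pagination field names come from the documentation.

```python
import math
from difflib import SequenceMatcher

# Tier scores and threshold copied from the documented "Search Algorithm" list.
SCORES = {"exact_name": 100, "description": 80, "tag": 60, "feature_name": 50, "fuzzy_name": 40}
FUZZY_THRESHOLD = 0.5  # documented "similarity threshold: 50%"


def score_resource(resource: dict, query: str) -> int:
    """Return the highest tier score the query earns (0 means no match)."""
    q = query.lower()
    name = resource.get("name", "").lower()
    candidates = [0]
    if name == q:
        candidates.append(SCORES["exact_name"])
    if q in resource.get("description", "").lower():
        candidates.append(SCORES["description"])
    tags = resource.get("tags", {})
    if any(q in str(k).lower() or q in str(v).lower() for k, v in tags.items()):
        candidates.append(SCORES["tag"])
    # Assumption: feature views carry a plain list of feature names.
    if any(q in f.lower() for f in resource.get("features", [])):
        candidates.append(SCORES["feature_name"])
    # Assumption: difflib's ratio approximates the server's fuzzy similarity.
    if SequenceMatcher(None, q, name).ratio() >= FUZZY_THRESHOLD:
        candidates.append(SCORES["fuzzy_name"])
    return max(candidates)


def paginate(items: list, page: int = 1, limit: int = 50) -> dict:
    """Slice one page and build the documented pagination metadata."""
    total = len(items)
    total_pages = math.ceil(total / limit) if total else 0
    start = (page - 1) * limit
    return {
        "results": items[start:start + limit],
        "pagination": {
            "page": page,
            "limit": limit,
            "totalCount": total,
            "totalPages": total_pages,
            "hasNext": page < total_pages,
            "hasPrevious": page > 1,
        },
    }
```

Sorting matched resources by `score_resource` in descending order and passing the result to `paginate` would mirror the documented defaults of `sort_by=match_score` and `sort_order=desc`.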

sdk/python/feast/api/registry/rest/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -10,6 +10,7 @@
 from feast.api.registry.rest.permissions import get_permission_router
 from feast.api.registry.rest.projects import get_project_router
 from feast.api.registry.rest.saved_datasets import get_saved_dataset_router
+from feast.api.registry.rest.search import get_search_router
 
 
 def register_all_routes(app: FastAPI, grpc_handler, server=None):
@@ -22,4 +23,5 @@ def register_all_routes(app: FastAPI, grpc_handler, server=None):
     app.include_router(get_permission_router(grpc_handler))
     app.include_router(get_project_router(grpc_handler))
     app.include_router(get_saved_dataset_router(grpc_handler))
+    app.include_router(get_search_router(grpc_handler))
     app.include_router(get_metrics_router(grpc_handler, server))
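
With `get_search_router` registered, the documented `GET /api/v1/search` endpoint becomes reachable on the registry server. A client could assemble the request URL roughly as follows. This is a minimal sketch using only the standard library: `build_search_url` is a hypothetical helper, the host and port are taken from the curl examples in the docs, and the client-side checks simply mirror the documented limits rather than reproduce server behaviour.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:6572"  # host and port as used in the documented curl examples
VALID_SORT_BY = ("match_score", "name", "type")
VALID_SORT_ORDER = ("asc", "desc")


def build_search_url(query, projects=None, tags=None, page=1, limit=50,
                     sort_by="match_score", sort_order="desc"):
    """Build a search URL; list parameters are repeated, tags use key:value."""
    if limit > 100:
        raise ValueError("limit must be less than or equal to 100")
    if sort_by not in VALID_SORT_BY:
        raise ValueError(f"sort_by must be one of {VALID_SORT_BY}")
    if sort_order not in VALID_SORT_ORDER:
        raise ValueError(f"sort_order must be one of {VALID_SORT_ORDER}")
    params = [("query", query)]
    params += [("projects", p) for p in (projects or [])]
    params += [("tags", f"{k}:{v}") for k, v in (tags or {}).items()]
    params += [("page", page), ("limit", limit),
               ("sort_by", sort_by), ("sort_order", sort_order)]
    # Keep ':' unescaped so tags appear as key:value, as in the docs.
    return f"{BASE_URL}/api/v1/search?{urlencode(params, safe=':')}"
```

The resulting URL can be passed to any HTTP client together with the `Authorization: Bearer <token>` header shown in the curl examples.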

sdk/python/feast/api/registry/rest/lineage.py

Lines changed: 53 additions & 97 deletions
@@ -1,18 +1,22 @@
 """REST API endpoints for registry lineage and relationships."""
 
+import logging
 from typing import Optional
 
 from fastapi import APIRouter, Depends, Query
 
 from feast.api.registry.rest.rest_utils import (
     create_grpc_pagination_params,
     create_grpc_sorting_params,
+    get_all_project_resources,
     get_pagination_params,
     get_sorting_params,
     grpc_call,
 )
 from feast.protos.feast.registry import RegistryServer_pb2
 
+logger = logging.getLogger(__name__)
+
 
 def get_lineage_router(grpc_handler) -> APIRouter:
     router = APIRouter()
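
The newly imported `get_all_project_resources` lives in `rest_utils.py`, which this diff does not show. From its call sites in this file it appears to return a `(resources, pagination, errors)` triple, and callers fall back to an empty payload when there are errors and no resources. The sketch below illustrates that contract with hypothetical stand-ins: `collect_project_resources`, `registry_objects_or_empty`, and the `fetchers` mapping are invented for illustration, and catching errors per resource type is an assumption, not a description of the real helper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical: each fetcher wraps one list call and returns a response dict.
Fetcher = Callable[[], dict]


def collect_project_resources(
    fetchers: Dict[str, Fetcher],
) -> Tuple[Dict[str, list], Dict[str, dict], List[str]]:
    """Call every fetcher and gather resources, pagination metadata and errors."""
    resources: Dict[str, list] = {}
    pagination: Dict[str, dict] = {}
    errors: List[str] = []
    for kind, fetch in fetchers.items():
        try:
            response = fetch()
        except Exception as exc:  # assumption: one failing call does not abort the rest
            errors.append(f"{kind}: {exc}")
            continue
        resources[kind] = response.get(kind, [])
        pagination[kind] = response.get("pagination", {})
    return resources, pagination, errors


def registry_objects_or_empty(project: str, fetchers: Dict[str, Fetcher]) -> dict:
    """Mirror the caller-side fallback: empty payload only if nothing was fetched."""
    resources, pagination, errors = collect_project_resources(fetchers)
    if errors and not resources:
        return {"project": project, "objects": {}, "pagination": {}}
    return {"project": project, "objects": resources, "pagination": pagination}
```

Returning errors alongside partial results, instead of raising, lets an endpoint still serve whichever resource types were fetched successfully.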
@@ -141,69 +145,44 @@ def get_complete_registry_data(
         )
         lineage_response = grpc_call(grpc_handler.GetRegistryLineage, lineage_req)
 
-        # Get all registry objects
-        entities_req = RegistryServer_pb2.ListEntitiesRequest(
-            project=project,
-            allow_cache=allow_cache,
-            pagination=grpc_pagination,
-            sorting=grpc_sorting,
-        )
-        entities_response = grpc_call(grpc_handler.ListEntities, entities_req)
-
-        data_sources_req = RegistryServer_pb2.ListDataSourcesRequest(
-            project=project,
-            allow_cache=allow_cache,
-            pagination=grpc_pagination,
-            sorting=grpc_sorting,
-        )
-        data_sources_response = grpc_call(
-            grpc_handler.ListDataSources, data_sources_req
-        )
-
-        feature_views_req = RegistryServer_pb2.ListAllFeatureViewsRequest(
-            project=project,
-            allow_cache=allow_cache,
-            pagination=grpc_pagination,
-            sorting=grpc_sorting,
-        )
-        feature_views_response = grpc_call(
-            grpc_handler.ListAllFeatureViews, feature_views_req
-        )
-
-        feature_services_req = RegistryServer_pb2.ListFeatureServicesRequest(
-            project=project,
-            allow_cache=allow_cache,
-            pagination=grpc_pagination,
-            sorting=grpc_sorting,
-        )
-        feature_services_response = grpc_call(
-            grpc_handler.ListFeatureServices, feature_services_req
-        )
-
-        features_req = RegistryServer_pb2.ListFeaturesRequest(
-            project=project,
-            pagination=grpc_pagination,
-            sorting=grpc_sorting,
+        # Get all registry objects using shared helper function
+        project_resources, pagination, errors = get_all_project_resources(
+            grpc_handler,
+            project,
+            allow_cache,
+            tags={},
+            pagination_params=pagination_params,
+            sorting_params=sorting_params,
         )
-        features_response = grpc_call(grpc_handler.ListFeatures, features_req)
-
+        if errors and not project_resources:
+            logger.error(
+                f"Error getting project resources for project {project}: {errors}"
+            )
+            return {
+                "project": project,
+                "objects": {},
+                "relationships": [],
+                "indirectRelationships": [],
+                "pagination": {},
+            }
         return {
             "project": project,
             "objects": {
-                "entities": entities_response.get("entities", []),
-                "dataSources": data_sources_response.get("dataSources", []),
-                "featureViews": feature_views_response.get("featureViews", []),
-                "featureServices": feature_services_response.get("featureServices", []),
-                "features": features_response.get("features", []),
+                "entities": project_resources.get("entities", []),
+                "dataSources": project_resources.get("dataSources", []),
+                "featureViews": project_resources.get("featureViews", []),
+                "featureServices": project_resources.get("featureServices", []),
+                "features": project_resources.get("features", []),
             },
             "relationships": lineage_response.get("relationships", []),
             "indirectRelationships": lineage_response.get("indirectRelationships", []),
             "pagination": {
-                "entities": entities_response.get("pagination", {}),
-                "dataSources": data_sources_response.get("pagination", {}),
-                "featureViews": feature_views_response.get("pagination", {}),
-                "featureServices": feature_services_response.get("pagination", {}),
-                "features": features_response.get("pagination", {}),
+                # Get pagination metadata from project_resources if available, otherwise use empty dicts
+                "entities": pagination.get("entities", {}),
+                "dataSources": pagination.get("dataSources", {}),
+                "featureViews": pagination.get("featureViews", {}),
+                "featureServices": pagination.get("featureServices", {}),
+                "features": pagination.get("features", {}),
                 "relationships": lineage_response.get("relationshipsPagination", {}),
                 "indirectRelationships": lineage_response.get(
                     "indirectRelationshipsPagination", {}
@@ -265,61 +244,38 @@ def get_complete_registry_data_all(
                 allow_cache=allow_cache,
             )
             lineage_response = grpc_call(grpc_handler.GetRegistryLineage, lineage_req)
-            # Get all registry objects
-            entities_req = RegistryServer_pb2.ListEntitiesRequest(
-                project=project_name,
-                allow_cache=allow_cache,
-            )
-            entities_response = grpc_call(grpc_handler.ListEntities, entities_req)
-            data_sources_req = RegistryServer_pb2.ListDataSourcesRequest(
-                project=project_name,
-                allow_cache=allow_cache,
-            )
-            data_sources_response = grpc_call(
-                grpc_handler.ListDataSources, data_sources_req
-            )
-            feature_views_req = RegistryServer_pb2.ListAllFeatureViewsRequest(
-                project=project_name,
-                allow_cache=allow_cache,
-            )
-            feature_views_response = grpc_call(
-                grpc_handler.ListAllFeatureViews, feature_views_req
-            )
-            feature_services_req = RegistryServer_pb2.ListFeatureServicesRequest(
-                project=project_name,
-                allow_cache=allow_cache,
-            )
-            feature_services_response = grpc_call(
-                grpc_handler.ListFeatureServices, feature_services_req
-            )
 
-            features_req = RegistryServer_pb2.ListFeaturesRequest(
-                project=project_name,
+            # Get all registry objects using shared helper function
+            project_resources, _, errors = get_all_project_resources(
+                grpc_handler, project_name, allow_cache, tags={}
             )
-            features_response = grpc_call(grpc_handler.ListFeatures, features_req)
+
+            if errors and not project_resources:
+                logger.error(
+                    f"Error getting project resources for project {project_name}: {errors}"
+                )
+                continue
 
             # Add project field to each object
-            for entity in entities_response.get("entities", []):
+            for entity in project_resources.get("entities", []):
                 entity["project"] = project_name
-            for ds in data_sources_response.get("dataSources", []):
+            for ds in project_resources.get("dataSources", []):
                 ds["project"] = project_name
-            for fv in feature_views_response.get("featureViews", []):
+            for fv in project_resources.get("featureViews", []):
                 fv["project"] = project_name
-            for fs in feature_services_response.get("featureServices", []):
+            for fs in project_resources.get("featureServices", []):
                 fs["project"] = project_name
-            for feat in features_response.get("features", []):
+            for feat in project_resources.get("features", []):
                 feat["project"] = project_name
             all_data.append(
                 {
                     "project": project_name,
                     "objects": {
-                        "entities": entities_response.get("entities", []),
-                        "dataSources": data_sources_response.get("dataSources", []),
-                        "featureViews": feature_views_response.get("featureViews", []),
-                        "featureServices": feature_services_response.get(
-                            "featureServices", []
-                        ),
-                        "features": features_response.get("features", []),
+                        "entities": project_resources.get("entities", []),
+                        "dataSources": project_resources.get("dataSources", []),
+                        "featureViews": project_resources.get("featureViews", []),
+                        "featureServices": project_resources.get("featureServices", []),
+                        "features": project_resources.get("features", []),
                     },
                     "relationships": lineage_response.get("relationships", []),
                     "indirectRelationships": lineage_response.get(

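In the multi-project endpoint, every object is stamped with its project name so that results merged across projects stay distinguishable. Because all resource kinds now come from one `project_resources` dict, the five per-kind loops could be expressed as a single loop. The sketch below is an optional illustration, not part of the commit: `tag_with_project` and `RESOURCE_KINDS` are hypothetical names, and each kind is assumed to map to a list of dicts.

```python
# Keys used for the "objects" payload in this file.
RESOURCE_KINDS = ("entities", "dataSources", "featureViews", "featureServices", "features")


def tag_with_project(project_resources: dict, project_name: str) -> dict:
    """Add a "project" field to every object and return the per-kind lists."""
    for kind in RESOURCE_KINDS:
        for obj in project_resources.get(kind, []):
            obj["project"] = project_name  # mutates in place, like the loops in the diff
    # Missing kinds default to empty lists, matching the .get(kind, []) calls.
    return {kind: project_resources.get(kind, []) for kind in RESOURCE_KINDS}
```

The returned dict has the same shape as the `"objects"` entry appended to `all_data`, so it could be used in its place under the stated assumptions.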