Commit 792751e

docs: Add docs for SQL Registry (#2801)
* docs: Add docs for SQL Registry

  Signed-off-by: Achal Shah <[email protected]>

* typo

  Signed-off-by: Achal Shah <[email protected]>
1 parent 1bd0930 commit 792751e

6 files changed: +54 -3 lines changed

docs/SUMMARY.md

Lines changed: 3 additions & 1 deletion
@@ -11,11 +11,12 @@
 * [Concepts](getting-started/concepts/README.md)
   * [Overview](getting-started/concepts/overview.md)
   * [Data source](getting-started/concepts/data-source.md)
+  * [Dataset](getting-started/concepts/dataset.md)
   * [Entity](getting-started/concepts/entity.md)
   * [Feature view](getting-started/concepts/feature-view.md)
   * [Feature retrieval](getting-started/concepts/feature-retrieval.md)
   * [Point-in-time joins](getting-started/concepts/point-in-time-joins.md)
-  * [Dataset](getting-started/concepts/dataset.md)
+  * [Registry](getting-started/concepts/registry.md)
 * [Architecture](getting-started/architecture-and-components/README.md)
   * [Overview](getting-started/architecture-and-components/overview.md)
   * [Feature repository](getting-started/architecture-and-components/feature-repository.md)
@@ -35,6 +36,7 @@
 * [Real-time credit scoring on AWS](tutorials/real-time-credit-scoring-on-aws.md)
 * [Driver stats on Snowflake](tutorials/driver-stats-on-snowflake.md)
 * [Validating historical features with Great Expectations](tutorials/validating-historical-features.md)
+* [Using Scalable Registry](tutorials/using-scalable-registry.md)
 
 ## How-to Guides
 

docs/getting-started/concepts/README.md

Lines changed: 3 additions & 1 deletion
@@ -4,6 +4,8 @@
 
 {% page-ref page="data-source.md" %}
 
+{% page-ref page="dataset.md" %}
+
 {% page-ref page="entity.md" %}
 
 {% page-ref page="feature-view.md" %}
@@ -12,4 +14,4 @@
 
 {% page-ref page="point-in-time-joins.md" %}
 
-{% page-ref page="dataset.md" %}
+{% page-ref page="registry.md" %}
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+# Registry
+
+The Feast registry is where all applied Feast objects (e.g. feature views, entities) are stored. The registry exposes methods to apply, list, retrieve, and delete these objects. The registry is an abstraction with multiple possible implementations.
+
+By default, Feast uses a file-based registry implementation, which stores the protobuf representation of the registry as a serialized file. This registry file can be stored in a local file system or in cloud storage (for example, S3 or GCS).
+
+However, a file-based registry has inherent limitations, because changing a single field in the registry requires rewriting the whole registry file. With multiple concurrent writers, this either risks data loss or bottlenecks writes to the registry, since all changes have to be serialized (e.g. when running materialization for multiple feature views or time ranges concurrently).
+
+Alternatively, a [SQL Registry](../../tutorials/using-scalable-registry.md) can be used for a more scalable registry.
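
The lost-update risk described in the added page can be illustrated with a minimal sketch. This is not Feast code: a JSON string stands in for the serialized registry file, an in-memory SQLite table stands in for a row-per-object registry, and the object names (`driver_stats`, `customer_stats`) and the `state` column are hypothetical.

```python
import json
import sqlite3

# --- File-style registry: the whole registry is one serialized blob ---
def load(blob: str) -> dict:
    return json.loads(blob)

def dump(registry: dict) -> str:
    return json.dumps(registry, sort_keys=True)

registry_file = dump({"feature_views": {}})

# Two writers each read the full file before either one writes back.
snapshot_a = load(registry_file)
snapshot_b = load(registry_file)

snapshot_a["feature_views"]["driver_stats"] = "materialized to 2022-06-01"
registry_file = dump(snapshot_a)  # writer A rewrites the whole file

snapshot_b["feature_views"]["customer_stats"] = "materialized to 2022-06-01"
registry_file = dump(snapshot_b)  # writer B overwrites A's change

file_result = load(registry_file)["feature_views"]

# --- Row-per-object registry: each writer touches only its own row ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feature_views (name TEXT PRIMARY KEY, state TEXT)")
for name in ("driver_stats", "customer_stats"):
    with conn:  # one small transaction per object
        conn.execute(
            "INSERT OR REPLACE INTO feature_views (name, state) VALUES (?, ?)",
            (name, "materialized to 2022-06-01"),
        )
row_result = dict(conn.execute("SELECT name, state FROM feature_views"))
conn.close()

print(sorted(file_result))  # only writer B's entry survives
print(sorted(row_result))   # both entries are present
```

In the file-style half, writer A's update disappears because writer B serialized a snapshot taken before A wrote. In the row-per-object half, neither write can clobber the other, which is the property the SQL registry relies on.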

docs/tutorials/tutorials-overview.md

Lines changed: 2 additions & 0 deletions
@@ -11,3 +11,5 @@ These Feast tutorials showcase how to use Feast to simplify end to end model tra
 {% page-ref page="driver-stats-on-snowflake.md" %}
 
 {% page-ref page="validating-historical-features.md" %}
+
+{% page-ref page="using-scalable-registry.md" %}
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+---
+description: >-
+  Tutorial on how to use the SQL registry for scalable registry updates
+---
+
+# Using Scalable Registry
+
+## Overview
+
+By default, Feast uses a file-based registry implementation, which stores the protobuf representation of the registry as a serialized file. This registry file can be stored in a local file system or in cloud storage (for example, S3 or GCS).
+
+However, a file-based registry has inherent limitations, because changing a single field in the registry requires rewriting the whole registry file. With multiple concurrent writers, this either risks data loss or bottlenecks writes to the registry, since all changes have to be serialized (e.g. when running materialization for multiple feature views or time ranges concurrently).
+
+An alternative to the file-based registry is the [SqlRegistry](https://rtd.feast.dev/en/latest/feast.infra.registry_stores.html#feast.infra.registry_stores.sql.SqlRegistry), which ships with Feast. This implementation stores the registry in a relational database and allows individual objects to be changed atomically.
+Under the hood, the SQL Registry implementation uses [SQLAlchemy](https://docs.sqlalchemy.org/en/14/) to abstract over the different databases. Consequently, any [database supported](https://docs.sqlalchemy.org/en/14/core/engines.html#supported-databases) by SQLAlchemy can be used by the SQL Registry.
+Feast can use the SQL Registry via a config change in the `feature_store.yaml` file. For example:
+
+```yaml
+project: <your project name>
+provider: <provider name>
+online_store: redis
+offline_store: file
+registry:
+    registry_type: sql
+    path: postgresql://postgres:[email protected]:55001/feast
+```
+
+Specifically, `registry_type` needs to be set to `sql` in the `registry` config block. The `path` should then be the [Database URL](https://docs.sqlalchemy.org/en/14/core/engines.html#database-urls) of the database to use, in the form SQLAlchemy expects. No additional commands are currently needed to configure this registry.
+
+There are some things to note about how the SQL registry works:
+- Once instantiated, the registry ensures that the tables needed to store data exist, and creates them if they do not.
+- Upon tearing down the Feast project, the registry ensures that the tables are dropped from the database.
+- The schema for how data is laid out in tables is intentionally simple: it stores the serialized protobuf version of each Feast object, keyed by its name.
+
+## Example Usage: Concurrent materialization
+The SQL Registry should be used when materializing feature views concurrently, to ensure that the data in the registry stays correct. This can be achieved by running `feast materialize` or `feature_store.materialize` multiple times with a correctly configured `feature_store.yaml`. Each materialization process then talks to the registry database concurrently, and the database ensures that the metadata updates are serialized.
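
The behaviour described in this tutorial (tables created on first use, one serialized object per row keyed by name, concurrent writers serialized by the database) can be sketched with the standard library alone. The sketch below rests on several assumptions: SQLite stands in for the real database, threads stand in for separate `feast materialize` processes, and the `feature_views` table, its columns, and the helper names are hypothetical and are not Feast's actual schema or API.

```python
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical table: one row per object, keyed by name.
DDL = """
CREATE TABLE IF NOT EXISTS feature_views (
    name  TEXT PRIMARY KEY,
    proto BLOB NOT NULL
)
"""

def record_materialization(db_path: str, view_name: str, end_date: str) -> None:
    """Open a private connection, ensure the table exists, upsert one row."""
    conn = sqlite3.connect(db_path, timeout=30, isolation_level="IMMEDIATE")
    try:
        conn.execute(DDL)  # idempotent: creates the table only if missing
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT OR REPLACE INTO feature_views (name, proto) VALUES (?, ?)",
                (view_name, end_date.encode()),  # bytes stand in for a protobuf
            )
    finally:
        conn.close()

def run_concurrently(view_names, end_date="2022-06-01"):
    """Run one writer per view in parallel and return the stored rows."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = str(Path(tmp) / "registry.db")
        with ThreadPoolExecutor(max_workers=4) as pool:
            # list() forces completion and re-raises any worker exception.
            list(pool.map(
                lambda name: record_materialization(db_path, name, end_date),
                view_names,
            ))
        conn = sqlite3.connect(db_path)
        try:
            return dict(conn.execute("SELECT name, proto FROM feature_views"))
        finally:
            conn.close()

stored = run_concurrently(["driver_stats", "customer_stats", "order_stats"])
print(sorted(stored))
```

Each worker writes only its own row inside a short transaction, so the database orders the writes and no update is lost. With a real deployment the equivalent ordering would come from the database named in `path`, not from SQLite.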

sdk/python/feast/infra/offline_stores/file.py

Lines changed: 1 addition & 1 deletion
@@ -447,7 +447,7 @@ def _field_mapping(
     entity_df_event_timestamp_col: str,
     timestamp_field: str,
     full_feature_names: bool,
-) -> dd.DataFrame:
+) -> Tuple[dd.DataFrame, str]:
     # Rename columns by the field mapping dictionary if it exists
     if feature_view.batch_source.field_mapping:
         df_to_join = _run_dask_field_mapping(
