Commit 492a90d

docs: Codebase structure (#3050)
* Codebase structure docs

  Signed-off-by: Felix Wang <[email protected]>

* Address code review

  Signed-off-by: Felix Wang <[email protected]>

Signed-off-by: Felix Wang <[email protected]>
1 parent 0ed1a63 commit 492a90d

3 files changed: +134 -0 lines changed

CONTRIBUTING.md

Lines changed: 2 additions & 0 deletions
@@ -48,6 +48,8 @@ the main Feast repository:
 - [Feast Java Serving](#feast-java-serving)
 - [Feast Go Client](#feast-go-client)

+Please see [this page](https://docs.feast.dev/reference/codebase-structure) for more details on the structure of the entire codebase.
+
 ## Community
 See [Contribution process](https://docs.feast.dev/project/contributing) and [Community](https://docs.feast.dev/community) for details on how to get more involved in the community.

docs/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -59,6 +59,7 @@

 ## Reference

+* [Codebase Structure](reference/codebase-structure.md)
 * [Data sources](reference/data-sources/README.md)
   * [File](reference/data-sources/file.md)
   * [Snowflake](reference/data-sources/snowflake.md)

docs/reference/codebase-structure.md

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
# Codebase structure

Let's examine the Feast codebase.
This analysis is accurate as of Feast 0.23.

```
$ tree -L 1 -d
.
├── docs
├── examples
├── go
├── infra
├── java
├── protos
├── sdk
└── ui
```

## Python SDK

The Python SDK lives in `sdk/python/feast`.
The majority of Feast logic lives in these Python files:
* The core Feast objects ([entities](../getting-started/concepts/entity.md), [feature views](../getting-started/concepts/feature-view.md), [data sources](../getting-started/concepts/dataset.md), etc.) are defined in their respective Python files, such as `entity.py`, `feature_view.py`, and `data_source.py`.
* The `FeatureStore` class is defined in `feature_store.py` and the associated configuration object (the Python representation of the `feature_store.yaml` file) is defined in `repo_config.py`.
* The CLI and other core feature store logic are defined in `cli.py` and `repo_operations.py`.
* The type system that manages conversion between Feast types and external type systems is defined in `type_map.py`.
* The Python feature server (the server that is started through the `feast serve` command) is defined in `feature_server.py`.

There are also several important submodules:
* `infra/` contains all the infrastructure components, such as the provider, offline store, online store, batch materialization engine, and registry.
* `dqm/` covers data quality monitoring, such as the dataset profiler.
* `diff/` covers the logic for determining how to apply infrastructure changes upon feature repo changes (e.g. the output of `feast plan` and `feast apply`).
* `embedded_go/` covers the Go feature server.
* `ui/` contains the embedded Web UI, to be launched on the `feast ui` command.

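
To make the role of `diff/` more concrete, here is a minimal sketch of the kind of comparison it has to perform: given the objects currently registered and the objects declared in the feature repo, decide what to create, delete, or update. The `ObjectDiff` type, the `diff_objects` function, and the dict-based inputs are all hypothetical and are not Feast's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectDiff:
    """Hypothetical result type: object names grouped by the change they need."""
    to_create: list = field(default_factory=list)
    to_delete: list = field(default_factory=list)
    to_update: list = field(default_factory=list)
    unchanged: list = field(default_factory=list)


def diff_objects(current: dict, desired: dict) -> ObjectDiff:
    """Compare registered objects with objects parsed from the feature repo.

    Both arguments map an object name to its definition (any comparable value).
    """
    diff = ObjectDiff()
    # Declared in the repo but not registered yet.
    diff.to_create.extend(sorted(desired.keys() - current.keys()))
    # Registered but no longer declared in the repo.
    diff.to_delete.extend(sorted(current.keys() - desired.keys()))
    # Present on both sides: update only if the definition changed.
    for name in sorted(current.keys() & desired.keys()):
        if current[name] == desired[name]:
            diff.unchanged.append(name)
        else:
            diff.to_update.append(name)
    return diff


# Illustrative data only; real Feast objects are richer than plain dicts.
registered = {"driver": {"join_key": "driver_id"}, "rider": {"join_key": "rider_id"}}
in_repo = {"driver": {"join_key": "driver_uuid"}, "trip": {"join_key": "trip_id"}}

result = diff_objects(registered, in_repo)
print(result.to_create, result.to_delete, result.to_update)
# ['trip'] ['rider'] ['driver']
```

A plan-style command would only print such a diff, while an apply-style command would go on to act on it.
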
Of these submodules, `infra/` is the most important.
It contains the interfaces for the [provider](../getting-started/architecture-and-components/provider.md), [offline store](../getting-started/architecture-and-components/offline-store.md), [online store](../getting-started/architecture-and-components/online-store.md), [batch materialization engine](../getting-started/architecture-and-components/batch-materialization-engine.md), and [registry](../getting-started/architecture-and-components/registry.md), as well as all of their individual implementations.

```
$ tree --dirsfirst -L 1 infra
infra
├── contrib
├── feature_servers
├── materialization
├── offline_stores
├── online_stores
├── registry
├── transformation_servers
├── utils
├── __init__.py
├── aws.py
├── gcp.py
├── infra_object.py
├── key_encoding_utils.py
├── local.py
├── passthrough_provider.py
└── provider.py
```

The tests for the Python SDK are contained in `sdk/python/tests`.
For more details, see this [overview](../how-to-guides/adding-or-reusing-tests.md#test-suite-overview) of the test suite.

### Example flow: `feast apply`

Let's walk through how `feast apply` works by tracking its execution across the codebase.

1. All CLI commands are in `cli.py`.
   Most of these commands are backed by methods in `repo_operations.py`.
   The `feast apply` command triggers `apply_total_command`, which then calls `apply_total` in `repo_operations.py`.
2. With a `FeatureStore` object (from `feature_store.py`) that is initialized based on the `feature_store.yaml` in the current working directory, `apply_total` first parses the feature repo with `parse_repo` and then calls either `FeatureStore.apply` or `FeatureStore._apply_diffs` to apply those changes to the feature store.
3. Let's examine `FeatureStore.apply`.
   It splits the objects based on class (e.g. `Entity`, `FeatureView`, etc.) and then calls the appropriate registry method to apply or delete the object.
   For example, it might call `self._registry.apply_entity` to apply an entity.
   If the default file-based registry is used, this logic can be found in `infra/registry/registry.py`.
4. Then the feature store must update its cloud infrastructure (e.g. online store tables) to match the new feature repo, so it calls `Provider.update_infra`, which can be found in `infra/provider.py`.
5. Assuming the provider is a built-in provider (e.g. one of the local, GCP, or AWS providers), it will call `PassthroughProvider.update_infra` in `infra/passthrough_provider.py`.
6. This delegates to the online store and batch materialization engine.
   For example, if the feature store is configured to use the Redis online store then the `update` method from `infra/online_stores/redis.py` will be called.
   And if the local materialization engine is configured then the `update` method from `infra/materialization/local_engine.py` will be called.

At this point, the `feast apply` command is complete.

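
The call chain above can be sketched with a handful of stand-in classes. This is a toy model, not Feast code: the class and method names mirror the ones mentioned in the steps, but the signatures are heavily simplified assumptions (the real methods take more arguments), and the `calls` list exists only to show the order of delegation.

```python
calls = []  # records the order in which the stand-in components are invoked


class Registry:
    """Stand-in for the file-based registry (infra/registry/registry.py)."""

    def apply_entity(self, entity):
        calls.append(f"registry.apply_entity({entity})")


class OnlineStore:
    """Stand-in for an online store such as infra/online_stores/redis.py."""

    def update(self, views):
        calls.append(f"online_store.update({views})")


class MaterializationEngine:
    """Stand-in for infra/materialization/local_engine.py."""

    def update(self, views):
        calls.append(f"engine.update({views})")


class PassthroughProvider:
    """Stand-in for infra/passthrough_provider.py: it only delegates."""

    def __init__(self):
        self.online_store = OnlineStore()
        self.engine = MaterializationEngine()

    def update_infra(self, views):
        # Step 6: forward to the online store and the materialization engine.
        self.online_store.update(views)
        self.engine.update(views)


class FeatureStore:
    """Stand-in for the FeatureStore class in feature_store.py."""

    def __init__(self):
        self._registry = Registry()
        self._provider = PassthroughProvider()

    def apply(self, entities, views):
        # Step 3: register each object with the registry.
        for entity in entities:
            self._registry.apply_entity(entity)
        # Steps 4 and 5: bring the infrastructure in line with the repo.
        self._provider.update_infra(views)


def apply_total(store, repo):
    """Stand-in for apply_total; `repo` is an already-parsed dict (hypothetical)."""
    store.apply(repo["entities"], repo["feature_views"])


apply_total(FeatureStore(), {"entities": ["driver"], "feature_views": ["driver_stats"]})
for line in calls:
    print(line)
```

Running the sketch prints the registry call first, followed by the online store and engine updates, which matches the order of steps 3 through 6.
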
### Example flow: `feast materialize`

Let's walk through how `feast materialize` works by tracking its execution across the codebase.

1. The `feast materialize` command triggers `materialize_command` in `cli.py`, which then calls `FeatureStore.materialize` from `feature_store.py`.
2. This then calls `Provider.materialize_single_feature_view`, which can be found in `infra/provider.py`.
3. As with `feast apply`, the provider is most likely backed by the passthrough provider, in which case `PassthroughProvider.materialize_single_feature_view` will be called.
4. This delegates to the underlying batch materialization engine.
   Assuming that the local engine has been configured, `LocalMaterializationEngine.materialize` from `infra/materialization/local_engine.py` will be called.
5. Since materialization involves reading features from the offline store and writing them to the online store, the local engine will delegate to both the offline store and online store.
   Specifically, it will call `OfflineStore.pull_latest_from_table_or_query` and `OnlineStore.online_write_batch`.
   These two calls will be routed to the offline store and online store that have been configured.

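
Step 5 is the heart of materialization, so here is a rough sketch of the idea under simplifying assumptions: the offline store is a list of dicts, the online store is a plain dict, and the column names `driver_id`, `event_timestamp`, and `trips_today` are made up for illustration. The function names echo the methods named above, but their signatures here are hypothetical.

```python
from datetime import datetime


def pull_latest(rows, start, end):
    """Stand-in for OfflineStore.pull_latest_from_table_or_query.

    Keeps, per entity key, the most recent row whose timestamp is in [start, end].
    """
    latest = {}
    for row in rows:
        if not (start <= row["event_timestamp"] <= end):
            continue  # outside the materialization window
        key = row["driver_id"]
        if key not in latest or row["event_timestamp"] > latest[key]["event_timestamp"]:
            latest[key] = row
    return list(latest.values())


def online_write_batch(online_store, rows):
    """Stand-in for OnlineStore.online_write_batch: upsert rows keyed by entity."""
    for row in rows:
        online_store[row["driver_id"]] = {"trips_today": row["trips_today"]}


def materialize(offline_rows, online_store, start, end):
    """Stand-in for the local engine: read from offline, write to online."""
    online_write_batch(online_store, pull_latest(offline_rows, start, end))


offline_rows = [
    {"driver_id": 1, "event_timestamp": datetime(2022, 8, 1), "trips_today": 3},
    {"driver_id": 1, "event_timestamp": datetime(2022, 8, 2), "trips_today": 5},
    {"driver_id": 2, "event_timestamp": datetime(2022, 8, 1), "trips_today": 7},
    {"driver_id": 2, "event_timestamp": datetime(2022, 8, 9), "trips_today": 9},
]
online_store = {}
materialize(offline_rows, online_store, datetime(2022, 8, 1), datetime(2022, 8, 3))
print(online_store)
# {1: {'trips_today': 5}, 2: {'trips_today': 7}}
```

Driver 2's row from August 9 falls outside the window, so the older value is the one written to the online store.
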
### Example flow: `get_historical_features`

Let's walk through how `get_historical_features` works by tracking its execution across the codebase.

1. We start with `FeatureStore.get_historical_features` in `feature_store.py`.
   This method does some internal preparation, and then delegates the actual execution to the underlying provider by calling `Provider.get_historical_features`, which can be found in `infra/provider.py`.
2. As with `feast apply`, the provider is most likely backed by the passthrough provider, in which case `PassthroughProvider.get_historical_features` will be called.
3. That call simply delegates to `OfflineStore.get_historical_features`.
   So if the feature store is configured to use Snowflake as the offline store, `SnowflakeOfflineStore.get_historical_features` will be executed.

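
The delegation in steps 2 and 3 follows a common interface-plus-implementation pattern, sketched below. The class names match those in the text, but everything else is an assumption made for illustration: the `OFFLINE_STORES` lookup table, the constructor argument, and the string return values are hypothetical, and Feast resolves the configured store differently.

```python
from abc import ABC, abstractmethod


class OfflineStore(ABC):
    """Stand-in for the offline store interface."""

    @abstractmethod
    def get_historical_features(self, entity_rows, features):
        ...


class FileOfflineStore(OfflineStore):
    def get_historical_features(self, entity_rows, features):
        return f"file: {len(entity_rows)} rows, features={features}"


class SnowflakeOfflineStore(OfflineStore):
    def get_historical_features(self, entity_rows, features):
        return f"snowflake: {len(entity_rows)} rows, features={features}"


# Hypothetical lookup table from a config value to an implementation.
OFFLINE_STORES = {"file": FileOfflineStore, "snowflake": SnowflakeOfflineStore}


class PassthroughProvider:
    """Stand-in provider that forwards the call to the configured offline store."""

    def __init__(self, offline_store_type):
        self.offline_store = OFFLINE_STORES[offline_store_type]()

    def get_historical_features(self, entity_rows, features):
        # The provider adds nothing here; it forwards the call as-is.
        return self.offline_store.get_historical_features(entity_rows, features)


provider = PassthroughProvider("snowflake")
result = provider.get_historical_features(
    [{"driver_id": 1}], ["driver_stats:trips_today"]
)
print(result)
# snowflake: 1 rows, features=['driver_stats:trips_today']
```

Swapping `"snowflake"` for `"file"` changes which implementation runs without touching the provider, which is the point of keeping interfaces and implementations side by side in `infra/`.
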
## Java SDK

The `java/` directory contains the Java serving component.
See [here](https://github.com/feast-dev/feast/blob/master/java/CONTRIBUTING.md) for more details on how the repo is structured.

## Go feature server

The `go/` directory contains the Go feature server.
Most of the files here have logic to help with reading features from the online store.
Within `go/`, the `internal/feast/` directory contains most of the core logic:
* `onlineserving/` covers the core serving logic.
* `model/` contains the implementations of the Feast objects (entity, feature view, etc.).
  * For example, `entity.go` is the Go equivalent of `entity.py`. It contains a very simple Go implementation of the entity object.
* `registry/` covers the registry.
  * Currently only the file-based registry is supported (the SQL-based registry is unsupported). Additionally, the file-based registry only supports a file-based registry store, not the GCS or S3 registry stores.
* `onlinestore/` covers the online stores (currently only Redis and SQLite are supported).

## Protobufs

Feast uses [protobuf](https://github.com/protocolbuffers/protobuf) to store serialized versions of the core Feast objects.
The protobuf definitions are stored in `protos/feast`.

## Web UI

The `ui/` directory contains the Web UI.
See [here](https://github.com/feast-dev/feast/blob/master/ui/CONTRIBUTING.md) for more details on the structure of the Web UI.
