# Codebase structure

Let's examine the Feast codebase.
This analysis is accurate as of Feast 0.23.

```
$ tree -L 1 -d
.
├── docs
├── examples
├── go
├── infra
├── java
├── protos
├── sdk
└── ui
```

## Python SDK

The Python SDK lives in `sdk/python/feast`.
The majority of Feast logic lives in these Python files:
* The core Feast objects ([entities](../getting-started/concepts/entity.md), [feature views](../getting-started/concepts/feature-view.md), [data sources](../getting-started/concepts/dataset.md), etc.) are defined in their respective Python files, such as `entity.py`, `feature_view.py`, and `data_source.py`.
* The `FeatureStore` class is defined in `feature_store.py`, and the associated configuration object (the Python representation of the `feature_store.yaml` file) is defined in `repo_config.py`.
* The CLI and other core feature store logic are defined in `cli.py` and `repo_operations.py`.
* The type system, which manages conversion between Feast types and external type systems, is defined in `type_map.py`.
* The Python feature server (the server that is started through the `feast serve` command) is defined in `feature_server.py`.
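
To make the role of `type_map.py` more concrete, here is a minimal, hypothetical sketch of the kind of conversion it is responsible for: mapping a Python value to a Feast-style type name. The function name `python_value_to_feast_type_name` and the mapping table are illustrative assumptions, not the actual contents of `type_map.py`, which covers many more cases (lists, timestamps, and types coming from external systems).

```python
from typing import Any

# Hypothetical, simplified mapping from Python types to Feast-style type names.
# The real type_map.py covers far more cases (lists, timestamps, external systems).
_PYTHON_TO_FEAST = {
    bool: "BOOL",
    int: "INT64",
    float: "DOUBLE",
    str: "STRING",
    bytes: "BYTES",
}


def python_value_to_feast_type_name(value: Any) -> str:
    """Return a Feast-style type name for a scalar Python value (illustrative only)."""
    # Look up the exact type, so True maps to BOOL even though bool subclasses int.
    feast_type = _PYTHON_TO_FEAST.get(type(value))
    if feast_type is None:
        raise TypeError(f"Unsupported type: {type(value).__name__}")
    return feast_type


print(python_value_to_feast_type_name(42))    # INT64
print(python_value_to_feast_type_name(True))  # BOOL
```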

There are also several important submodules:
* `infra/` contains all the infrastructure components, such as the provider, offline store, online store, batch materialization engine, and registry.
* `dqm/` covers data quality monitoring, such as the dataset profiler.
* `diff/` covers the logic for determining how to apply infrastructure changes upon feature repo changes (e.g. the output of `feast plan` and `feast apply`).
* `embedded_go/` covers the Go feature server.
* `ui/` contains the embedded Web UI, which is launched by the `feast ui` command.
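
The core idea behind `diff/` can be sketched as comparing the objects currently stored in the registry with the objects defined in the feature repo, and classifying each one. This is a hypothetical, simplified model: objects are plain dicts keyed by name, and `RepoDiff` and `diff_objects` are illustrative names, whereas the real `diff/` module operates on Feast objects and also tracks infrastructure changes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RepoDiff:
    """Hypothetical summary of changes, similar in spirit to `feast plan` output."""
    to_add: List[str] = field(default_factory=list)
    to_update: List[str] = field(default_factory=list)
    to_delete: List[str] = field(default_factory=list)
    unchanged: List[str] = field(default_factory=list)


def diff_objects(current: Dict[str, dict], desired: Dict[str, dict]) -> RepoDiff:
    """Classify objects by name as new, changed, removed, or unchanged."""
    diff = RepoDiff()
    for name, spec in desired.items():
        if name not in current:
            diff.to_add.append(name)
        elif current[name] != spec:
            diff.to_update.append(name)
        else:
            diff.unchanged.append(name)
    # Anything in the registry but no longer in the repo would be deleted.
    for name in current:
        if name not in desired:
            diff.to_delete.append(name)
    return diff


registry = {"driver": {"join_key": "driver_id"}, "customer": {"join_key": "cust_id"}}
repo = {
    "driver": {"join_key": "driver_id"},
    "customer": {"join_key": "customer_id"},
    "order": {"join_key": "order_id"},
}
plan = diff_objects(registry, repo)
print(plan.to_add, plan.to_update, plan.to_delete)  # ['order'] ['customer'] []
```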

Of these submodules, `infra/` is the most important.
It contains the interfaces for the [provider](../getting-started/architecture-and-components/provider.md), [offline store](../getting-started/architecture-and-components/offline-store.md), [online store](../getting-started/architecture-and-components/online-store.md), [batch materialization engine](../getting-started/architecture-and-components/batch-materialization-engine.md), and [registry](../getting-started/architecture-and-components/registry.md), as well as all of their individual implementations.

```
$ tree --dirsfirst -L 1 infra
infra
├── contrib
├── feature_servers
├── materialization
├── offline_stores
├── online_stores
├── registry
├── transformation_servers
├── utils
├── __init__.py
├── aws.py
├── gcp.py
├── infra_object.py
├── key_encoding_utils.py
├── local.py
├── passthrough_provider.py
└── provider.py
```
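
The interface-plus-implementations layout of `infra/` can be illustrated with a small sketch of an online store. This is hypothetical and heavily simplified: the method names `update` and `online_write_batch` echo the ones mentioned later in this page, but the real interface takes config objects, feature views, and protobuf entity keys, and `InMemoryOnlineStore` is an illustrative class, not part of Feast.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class OnlineStore(ABC):
    """Simplified stand-in for an online store interface (signatures reduced for illustration)."""

    @abstractmethod
    def update(self, tables_to_keep: List[str], tables_to_delete: List[str]) -> None:
        """Create storage for kept tables and remove storage for deleted ones."""

    @abstractmethod
    def online_write_batch(self, table: str, rows: List[Tuple[str, dict]]) -> None:
        """Write (entity_key, feature_values) rows into a table."""


class InMemoryOnlineStore(OnlineStore):
    """Hypothetical implementation; real ones target backends such as Redis or SQLite."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[str, dict]] = {}

    def update(self, tables_to_keep, tables_to_delete):
        for name in tables_to_delete:
            self.tables.pop(name, None)
        for name in tables_to_keep:
            self.tables.setdefault(name, {})

    def online_write_batch(self, table, rows):
        for entity_key, values in rows:
            self.tables[table][entity_key] = values


store = InMemoryOnlineStore()
store.update(tables_to_keep=["driver_stats"], tables_to_delete=[])
store.online_write_batch("driver_stats", [("driver_1001", {"conv_rate": 0.5})])
print(store.tables)  # {'driver_stats': {'driver_1001': {'conv_rate': 0.5}}}
```

Keeping the abstract class separate from each backend is what lets the rest of the SDK call `update` or `online_write_batch` without knowing which store is configured.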

The tests for the Python SDK are contained in `sdk/python/tests`.
For more details, see this [overview](../how-to-guides/adding-or-reusing-tests.md#test-suite-overview) of the test suite.

### Example flow: `feast apply`

Let's walk through how `feast apply` works by tracking its execution across the codebase.

1. All CLI commands are in `cli.py`.
   Most of these commands are backed by methods in `repo_operations.py`.
   The `feast apply` command triggers `apply_total_command`, which then calls `apply_total` in `repo_operations.py`.
2. Using a `FeatureStore` object (from `feature_store.py`) initialized from the `feature_store.yaml` in the current working directory, `apply_total` first parses the feature repo with `parse_repo`.
   It then calls either `FeatureStore.apply` or `FeatureStore._apply_diffs` to apply those changes to the feature store.
3. Let's examine `FeatureStore.apply`.
   It splits the objects by class (e.g. `Entity`, `FeatureView`, etc.) and then calls the appropriate registry method to apply or delete each object.
   For example, it might call `self._registry.apply_entity` to apply an entity.
   If the default file-based registry is used, this logic can be found in `infra/registry/registry.py`.
4. Next, the feature store must update its infrastructure (e.g. online store tables) to match the new feature repo, so it calls `Provider.update_infra`, which can be found in `infra/provider.py`.
5. Assuming the provider is a built-in provider (e.g. one of the local, GCP, or AWS providers), it will call `PassthroughProvider.update_infra` in `infra/passthrough_provider.py`.
6. This delegates to the online store and batch materialization engine.
   For example, if the feature store is configured to use the Redis online store, then the `update` method from `infra/online_stores/redis.py` will be called.
   Similarly, if the local materialization engine is configured, then the `update` method from `infra/materialization/local_engine.py` will be called.

At this point, the `feast apply` command is complete.
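
The call chain in steps 3 through 6 can be modeled with a few stand-in classes that record the order of calls. This is a hypothetical, heavily simplified sketch: the class and method names echo the ones in the walkthrough, but the real methods take config objects, feature views, and many more arguments, and the real `FeatureStore.apply` also handles deletions and other object types.

```python
from typing import List


class Registry:
    """Stand-in for the file-based registry (hypothetical, simplified)."""

    def __init__(self, log: List[str]) -> None:
        self.log = log
        self.entities: List[str] = []

    def apply_entity(self, name: str) -> None:
        self.log.append(f"registry.apply_entity({name})")
        self.entities.append(name)


class Component:
    """Stand-in for an online store or a batch materialization engine."""

    def __init__(self, label: str, log: List[str]) -> None:
        self.label = label
        self.log = log

    def update(self) -> None:
        self.log.append(f"{self.label}.update")


class PassthroughProvider:
    """Delegates infrastructure updates to the configured components."""

    def __init__(self, online_store: Component, engine: Component) -> None:
        self.online_store = online_store
        self.engine = engine

    def update_infra(self) -> None:
        self.online_store.update()
        self.engine.update()


class FeatureStore:
    def __init__(self, registry: Registry, provider: PassthroughProvider) -> None:
        self._registry = registry
        self._provider = provider

    def apply(self, entity_names: List[str]) -> None:
        # Register each object first, then bring the infrastructure in line.
        for name in entity_names:
            self._registry.apply_entity(name)
        self._provider.update_infra()


log: List[str] = []
store = FeatureStore(
    Registry(log),
    PassthroughProvider(Component("redis_online_store", log), Component("local_engine", log)),
)
store.apply(["driver"])
print(log)
# ['registry.apply_entity(driver)', 'redis_online_store.update', 'local_engine.update']
```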

### Example flow: `feast materialize`

Let's walk through how `feast materialize` works by tracking its execution across the codebase.

1. The `feast materialize` command triggers `materialize_command` in `cli.py`, which then calls `FeatureStore.materialize` from `feature_store.py`.
2. This then calls `Provider.materialize_single_feature_view`, which can be found in `infra/provider.py`.
3. As with `feast apply`, the provider is most likely backed by the passthrough provider, in which case `PassthroughProvider.materialize_single_feature_view` will be called.
4. This delegates to the underlying batch materialization engine.
   Assuming that the local engine has been configured, `LocalMaterializationEngine.materialize` from `infra/materialization/local_engine.py` will be called.
5. Since materialization involves reading features from the offline store and writing them to the online store, the local engine delegates to both the offline store and the online store.
   Specifically, it calls `OfflineStore.pull_latest_from_table_or_query` and `OnlineStore.online_write_batch`.
   These two calls are routed to whichever offline store and online store have been configured.
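
Conceptually, step 5 reduces to "read the latest value per entity from the offline store, then write it to the online store". The sketch below models that with plain Python data structures. It is hypothetical: the function names only echo `pull_latest_from_table_or_query` and `online_write_batch`, the half-open time window is an illustrative choice, and the real methods operate on Feast's own data-source, retrieval-job, and protobuf types.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical offline rows: (entity_key, event_timestamp, feature_values).
OFFLINE_ROWS: List[Tuple[str, datetime, dict]] = [
    ("driver_1001", datetime(2022, 8, 1, 10), {"conv_rate": 0.4}),
    ("driver_1001", datetime(2022, 8, 1, 12), {"conv_rate": 0.6}),
    ("driver_1002", datetime(2022, 8, 1, 11), {"conv_rate": 0.9}),
    ("driver_1002", datetime(2022, 8, 2, 9), {"conv_rate": 0.1}),  # outside the window below
]


def pull_latest(rows, start: datetime, end: datetime) -> Dict[str, dict]:
    """Keep only the newest row per entity key with start <= timestamp < end."""
    latest: Dict[str, Tuple[datetime, dict]] = {}
    for key, ts, values in rows:
        if start <= ts < end and (key not in latest or ts > latest[key][0]):
            latest[key] = (ts, values)
    return {key: values for key, (_ts, values) in latest.items()}


def online_write_batch(online: Dict[str, dict], rows: Dict[str, dict]) -> None:
    """Upsert the latest feature values into a dict acting as the online store."""
    online.update(rows)


def materialize(start: datetime, end: datetime, online: Dict[str, dict]) -> None:
    online_write_batch(online, pull_latest(OFFLINE_ROWS, start, end))


online_store: Dict[str, dict] = {}
materialize(datetime(2022, 8, 1), datetime(2022, 8, 2), online_store)
print(online_store)  # {'driver_1001': {'conv_rate': 0.6}, 'driver_1002': {'conv_rate': 0.9}}
```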

### Example flow: `get_historical_features`

Let's walk through how `get_historical_features` works by tracking its execution across the codebase.

1. We start with `FeatureStore.get_historical_features` in `feature_store.py`.
   This method does some internal preparation, and then delegates the actual execution to the underlying provider by calling `Provider.get_historical_features`, which can be found in `infra/provider.py`.
2. As with `feast apply`, the provider is most likely backed by the passthrough provider, in which case `PassthroughProvider.get_historical_features` will be called.
3. That call simply delegates to `OfflineStore.get_historical_features`.
   So if the feature store is configured to use Snowflake as the offline store, `SnowflakeOfflineStore.get_historical_features` will be executed.
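
The delegation in steps 2 and 3 can be modeled as a lookup from the configured offline store type to an implementation class. This is a hypothetical sketch: the classes return strings instead of retrieval jobs, and the `OFFLINE_STORES` lookup table is illustrative rather than Feast's actual resolution logic.

```python
from typing import Dict, List, Type


class OfflineStore:
    """Simplified stand-in for the offline store interface."""

    @staticmethod
    def get_historical_features(entity_rows: List[dict], features: List[str]) -> str:
        raise NotImplementedError


class SnowflakeOfflineStore(OfflineStore):
    @staticmethod
    def get_historical_features(entity_rows: List[dict], features: List[str]) -> str:
        return f"snowflake job: {len(entity_rows)} rows, {len(features)} features"


class FileOfflineStore(OfflineStore):
    @staticmethod
    def get_historical_features(entity_rows: List[dict], features: List[str]) -> str:
        return f"file job: {len(entity_rows)} rows, {len(features)} features"


# Hypothetical lookup keyed by the offline store type set in feature_store.yaml.
OFFLINE_STORES: Dict[str, Type[OfflineStore]] = {
    "snowflake.offline": SnowflakeOfflineStore,
    "file": FileOfflineStore,
}


class PassthroughProvider:
    def __init__(self, offline_store_type: str) -> None:
        self.offline_store = OFFLINE_STORES[offline_store_type]

    def get_historical_features(self, entity_rows: List[dict], features: List[str]) -> str:
        # The provider adds no logic of its own here; it simply delegates.
        return self.offline_store.get_historical_features(entity_rows, features)


provider = PassthroughProvider("snowflake.offline")
print(provider.get_historical_features([{"driver_id": 1001}], ["driver_stats:conv_rate"]))
# snowflake job: 1 rows, 1 features
```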

## Java SDK

The `java/` directory contains the Java serving component.
See [here](https://github.com/feast-dev/feast/blob/master/java/CONTRIBUTING.md) for more details on how the repo is structured.

## Go feature server

The `go/` directory contains the Go feature server.
Most of the files here contain logic for reading features from the online store.
Within `go/`, the `internal/feast/` directory contains most of the core logic:
* `onlineserving/` covers the core serving logic.
* `model/` contains the implementations of the Feast objects (entity, feature view, etc.).
  * For example, `entity.go` is the Go equivalent of `entity.py`. It contains a very simple Go implementation of the entity object.
* `registry/` covers the registry.
  * Currently only the file-based registry is supported (the SQL-based registry is not). Additionally, the file-based registry only supports a file-based registry store, not the GCS or S3 registry stores.
* `onlinestore/` covers the online stores (currently only Redis and SQLite are supported).

## Protobufs

Feast uses [protobuf](https://github.com/protocolbuffers/protobuf) to store serialized versions of the core Feast objects.
The protobuf definitions are stored in `protos/feast`.

## Web UI

The `ui/` directory contains the Web UI.
See [here](https://github.com/feast-dev/feast/blob/master/ui/CONTRIBUTING.md) for more details on the structure of the Web UI.