Commit 0ff79e5

kevjumba authored and adchia committed
chore: Update integration testing documentation (#2983)
* Refactor go feature server Signed-off-by: Kevin Zhang <[email protected]>
* Fix lint Signed-off-by: Kevin Zhang <[email protected]>
* Fix e2e tests Signed-off-by: Kevin Zhang <[email protected]>
* Fix Signed-off-by: Kevin Zhang <[email protected]>
* Verify tests Signed-off-by: Kevin Zhang <[email protected]>
* Fix lint Signed-off-by: Kevin Zhang <[email protected]>
* Address review Signed-off-by: Kevin Zhang <[email protected]>
* Fix Signed-off-by: Kevin Zhang <[email protected]>
* Address review Signed-off-by: Kevin Zhang <[email protected]>
* Fix Signed-off-by: Kevin Zhang <[email protected]>
* Fix lint Signed-off-by: Kevin Zhang <[email protected]>
* address review Signed-off-by: Kevin Zhang <[email protected]>
* Fix Signed-off-by: Kevin Zhang <[email protected]>
* Fix Signed-off-by: Kevin Zhang <[email protected]>
* Fix lint Signed-off-by: Kevin Zhang <[email protected]>
* Fix Signed-off-by: Kevin Zhang <[email protected]>
* Refactor Signed-off-by: Kevin Zhang <[email protected]>
* Fix Signed-off-by: Kevin Zhang <[email protected]>
* Fx lit Signed-off-by: Kevin Zhang <[email protected]>
* Fix lint Signed-off-by: Kevin Zhang <[email protected]>
* Fix Signed-off-by: Kevin Zhang <[email protected]>
* Revert Signed-off-by: Kevin Zhang <[email protected]>
* fix Signed-off-by: Kevin Zhang <[email protected]>
* Fix Signed-off-by: Kevin Zhang <[email protected]>
* Add more docs Signed-off-by: Kevin Zhang <[email protected]>
* Swap Signed-off-by: Kevin Zhang <[email protected]>
* Fix Signed-off-by: Kevin Zhang <[email protected]>
* Fix Signed-off-by: Kevin Zhang <[email protected]>
* More thingies Signed-off-by: Kevin Zhang <[email protected]>
* Fix Signed-off-by: Kevin Zhang <[email protected]>
* Fix Signed-off-by: Kevin Zhang <[email protected]>
* Fix Signed-off-by: Kevin Zhang <[email protected]>
* Fix rebase Signed-off-by: Kevin Zhang <[email protected]>
* Fix rebase Signed-off-by: Kevin Zhang <[email protected]>
* fix lint Signed-off-by: Kevin Zhang <[email protected]>
* Fix Signed-off-by: Kevin Zhang <[email protected]>
* address review Signed-off-by: Kevin Zhang <[email protected]>
* Fix Signed-off-by: Kevin Zhang <[email protected]>
* Fix release Signed-off-by: Kevin Zhang <[email protected]>
1 parent 48a8275 commit 0ff79e5

4 files changed: +552 -75 lines changed

docs/how-to-guides/adding-or-reusing-tests.md

Lines changed: 201 additions & 69 deletions
Original file line number | Diff line number | Diff line change
@@ -14,103 +14,198 @@ Let's inspect the test setup in `sdk/python/tests/integration`:
1414

1515
```bash
1616
$ tree
17-
1817
.
1918
├── e2e
20-
│ └── test_universal_e2e.py
19+
│ ├── test_go_feature_server.py
20+
│ ├── test_python_feature_server.py
21+
│ ├── test_universal_e2e.py
22+
│ ├── test_usage_e2e.py
23+
│ └── test_validation.py
2124
├── feature_repos
25+
│ ├── integration_test_repo_config.py
2226
│ ├── repo_configuration.py
2327
│ └── universal
28+
│ ├── catalog
2429
│ ├── data_source_creator.py
2530
│ ├── data_sources
31+
│ │ ├── __init__.py
2632
│ │ ├── bigquery.py
2733
│ │ ├── file.py
28-
│ │ └── redshift.py
34+
│ │ ├── redshift.py
35+
│ │ └── snowflake.py
2936
│ ├── entities.py
30-
│ └── feature_views.py
37+
│ ├── feature_views.py
38+
│ ├── online_store
39+
│ │ ├── __init__.py
40+
│ │ ├── datastore.py
41+
│ │ ├── dynamodb.py
42+
│ │ ├── hbase.py
43+
│ │ └── redis.py
44+
│ └── online_store_creator.py
45+
├── materialization
46+
│ └── test_lambda.py
3147
├── offline_store
48+
│ ├── test_feature_logging.py
49+
│ ├── test_offline_write.py
50+
│ ├── test_push_features_to_offline_store.py
3251
│ ├── test_s3_custom_endpoint.py
3352
│ └── test_universal_historical_retrieval.py
3453
├── online_store
35-
│ ├── test_e2e_local.py
36-
│ ├── test_feature_service_read.py
3754
│ ├── test_online_retrieval.py
55+
│ ├── test_push_features_to_online_store.py
3856
│ └── test_universal_online.py
39-
├── registration
40-
│ ├── test_cli.py
41-
│ ├── test_cli_apply_duplicated_featureview_names.py
42-
│ ├── test_cli_chdir.py
43-
│ ├── test_feature_service_apply.py
44-
│ ├── test_feature_store.py
45-
│ ├── test_inference.py
46-
│ ├── test_registry.py
47-
│ ├── test_universal_odfv_feature_inference.py
48-
│ └── test_universal_types.py
49-
└── scaffolding
50-
├── test_init.py
51-
├── test_partial_apply.py
52-
├── test_repo_config.py
53-
└── test_repo_operations.py
54-
55-
8 directories, 27 files
56-
```
57+
└── registration
58+
├── test_feature_store.py
59+
├── test_inference.py
60+
├── test_registry.py
61+
├── test_sql_registry.py
62+
├── test_universal_cli.py
63+
├── test_universal_odfv_feature_inference.py
64+
└── test_universal_types.py
5765

58-
`feature_repos` has setup files for most tests in the test suite and pytest fixtures for other tests. These fixtures parametrize on different offline stores, online stores, etc. and thus abstract away store specific implementations so tests don't need to rewrite e.g. uploading dataframes to a specific store for setup.
66+
```
5967

60-
## Understanding an example test
68+
* `feature_repos` has setup files for most tests in the test suite.
69+
* `conftest.py` and some of the individual test files contain fixtures that parametrize tests across different offline stores, online stores, etc., abstracting away store-specific implementations so we don't need to rewrite the same test for each store.
70+
71+
## Structure of the test suite
72+
73+
### What is the universal test suite?
74+
75+
The universal test suite verifies that crucial Feast functions (e.g. `get_historical_features`, `get_online_features`) behave correctly in each of the environments Feast can be used in. Each environment is a combination of an offline store, an online store, and a provider, and the universal test suite runs basic functional verification against all of these combinations.
76+
77+
We use pytest [fixtures](https://docs.pytest.org/en/6.2.x/fixture.html) to accomplish this without writing excess code.
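To make the idea of "environments as combinations" concrete, here is a minimal sketch that enumerates provider, offline store, and online store combinations in plain Python. The store and provider lists and the `build_environments` helper are hypothetical and chosen only for illustration; the real suite builds its configurations in `feature_repos/repo_configuration.py` and does not necessarily test every possible combination.

```python
from itertools import product

# Hypothetical store and provider names, used only for illustration.
OFFLINE_STORES = ["file", "bigquery", "redshift", "snowflake"]
ONLINE_STORES = ["sqlite", "redis"]
PROVIDERS = ["local"]


def build_environments(providers, offline_stores, online_stores):
    """Return one dict per (provider, offline store, online store) combination."""
    return [
        {"provider": p, "offline_store": off, "online_store": on}
        for p, off, on in product(providers, offline_stores, online_stores)
    ]


environments = build_environments(PROVIDERS, OFFLINE_STORES, ONLINE_STORES)
for env in environments:
    print(env)
```

A parametrized fixture plays the same role in the test suite: each combination becomes one generated test case, so the test body is written only once.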
78+
79+
Tests in Feast are split into integration and unit tests.
80+
81+
### Is it an integration or unit test?
82+
83+
* Integration tests test non-local Feast behavior. They mainly involve testing Feast components that connect to services outside of Feast (e.g. connecting to GCP or AWS clients).
84+
* Generally, if the test requires initializing a feature store in an external environment (i.e. it uses our universal test fixtures), it is probably an integration test.
85+
* Unit tests, on the other hand, primarily test local and class-level behavior that does not require spinning up an external service. If your test can be run locally without using any services besides pytest, it is a unit test.
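As a rough illustration of the unit-test side of this distinction, the sketch below replaces an external client with a mock so the test needs no running service. The `count_rows` helper and the shape of the client's `scan` response are hypothetical and are not taken from the Feast codebase.

```python
from unittest.mock import MagicMock


def count_rows(client, table_name):
    """Hypothetical helper: count the items returned by a store client's scan."""
    response = client.scan(TableName=table_name)
    return len(response["Items"])


def test_count_rows_with_mocked_client():
    # The mock stands in for a real client, so no external service is started.
    client = MagicMock()
    client.scan.return_value = {"Items": [{"id": 1}, {"id": 2}]}

    assert count_rows(client, "drivers") == 2
    client.scan.assert_called_once_with(TableName="drivers")


test_count_rows_with_mocked_client()
```

If the same check had to run against a real table in a cloud account, it would fall on the integration side instead.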
86+
87+
### Main types of tests
88+
89+
#### Integration tests
90+
91+
1. E2E tests
92+
* E2E tests test end-to-end functionality of Feast over the various codepaths (initialize a feature store, apply, and materialize).
93+
* The main codepaths include:
94+
* basic e2e tests for offline stores
95+
* `test_universal_e2e.py`
96+
* go feature server
97+
* `test_go_feature_server.py`
98+
* python http server
99+
* `test_python_feature_server.py`
100+
* usage tracking
101+
* `test_usage_e2e.py`
102+
* data quality monitoring feature validation
103+
* `test_validation.py`
104+
2. Offline and Online Store Tests
105+
* Offline and online store tests mainly test for the offline and online retrieval functionality.
106+
* The various specific functionalities that are tested include:
107+
* push API tests
108+
* `test_push_features_to_offline_store.py`
109+
* `test_push_features_to_online_store.py`
110+
* `test_offline_write.py`
111+
* historical retrieval tests
112+
* `test_universal_historical_retrieval.py`
113+
* online retrieval tests
114+
* `test_universal_online.py`
115+
* data quality monitoring feature logging tests
116+
* `test_feature_logging.py`
117+
* online store tests
118+
* `test_universal_online.py`
119+
3. Registration Tests
120+
* The registration folder contains all of the registry tests and some universal cli tests. This includes:
121+
* CLI apply and materialize tests run against the universal test suite
122+
* Data type inference tests
123+
* Registry tests
124+
4. Miscellaneous Tests
125+
* AWS Lambda Materialization Tests (Currently do not work)
126+
* `test_lambda.py`
127+
128+
#### Unit tests
129+
130+
1. Registry Diff Tests
131+
* These are tests for the infrastructure and registry diff functionality that Feast uses to determine if changes to the registry or infrastructure are needed.
132+
2. Local CLI Tests and Local Feast Tests
133+
* These tests run all of the CLI commands against the local file offline store.
134+
3. Infrastructure Unit Tests
135+
* DynamoDB tests with dynamo mocked out
136+
* Repository configuration tests
137+
* Schema inference unit tests
138+
* Key serialization tests
139+
* Basic provider unit tests
140+
4. Feature Store Validation Tests
141+
* These tests mainly cover class-level validation such as hashing tests, protobuf and class serialization, and error and warning handling.
142+
* Data source unit tests
143+
* Feature service unit tests
144+
* Feature service, feature view, and feature validation tests
145+
* Protobuf/json tests for Feast ValueTypes
146+
* Serialization tests
147+
* Type mapping
148+
* Feast types
149+
* Serialization tests due to this [issue](https://github.com/feast-dev/feast/issues/2345)
150+
* Feast usage tracking unit tests
151+
152+
#### Docstring tests
153+
154+
Docstring tests are primarily smoke tests to make sure imports and setup functions can be executed without errors.
155+
156+
## Understanding the test suite with an example test
157+
158+
### Example test
61159

62160
Let's look at a sample test using the universal repo:
63161

64162
{% tabs %}
65-
{% tab title="Python" %}
163+
{% tab code="sdk/python/tests/integration/offline_store/test_universal_historical_retrieval.py" %}
66164
```python
67165
@pytest.mark.integration
68-
@pytest.mark.parametrize("full_feature_names", [True, False], ids=lambda v: str(v))
166+
@pytest.mark.universal_offline_stores
167+
@pytest.mark.parametrize("full_feature_names", [True, False], ids=lambda v: f"full:{v}")
69168
def test_historical_features(environment, universal_data_sources, full_feature_names):
70169
store = environment.feature_store
71170

72171
(entities, datasets, data_sources) = universal_data_sources
73-
feature_views = construct_universal_feature_views(data_sources)
74172

75-
customer_df, driver_df, orders_df, global_df, entity_df = (
76-
datasets["customer"],
77-
datasets["driver"],
78-
datasets["orders"],
79-
datasets["global"],
80-
datasets["entity"],
81-
)
82-
# ... more test code
173+
feature_views = construct_universal_feature_views(data_sources)
83174

84-
customer_fv, driver_fv, driver_odfv, order_fv, global_fv = (
85-
feature_views["customer"],
86-
feature_views["driver"],
87-
feature_views["driver_odfv"],
88-
feature_views["order"],
89-
feature_views["global"],
90-
)
175+
entity_df_with_request_data = datasets.entity_df.copy(deep=True)
176+
entity_df_with_request_data["val_to_add"] = [
177+
i for i in range(len(entity_df_with_request_data))
178+
]
179+
entity_df_with_request_data["driver_age"] = [
180+
i + 100 for i in range(len(entity_df_with_request_data))
181+
]
91182

92183
feature_service = FeatureService(
93-
"convrate_plus100",
184+
name="convrate_plus100",
185+
features=[feature_views.driver[["conv_rate"]], feature_views.driver_odfv],
186+
)
187+
feature_service_entity_mapping = FeatureService(
188+
name="entity_mapping",
94189
features=[
95-
feature_views["driver"][["conv_rate"]],
96-
feature_views["driver_odfv"]
190+
feature_views.location.with_name("origin").with_join_key_map(
191+
{"location_id": "origin_id"}
192+
),
193+
feature_views.location.with_name("destination").with_join_key_map(
194+
{"location_id": "destination_id"}
195+
),
97196
],
98197
)
99198

100-
feast_objects = []
101-
feast_objects.extend(
199+
store.apply(
102200
[
103-
customer_fv,
104-
driver_fv,
105-
driver_odfv,
106-
order_fv,
107-
global_fv,
108201
driver(),
109202
customer(),
203+
location(),
110204
feature_service,
205+
feature_service_entity_mapping,
206+
*feature_views.values(),
111207
]
112208
)
113-
store.apply(feast_objects)
114209
# ... more test code
115210

116211
job_from_df = store.get_historical_features(
@@ -122,56 +217,93 @@ def test_historical_features(environment, universal_data_sources, full_feature_n
122217
"customer_profile:avg_passenger_count",
123218
"customer_profile:lifetime_trip_count",
124219
"conv_rate_plus_100:conv_rate_plus_100",
220+
"conv_rate_plus_100:conv_rate_plus_100_rounded",
125221
"conv_rate_plus_100:conv_rate_plus_val_to_add",
126222
"order:order_is_success",
127223
"global_stats:num_rides",
128224
"global_stats:avg_ride_length",
225+
"field_mapping:feature_name",
129226
],
130227
full_feature_names=full_feature_names,
131228
)
229+
230+
if job_from_df.supports_remote_storage_export():
231+
files = job_from_df.to_remote_storage()
232+
print(files)
233+
assert len(files) > 0 # This test should be way more detailed
234+
235+
start_time = datetime.utcnow()
132236
actual_df_from_df_entities = job_from_df.to_df()
133237
# ... more test code
134238

135-
assert_frame_equal(
136-
expected_df, actual_df_from_df_entities, check_dtype=False,
239+
validate_dataframes(
240+
expected_df,
241+
table_from_df_entities,
242+
keys=[event_timestamp, "order_id", "driver_id", "customer_id"],
137243
)
138244
# ... more test code
139245
```
140246
{% endtab %}
141247
{% endtabs %}
142248

143-
The key fixtures are the `environment` and `universal_data_sources` fixtures, which are defined in the `feature_repos` directories. This by default pulls in a standard dataset with driver and customer entities, certain feature views, and feature values. By including the environment as a parameter, the test automatically parametrizes across other offline / online store combinations.
249+
* The key fixtures are the `environment` and `universal_data_sources` fixtures, which are defined in the `feature_repos` directories and the `conftest.py` file. By default, they pull in a standard, pre-defined dataset with driver and customer entities, certain feature views, and feature values.
250+
* The `environment` fixture sets up a feature store, parametrized by the provider and the online/offline store. It allows the test to query against that feature store without needing to worry about the underlying implementation or any setup that may be involved in creating instances of these datastores.
251+
* Each fixture parametrization produces a separate integration test with its own `IntegrationTestRepoConfig`, which pytest uses to generate a unique test for one of the environments that need testing.
252+
253+
* Feast tests also use a variety of markers:
254+
* The `@pytest.mark.integration` marker designates integration tests, which are run when you call `make test-python-integration`.
255+
* The `@pytest.mark.universal_offline_stores` marker will parametrize the test on all of the universal offline stores including file, redshift, bigquery and snowflake.
256+
* The `full_feature_names` parametrization defines whether the test should reference features by their full feature name (fully qualified path) or just by the feature name itself.
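The effect of `full_feature_names` on the names a test should expect can be sketched as follows. This is only an illustration under an assumption: it assumes the fully qualified name joins the feature view name and the feature name with a double underscore, and the `expected_column` helper is hypothetical rather than part of Feast.

```python
def expected_column(feature_ref: str, full_feature_names: bool) -> str:
    """Map a "feature_view:feature" reference to the column name a test expects.

    Assumption for illustration: the full name is "<feature_view>__<feature>".
    """
    view_name, feature_name = feature_ref.split(":", 1)
    if full_feature_names:
        return f"{view_name}__{feature_name}"
    return feature_name


refs = ["customer_profile:avg_passenger_count", "global_stats:num_rides"]
for full in (True, False):
    print(full, [expected_column(ref, full) for ref in refs])
```

Running a test under both settings helps catch code paths that only handle one naming scheme.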
257+
144258

145259
## Writing a new test or reusing existing tests
146260

147261
### To add a new test to an existing test file
148262

149-
* Use the same function signatures as an existing test (e.g. use `environment` as an argument) to include the relevant test fixtures.
150-
* If possible, expand an individual test instead of writing a new test, due to the cost of standing up offline / online stores.
263+
* Use the same function signatures as an existing test (e.g. use `environment` and `universal_data_sources` as arguments) to include the relevant test fixtures.
264+
* If possible, expand an individual test instead of writing a new test, due to the cost of starting up offline / online stores.
265+
* Use the `universal_offline_stores` and `universal_online_stores` markers to parametrize the test against different offline store and online store combinations. You can also designate specific online and offline stores to test by using the `only` parameter on the marker.
151266

267+
```python
268+
@pytest.mark.universal_online_stores(only=["redis"])
269+
```
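Conceptually, the `only` parameter narrows the set of stores a test is generated for. The sketch below shows that filtering idea in plain Python; the `select_stores` helper and the store names are hypothetical and do not reproduce how the Feast `conftest.py` implements the marker.

```python
def select_stores(available, only=None):
    """Return the stores to parametrize over, optionally restricted by `only`."""
    if only is None:
        return list(available)
    unknown = sorted(set(only) - set(available))
    if unknown:
        raise ValueError(f"Unknown stores requested: {unknown}")
    # Preserve the order of `available` so test ids stay stable.
    return [store for store in available if store in only]


available_online_stores = ["sqlite", "redis", "dynamodb"]
print(select_stores(available_online_stores))
print(select_stores(available_online_stores, only=["redis"]))
```

Rejecting unknown names early keeps a typo in `only` from silently producing zero test cases.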
152270
### To test a new offline / online store from a plugin repo
153271

154272
* Install Feast in editable mode with `pip install -e`.
155273
* The core tests for offline / online store behavior are parametrized by the `FULL_REPO_CONFIGS` variable defined in `feature_repos/repo_configuration.py`. To overwrite this variable without modifying the Feast repo, create your own file that contains a `FULL_REPO_CONFIGS` (which will require adding a new `IntegrationTestRepoConfig` or two) and set the environment variable `FULL_REPO_CONFIGS_MODULE` to point to that file. Then the core offline / online store tests can be run with `make test-python-universal`.
156274
* See the [custom offline store demo](https://github.com/feast-dev/feast-custom-offline-store-demo) and the [custom online store demo](https://github.com/feast-dev/feast-custom-online-store-demo) for examples.
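The override mechanism described above, where an environment variable names a module that supplies `FULL_REPO_CONFIGS`, can be sketched like this. The `load_full_repo_configs` helper, the module name `my_plugin_repo_configs`, and the placeholder config value are all hypothetical; this is one plausible way to implement such a lookup, not the code in `repo_configuration.py`.

```python
import importlib
import os
import sys
import types


def load_full_repo_configs(default_configs):
    """Return FULL_REPO_CONFIGS from the module named in the environment, if any."""
    module_name = os.environ.get("FULL_REPO_CONFIGS_MODULE")
    if not module_name:
        return default_configs
    module = importlib.import_module(module_name)
    return getattr(module, "FULL_REPO_CONFIGS")


# Demo: register a stand-in module so the example runs without extra files.
fake_module = types.ModuleType("my_plugin_repo_configs")
fake_module.FULL_REPO_CONFIGS = ["my_offline_store + sqlite"]  # placeholder value
sys.modules[fake_module.__name__] = fake_module

os.environ["FULL_REPO_CONFIGS_MODULE"] = fake_module.__name__
configs = load_full_repo_configs(default_configs=[])
print(configs)
```

In a real plugin repo the module would be an importable file on `PYTHONPATH`, and its list would hold `IntegrationTestRepoConfig` objects rather than strings.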
157275

276+
### What are some important things to keep in mind when adding a new offline / online store?
277+
278+
#### Type mapping/Inference
279+
280+
Many problems arise when implementing your data store's type conversion to interface with Feast datatypes.
281+
1. You will need to correctly update `inference.py` so that Feast can infer your data source schemas.
282+
2. You also need to update `type_map.py` so that Feast knows how to convert your datastore's types to Feast-recognized types in `feast/types.py`.
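A type map is essentially a lookup from your datastore's type names to Feast type names, plus a clear failure for anything unmapped. The sketch below uses plain strings for both sides; the source type names, the `TYPE_MAP` table, and the `to_feast_type_name` helper are hypothetical, and a real implementation would return the type objects Feast defines rather than strings.

```python
# Hypothetical datastore type names mapped to Feast-style type names (as strings).
TYPE_MAP = {
    "bigint": "INT64",
    "double": "DOUBLE",
    "varchar": "STRING",
    "boolean": "BOOL",
    "timestamp": "UNIX_TIMESTAMP",
}


def to_feast_type_name(source_type: str) -> str:
    """Translate a datastore type name, failing loudly on unmapped types."""
    key = source_type.strip().lower()
    try:
        return TYPE_MAP[key]
    except KeyError:
        raise ValueError(
            f"No Feast type mapping for source type {source_type!r}"
        ) from None


print(to_feast_type_name("BIGINT"))
print(to_feast_type_name(" varchar "))
```

Failing on unknown types, instead of falling back to a default, tends to surface mapping gaps during testing rather than as wrong values at retrieval time.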
283+
284+
#### Historical and online retrieval
285+
286+
The most important functionality in Feast is historical and online retrieval. Most of the e2e and universal integration tests exercise this functionality in some way. Making sure this functionality works also indirectly asserts that reading from and writing to your datastore works as intended.
287+
288+
158289
### To include a new offline / online store in the main Feast repo
159290

160291
* Extend `data_source_creator.py` for your offline store.
161-
* In `repo_configuration.py` add a new`IntegrationTestRepoConfig` or two (depending on how many online stores you want to test).
292+
* In `repo_configuration.py` add a new `IntegrationTestRepoConfig` or two (depending on how many online stores you want to test).
293+
* Generally, you should only need to test against SQLite. However, if you need to test against a production online store, then you can also test against Redis or DynamoDB.
162294
* Run the full test suite with `make test-python-integration`.
163295

164296
### Including a new offline / online store in the main Feast repo from external plugins with community maintainers.
165297

166-
* This folder is for plugins that are officially maintained with community owners. Place the APIs in feast/infra/offline_stores/contrib/.
298+
* This folder is for plugins that are officially maintained with community owners. Place the APIs in `feast/infra/offline_stores/contrib/`.
167299
* Extend `data_source_creator.py` for your offline store and implement the required APIs.
168300
* In `contrib_repo_configuration.py` add a new `IntegrationTestRepoConfig` (depending on how many online stores you want to test).
169301
* Run the contrib test suite with `make test-python-contrib-universal`.
170302

171303
### To include a new online store
172304

173305
* In `repo_configuration.py` add a new config that maps to a serialized version of the configuration you need in `feature_store.yaml` to set up the online store.
174-
* In `repo_configuration.py`, add new`IntegrationTestRepoConfig` for offline stores you want to test.
306+
* In `repo_configuration.py`, add new `IntegrationTestRepoConfig` for online stores you want to test.
175307
* Run the full test suite with `make test-python-integration`.
176308

177309
### To use custom data in a new test
@@ -193,11 +325,11 @@ def your_test(environment: Environment):
193325
# ... run test
194326
```
195327

196-
### Running your own redis cluster for testing
328+
### Running your own Redis cluster for testing
197329

198-
* Install redis on your computer. If you are a mac user, you should be able to `brew install redis`.
330+
* Install Redis on your computer. If you are a mac user, you should be able to `brew install redis`.
199331
* Running `redis-server --help` and `redis-cli --help` should show corresponding help menus.
200-
* Run `cd scripts/create-cluster` and run `./create-cluster start` then `./create-cluster create` to start the server. You should see output that looks like this:
332+
* Run `./infra/scripts/redis-cluster.sh start` then `./infra/scripts/redis-cluster.sh create` to start the Redis cluster locally. You should see output that looks like this:
201333
~~~~
202334
Starting 6001
203335
Starting 6002
@@ -206,6 +338,6 @@ Starting 6004
206338
Starting 6005
207339
Starting 6006
208340
~~~~
209-
* You should be able to run the integration tests and have the redis cluster tests pass.
210-
* If you would like to run your own redis cluster, you can run the above commands with your own specified ports and connect to the newly configured cluster.
211-
* To stop the cluster, run `./create-cluster stop` and then `./create-cluster clean`.
341+
* You should be able to run the integration tests and have the Redis cluster tests pass.
342+
* If you would like to run your own Redis cluster, you can run the above commands with your own specified ports and connect to the newly configured cluster.
343+
* To stop the cluster, run `./infra/scripts/redis-cluster.sh stop` and then `./infra/scripts/redis-cluster.sh clean`.
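Before running the integration tests, it can help to confirm that the cluster nodes are actually accepting connections. The sketch below is a generic TCP check using only the standard library; the `open_ports` helper is hypothetical, and the port range 6001 to 6006 is taken from the sample output above, so adjust it if you start the cluster on other ports.

```python
import socket


def open_ports(host, ports, timeout=0.5):
    """Return the subset of `ports` on `host` that accept a TCP connection."""
    reachable = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                reachable.append(port)
        except OSError:
            # Connection refused or timed out: treat the port as not reachable.
            pass
    return reachable


if __name__ == "__main__":
    # Ports assumed from the "Starting 6001" ... "Starting 6006" output.
    print(open_ports("127.0.0.1", range(6001, 6007)))
```

This only shows that something is listening on each port; it does not verify that the nodes have formed a healthy Redis cluster.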
