Merge pull request #136 from dbt-labs/feature/allow-exceptions-to-rules

b-per · web-flow · commit cbc788aff526 · 2022-08-13T03:20:55.000+10:00
✨ Allow users to exclude given models/rows
diff --git a/README.md b/README.md
@@ -45,46 +45,44 @@ Once you've installed the package, all you have to do is run a `dbt build --sele
 ----
 ## Package Documentation
 
-__[DAG Issues](#dag-issues)__
-- [Direct Join to Source](#direct-join-to-source)
-- [Downstream Models Dependent on Source](#downstream-models-dependent-on-source)
-- [Model Fanout](#model-fanout)
-- [Multiple Sources Joined](#multiple-sources-joined)
-- [Rejoining of Upstream Concepts](#rejoining-of-upstream-concepts)
-- [Root Models](#root-models)
-- [Source Fanout](#source-fanout)
-- [Staging Models Dependent on Downstream Models](#staging-models-dependent-on-downstream-models)
-- [Staging Models Dependent on Other Staging Models](#staging-models-dependent-on-other-staging-models)
-- [Unused Sources](#unused-sources)
-
-__[Testing](#testing)__
-- [Models without Primary Key Tests](#models-without-primary-key-tests)
-- [Test Coverage](#test-coverage)
-
-__[Documentation](#documentation)__
-- [Documentation Coverage](#documentation-coverage)
-- [Undocumented Models](#undocumented-models)
-
-__[Structure](#structure)__
-- [Model Naming Conventions](#model-naming-conventions)
-- [Model Directories](#model-directories)
-- [Source Directories](#model-directories)
-- [Test Directories](#test-directories)
-
-__[Performance](#performance)__
-- [Chained View Dependencies](#chained-view-dependencies)
-- [Exposure Parents Materializations](#exposure-parents-materializations)
-
-__[Customization](#customization)__
+### Rules
+- __[DAG Issues](#dag-issues)__
+  - [Direct Join to Source](#direct-join-to-source)
+  - [Downstream Models Dependent on Source](#downstream-models-dependent-on-source)
+  - [Model Fanout](#model-fanout)
+  - [Multiple Sources Joined](#multiple-sources-joined)
+  - [Rejoining of Upstream Concepts](#rejoining-of-upstream-concepts)
+  - [Root Models](#root-models)
+  - [Source Fanout](#source-fanout)
+  - [Staging Models Dependent on Downstream Models](#staging-models-dependent-on-downstream-models)
+  - [Staging Models Dependent on Other Staging Models](#staging-models-dependent-on-other-staging-models)
+  - [Unused Sources](#unused-sources)
+- __[Testing](#testing)__
+  - [Models without Primary Key Tests](#models-without-primary-key-tests)
+  - [Test Coverage](#test-coverage)
+- __[Documentation](#documentation)__
+  - [Documentation Coverage](#documentation-coverage)
+  - [Undocumented Models](#undocumented-models)
+- __[Structure](#structure)__
+  - [Model Naming Conventions](#model-naming-conventions)
+  - [Model Directories](#model-directories)
+  - [Source Directories](#source-directories)
+  - [Test Directories](#test-directories)
+- __[Performance](#performance)__
+  - [Chained View Dependencies](#chained-view-dependencies)
+  - [Exposure Parents Materializations](#exposure-parents-materializations)
+
+### [Customization](#customization)
 - [Disabling Models](#disabling-models)
 - [Overriding Variables](#overriding-variables)
+- [Configuring exceptions to the rules](#configuring-exceptions-to-the-rules)
 
-__[Querying the DAG with SQL](#querying-the-dag-with-sql)__
+### [Querying the DAG with SQL](#querying-the-dag-with-sql)
 
-__[Limitations](#limitations)__
+### [Limitations](#limitations)
 - [BigQuery and Databricks](#bigquery-and-databricks)
 
-__[Contributing](#contributing)__
+### [Contributing](#contributing)
 
 ----
 
@@ -859,6 +857,54 @@ vars:
 
 Changing `max_depth_dag` number to a higher one might prevent the package from running properly on BigQuery and Databricks/Spark.
 
+
+### Configuring exceptions to the rules
+
+While the rules defined in this package are considered best practices, we realize that there can be legitimate exceptions, and that people might want to exclude specific results so that tests pass even when a recommendation is not followed.
+
+For example, we might exclude all models whose names match `stg_..._unioned` from `fct_multiple_sources_joined`: some of our staging models may legitimately union 2 different tables representing the same data, and we don't want the test to fail for those models.
+
+The package lets us define a seed called `dbt_project_evaluator_exceptions.csv` listing the exceptions we don't want reported. This seed must contain the following columns:
+- `fct_name`: the name of the fact table for which we want to define exceptions (note that specific models cannot be excluded from the `coverage` tests, but variables are available to configure those tests to each project's needs)
+- `column_name`: the column of `fct_name` used to identify the rows to exclude
+- `id_to_exclude`: the value (or `like` pattern) to exclude in `column_name`
+- `comment`: a field where people can document why a given exception is legitimate
+
+The following section describes the steps to follow to configure exceptions.
+
+#### 1. Create a new seed
+
+With our previous example, the seed `dbt_project_evaluator_exceptions.csv` would look like:
+```
+fct_name,column_name,id_to_exclude,comment
+fct_multiple_sources_joined,child,stg_%_unioned,Models called _unioned can union multiple sources
+```
+
+When loaded in the warehouse, it looks like the following:
+
+|fct_name                   |column_name|id_to_exclude   |comment                                           |
+|---------------------------|-----------|----------------|--------------------------------------------------|
+|fct_multiple_sources_joined|child      |stg\_%\_unioned |Models called \_unioned can union multiple sources|
+
+
+#### 2. Deactivate the seed from the original package
+
+Only a single seed can exist with a given name. When using a custom one, we need to deactivate the one from the package by adding the following to our `dbt_project.yml`:
+```
+seeds:
+  dbt_project_evaluator:
+    dbt_project_evaluator_exceptions:
+      +enabled: false
+```
+
+#### 3. Run the seed and the package
+
+We then run both the seed and the package by executing the following command:
+```
+dbt build --select package:dbt_project_evaluator dbt_project_evaluator_exceptions
+```
+
+
 ----
 
 ## Querying the DAG with SQL
diff --git a/integration_tests/dbt_project.yml b/integration_tests/dbt_project.yml
@@ -40,4 +40,9 @@ tests:
     dbt_project_evaluator_schema_tests:
       unique_int_all_dag_relationships_path:
         # Grouping by expressions of type ARRAY is not allowed for BigQuery
-        +enabled: "{{ false if target.type in ['bigquery'] else true }}"
+        +enabled: "{{ false if target.type in ['bigquery'] else true }}"
+
+seeds:
+  dbt_project_evaluator:
+    dbt_project_evaluator_exceptions:
+      +enabled: false
diff --git a/integration_tests/seeds/dag/test_fct_direct_join_to_source.csv b/integration_tests/seeds/dag/test_fct_direct_join_to_source.csv
@@ -1,3 +1,2 @@
 parent,parent_resource_type,child,child_resource_type,distance
-source_1.table_2,source,int_model_4,model,1
-stg_model_1,model,int_model_4,model,1
+source_1.table_2,source,int_model_4,model,1
diff --git a/integration_tests/seeds/dbt_project_evaluator_exceptions.csv b/integration_tests/seeds/dbt_project_evaluator_exceptions.csv
@@ -0,0 +1,2 @@
+fct_name,column_name,id_to_exclude,comment
+fct_direct_join_to_source,parent_id,model.dbt_project_evaluator_integration_tests.stg_model_1,This is actually OK because...
diff --git a/macros/filter_exceptions.sql b/macros/filter_exceptions.sql
@@ -0,0 +1,18 @@
+{% macro filter_exceptions(model_ref) %}
+
+{% set query_filters %}
+select 
+    column_name, 
+    id_to_exclude 
+from {{ ref('dbt_project_evaluator_exceptions') }}
+where fct_name = '{{ model_ref.name }}'
+{% endset %}
+
+{% if execute %}
+    where 1 = 1
+    {% for row_filter in run_query(query_filters) %}
+        and {{ row_filter[0] }} not like '{{ row_filter[1] }}'
+    {% endfor %}
+{% endif %}
+  
+{% endmacro %}
diff --git a/models/marts/dag/fct_direct_join_to_source.sql b/models/marts/dag/fct_direct_join_to_source.sql
@@ -33,4 +33,6 @@ final as (
     order by direct_model_relationships.child
 )
 
-select * from final
+select * from final
+
+{{ filter_exceptions(this) }}
diff --git a/models/marts/dag/fct_marts_or_intermediate_dependent_on_source.sql b/models/marts/dag/fct_marts_or_intermediate_dependent_on_source.sql
@@ -15,4 +15,6 @@ final as (
     where parent_resource_type = 'source'
     and child_model_type in ('marts', 'intermediate')
 )
-select * from final
+select * from final
+
+{{ filter_exceptions(this) }}
diff --git a/models/marts/dag/fct_model_fanout.sql b/models/marts/dag/fct_model_fanout.sql
@@ -29,3 +29,5 @@ model_fanout as (
 )
 
 select * from model_fanout
+
+{{ filter_exceptions(this) }}
diff --git a/models/marts/dag/fct_multiple_sources_joined.sql b/models/marts/dag/fct_multiple_sources_joined.sql
@@ -16,4 +16,6 @@ multiple_sources_joined as (
     having count(*) > 1
 )
 
-select * from multiple_sources_joined
+select * from multiple_sources_joined
+
+{{ filter_exceptions(this) }}
diff --git a/models/marts/dag/fct_rejoining_of_upstream_concepts.sql b/models/marts/dag/fct_rejoining_of_upstream_concepts.sql
@@ -55,7 +55,13 @@ final as (
     from triad_relationships
     left join single_use_resources 
         on triad_relationships.parent_and_child = single_use_resources.parent
+),
+
+final_filtered as (
+    select * from final
+    where is_loop_independent
 )
 
-select * from final
-where is_loop_independent
+select * from final_filtered
+
+{{ filter_exceptions(this) }}
diff --git a/models/marts/dag/fct_root_models.sql b/models/marts/dag/fct_root_models.sql
@@ -15,4 +15,6 @@ final as (
     having max(distance) = 0
 )
 
-select * from final
+select * from final
+
+{{ filter_exceptions(this) }}
diff --git a/models/marts/dag/fct_source_fanout.sql b/models/marts/dag/fct_source_fanout.sql
@@ -17,4 +17,6 @@ source_fanout as (
     having count(*) > 1
 )
 
-select * from source_fanout
+select * from source_fanout
+
+{{ filter_exceptions(this) }}
diff --git a/models/marts/dag/fct_staging_dependent_on_marts_or_intermediate.sql b/models/marts/dag/fct_staging_dependent_on_marts_or_intermediate.sql
@@ -18,4 +18,6 @@ final as (
     where child_model_type = 'staging'
     and parent_model_type in ('marts', 'intermediate')
 )
-select * from final
+select * from final
+
+{{ filter_exceptions(this) }}
diff --git a/models/marts/dag/fct_staging_dependent_on_staging.sql b/models/marts/dag/fct_staging_dependent_on_staging.sql
@@ -19,4 +19,6 @@ bending_connections as (
     and child_model_type = 'staging'
 )
 
-select * from bending_connections
+select * from bending_connections
+
+{{ filter_exceptions(this) }}
diff --git a/models/marts/dag/fct_unused_sources.sql b/models/marts/dag/fct_unused_sources.sql
@@ -15,4 +15,6 @@ final as (
     having max(distance) = 0
 )
 
-select * from final
+select * from final
+
+{{ filter_exceptions(this) }}
diff --git a/models/marts/documentation/fct_undocumented_models.sql b/models/marts/documentation/fct_undocumented_models.sql
@@ -17,3 +17,5 @@ final as (
 )
 
 select * from final
+
+{{ filter_exceptions(this) }}
diff --git a/models/marts/performance/fct_chained_views_dependencies.sql b/models/marts/performance/fct_chained_views_dependencies.sql
@@ -17,4 +17,7 @@ final as (
 )
 
 select * from final
+
+{{ filter_exceptions(this) }}
+
 order by distance desc
diff --git a/models/marts/performance/fct_exposure_parents_materializations.sql b/models/marts/performance/fct_exposure_parents_materializations.sql
@@ -20,4 +20,6 @@ final as (
 
 )
 
-select * from final
+select * from final
+
+{{ filter_exceptions(this) }}
diff --git a/models/marts/structure/fct_model_directories.sql b/models/marts/structure/fct_model_directories.sql
@@ -62,4 +62,6 @@ unioned as (
     select * from innappropriate_subdirectories_non_staging_models
 )
 
-select * from unioned
+select * from unioned
+
+{{ filter_exceptions(this) }}
diff --git a/models/marts/structure/fct_model_naming_conventions.sql b/models/marts/structure/fct_model_naming_conventions.sql
@@ -43,4 +43,6 @@ inappropriate_model_names as (
 
 )
 
-select * from inappropriate_model_names
+select * from inappropriate_model_names
+
+{{ filter_exceptions(this) }}
diff --git a/models/marts/structure/fct_source_directories.sql b/models/marts/structure/fct_source_directories.sql
@@ -18,4 +18,6 @@ inappropriate_subdirectories_sources as (
     and directory_path not like '%' || source_name || '%'
 )
 
-select * from inappropriate_subdirectories_sources
+select * from inappropriate_subdirectories_sources
+
+{{ filter_exceptions(this) }}
diff --git a/models/marts/structure/fct_test_directories.sql b/models/marts/structure/fct_test_directories.sql
@@ -80,3 +80,5 @@ different_directories as (
 )
 
 select * from different_directories
+
+{{ filter_exceptions(this) }}
diff --git a/models/marts/tests/fct_missing_primary_key_tests.sql b/models/marts/tests/fct_missing_primary_key_tests.sql
@@ -13,4 +13,6 @@ final as (
 
 )
 
-select * from final
+select * from final
+
+{{ filter_exceptions(this) }}
diff --git a/models/marts/tests/fct_test_coverage.sql b/models/marts/tests/fct_test_coverage.sql
@@ -33,4 +33,4 @@ final as (
     on test_counts.resource_name = conversion.resource_name
 )
 
-select * from final
+select * from final
diff --git a/seeds/dbt_project_evaluator_exceptions.csv b/seeds/dbt_project_evaluator_exceptions.csv
@@ -0,0 +1 @@
+fct_name,column_name,id_to_exclude,comment
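
Note on what `filter_exceptions` renders: for each seed row whose `fct_name` matches the current model, the macro appends one `and <column_name> not like '<id_to_exclude>'` predicate after `where 1 = 1`. The sketch below only simulates that behavior in Python with an in-memory SQLite database; it is not part of the PR. The helper name `build_exception_filter` and the model names `stg_orders_unioned` and `int_orders_joined` are hypothetical, and SQLite stands in for the warehouse.

```python
import sqlite3


def build_exception_filter(conn, fct_name):
    """Simulate the macro: one `not like` predicate per matching seed row."""
    rows = conn.execute(
        "select column_name, id_to_exclude "
        "from dbt_project_evaluator_exceptions "
        "where fct_name = ?",
        (fct_name,),
    ).fetchall()
    clauses = ["where 1 = 1"]
    for column_name, id_to_exclude in rows:
        # Values are interpolated as-is, as the Jinja macro does.
        clauses.append(f"and {column_name} not like '{id_to_exclude}'")
    return "\n".join(clauses)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table dbt_project_evaluator_exceptions (
        fct_name text, column_name text, id_to_exclude text, comment text
    );
    insert into dbt_project_evaluator_exceptions values (
        'fct_multiple_sources_joined', 'child', 'stg_%_unioned',
        'Models called _unioned can union multiple sources'
    );
    -- Hypothetical stand-in for the model's final CTE.
    create table final (child text);
    insert into final values ('stg_orders_unioned'), ('int_orders_joined');
    """
)

where_clause = build_exception_filter(conn, "fct_multiple_sources_joined")
remaining = [
    row[0] for row in conn.execute(f"select child from final {where_clause}")
]
print(where_clause)
print(remaining)
```

Because the comparison uses `like`, an underscore in `id_to_exclude` acts as a single-character wildcard, so `stg_%_unioned` can match slightly more names than a literal reading suggests. A seed with no rows for a given model leaves only `where 1 = 1`, which filters nothing.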
