This tool allows you to query metrics from your dbt semantic models directly from a dbt model through the `dbt_metric_utils_materialize` macro. One way to look at it is that it revives the `metric.calculate()` macro from dbt <=v1.5. By having access to this macro, the dbt semantic layer becomes more useful for dbt-core users. You still don't have all the goodness of the dbt Cloud semantic layer, but it does allow you to get started with connecting your users and BI tools to aggregation tables/views that query your metrics directly.
> **Tip:** Check out some example queries here.

> **Tip:** Browse the dbt docs pages for the example project here.
This project is a Python package that wraps around dbt in the most transparent way I could find. Try it out through the following steps:

- Install `dbt-metric-utils` from PyPI in your project (e.g. `pip install dbt-metric-utils`).
- Run `dbt-metric-utils init` or `dbtmu init`. This will install the macro into your project and make sure that any `dbt` CLI calls are intercepted and processed in the correct way (see the explanation below).
- Introduce a dbt model that calls the `dbt_metric_utils_materialize` macro, as in the example below.
- Continue using `dbt` as you're used to.
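Such a model can be as small as a single macro call. Below is a minimal sketch with hypothetical model, metric, and dimension names; the argument names follow the invocation signature shown later in this README:

```sql
-- models/orders_per_day.sql (hypothetical model, metric, and dimension names)
-- At compile time the macro expands to the SQL that MetricFlow
-- generates for this metric query.
{{ dbt_metric_utils_materialize(
    metrics=['order_count'],
    dimensions=['metric_time__day']
) }}
```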
Any dbt command that doesn't require dbt to compile your project is simply passed directly to dbt (Mode A in the diagram). A dbt invocation that does require compilation (e.g. `compile`, `run`, `test`, etc.) is intercepted by the package.
After intercepting, we run through the following sequence of steps:

- Call `dbt parse`. This builds a partially filled `manifest.json` from which we can extract all the models, their dependencies, and the raw SQL queries.
- Extract all models that contain a `dbt_metric_utils_materialize` invocation.
- Run `mf query --explain` commands for all the `dbt_metric_utils_materialize` invocations.
- Inject the queries generated by MetricFlow as dbt variables into the actual dbt command. If the user ran `dbt run`, we actually trigger `dbt run --vars {<macro_invocation_signature>: <query>}`.

The passed variables are a mapping from `dbt_metric_utils_materialize` invocation signature (e.g. `metrics=['m1'], dimensions=['dim1']...`) to the generated metric query. The `dbt_metric_utils_materialize` macro will find that variable at compile time and return it as the macro result.
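Conceptually, the injected variables look something like this (an illustrative sketch; the exact signature normalization and generated SQL are determined by the package and MetricFlow):

```python
import yaml

# Mapping from macro invocation signature to the SQL that MetricFlow
# generated for that metric query (value truncated for brevity).
metric_query_vars = {
    "metrics=['m1'],dimensions=['dim1']": "SELECT dim1, SUM(...) FROM ... GROUP BY dim1",
}

# The intercepted `dbt run` then effectively becomes:
#   dbt run --vars '<yaml dump of metric_query_vars>'
dbt_args = ["run", "--vars", yaml.dump(metric_query_vars)]
```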
Along this sequence of steps, we also ensure that the dependency graph in `manifest.json` is updated correctly. dbt itself only detects dependencies based on `ref` and `source`, not on macros that are external to it.
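The kind of patch this amounts to is roughly the following (a sketch; the unique IDs are hypothetical, and the package's actual manifest-patching code may differ):

```python
# Make the model depend on the metric it materializes, so the DAG in
# manifest.json reflects the relationship. IDs shown are illustrative.
model_id = "model.my_project.orders_per_day"
metric_id = "metric.my_project.order_count"
manifest.nodes[model_id].depends_on.nodes.append(metric_id)
```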
Here we describe in some detail how the code achieves the steps described above.
You'll run `dbtmu init` (or `dbt-metric-utils init`) once, which will:

- Add `dbt_metric_utils_materialize.sql` to `macros/`; this is the macro used by dbt at compile time (like all regular macros).
- Replace the `dbt` script in `bin/` with a dbt-metric-utils version, which is identical except that it imports `from dbt_metric_utils.cli import cli` instead of `from dbt.main import main`. This reroutes all dbt calls to the dbt-metric-utils CLI instead of dbt.
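After `init`, the `bin/dbt` entrypoint therefore looks roughly like this (an illustrative sketch; only the swapped import is taken from the description above):

```python
#!/usr/bin/env python
# bin/dbt after `dbtmu init` (illustrative). The only change from the
# stock dbt entrypoint is the import: dbt's CLI is swapped for the
# dbt-metric-utils CLI, which decides whether to intercept the command.
import sys
from dbt_metric_utils.cli import cli  # instead of: from dbt.main import main

if __name__ == "__main__":
    sys.exit(cli())
```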
When you call any `dbt ...` command, this calls the dbt-metric-utils CLI (as described above). If the subcommand requires MetricFlow compilation (e.g. `compile`, `run`, `test`, etc.), then:
- We call `manifest, metric_query_as_vars = get_metric_queries_as_dbt_vars(target)` to generate an updated manifest (with the model -> metric dependencies) and the rendered SQL calculating the metrics. This:
    - Parses the dbt project to find all models which call `dbt_metric_utils_materialize` (using `dbtRunner().invoke(["parse"])`)
    - For each `dbt_metric_utils_materialize(...)` invocation, calls the function through `exec`, which renders the query using `dbt_metricflow.cli.cli_context.CLIContext().explain(...).rendered_sql_without_descriptions.sql_query`
    - Stores the rendered SQL as values in a dictionary, where the key is the call syntax
    - Updates the manifest to include dependencies between these models and the metrics
- Then dbt is run (using `dbtRunner(manifest=manifest).invoke([..., "--vars", yaml.dump(vars_dict)])`), passing the dict of rendered SQL as a YAML dump to the `--vars` argument.
- When the macro is reached during dbt compilation, it looks up the rendered SQL in the `--vars` argument and writes it into the query.
If the command doesn't require MetricFlow compilation, dbt is called directly using the Python integration (`res = dbtRunner().invoke(_args)`).
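Put together, the dispatch looks roughly like this (a sketch; the command set and function body are illustrative, and only the identifiers quoted above are from the package):

```python
import yaml
from dbt.cli.main import dbtRunner

COMPILE_COMMANDS = {"compile", "run", "test", "build"}  # illustrative set

def main(args: list[str]):
    if args and args[0] in COMPILE_COMMANDS:
        # Mode B: build the patched manifest and the rendered metric SQL
        # (get_metric_queries_as_dbt_vars is the package function above).
        manifest, metric_query_as_vars = get_metric_queries_as_dbt_vars(target=None)
        return dbtRunner(manifest=manifest).invoke(
            [*args, "--vars", yaml.dump(metric_query_as_vars)]
        )
    # Mode A: pass the command straight through to dbt.
    return dbtRunner().invoke(args)
```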
There is of course a risk that something changes in a future dbt version which breaks dbt-metric-utils. Below is a list of assumptions/requirements for the integration, so that we can make a sensible decision on:
- Whether this should automatically work for future dbt versions
- If not, how complex it would be to maintain
- How likely it is that compatibility with future versions becomes impossible
The following dbt interfaces/features are used by dbt-metric-utils:

1. Functionality like `dbt parse` to get the manifest (invoked through `dbtRunner().invoke(...)`)
2. Functionality like `mf query --explain` (invoked through `mf_query = dbt_metricflow.cli.cli_context.CLIContext().mf.explain(MetricFlowQueryRequest.create_with_random_request_id()).rendered_sql_without_descriptions.sql_query`)
3. Being able to invoke dbt with something like `dbtRunner(manifest=manifest).invoke(["command", *args, "--vars", ...])`. Critically, we need to pass in an updated manifest and custom vars.
Since 1 and 2 could also be retrieved from a command-line call, even if the Python dbt interface changes, we'd still have a way to upgrade this repo. If 3 changed (e.g. not being able to pass a manifest (possible) or vars (unlikely)), we'd have more of a challenge, since there isn't a command-line option for passing the manifest in (as far as I can see).
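For 1 and 2, such a command-line fallback could look roughly like this (a sketch with hypothetical metric and dimension names; `dbt parse` and `mf query --explain` are the standard CLI entrypoints):

```python
import subprocess

# Fallback sketch: obtain the manifest and the explained metric SQL
# via the CLIs instead of the Python APIs.
subprocess.run(["dbt", "parse"], check=True)  # writes target/manifest.json
explained_sql = subprocess.run(
    ["mf", "query", "--metrics", "order_count",
     "--group-by", "metric_time__day", "--explain"],
    check=True, capture_output=True, text=True,
).stdout
```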
For 3, an option would be to have a precompilation step, where we physically rewrite the model files with the rendered SQL (potentially with `ref`s to the metrics or models, to help keep the dependencies).
Overall, I feel it would be fairly easy to update this tool for future dbt versions.