| 1 | +# Configuration as Data |
| 2 | + |
| 3 | +* Author(s): Martin Maly, @martinmaly |
| 4 | +* Approver: @bgrant0607 |
| 5 | + |
| 6 | +## Why |
| 7 | + |
| 8 | +This document provides background context for Package Orchestration, which is |
| 9 | +further elaborated in a dedicated [document](07-package-orchestration.md). |
| 10 | + |
| 11 | +## Configuration as Data |
| 12 | + |
| 13 | +*Configuration as Data* is an approach to managing configuration (including |
| 14 | +configuration of infrastructure, policy, services, applications, etc.) that: |
| 15 | + |
| 16 | +* makes configuration data the source of truth, stored separately from the live |
| 17 | + state |
| 18 | +* uses a uniform, serializable data model to represent configuration |
| 19 | +* separates code that acts on the configuration from the data and from packages |
| 20 | + / bundles of the data |
| 21 | +* abstracts configuration file structure and storage from operations that act |
| 22 | + upon the configuration data; clients manipulating configuration data don’t |
| 23 | + need to directly interact with storage (git, container images) |
| 24 | + |
| 25 | + |
| 26 | + |
| 27 | +## Key Principles |
| 28 | + |
| 29 | +A system based on Configuration as Data (CaD) *should* observe the following key principles: |
| 30 | + |
| 31 | +* secrets should be stored separately, in a secret-focused storage system |
| 32 | + ([example](https://cloud.google.com/secret-manager)) |
| 33 | +* stores a versioned history of configuration changes by change sets to bundles |
| 34 | + of related configuration data |
| 35 | +* relies on uniformity and consistency of the configuration format, including |
| 36 | + type metadata, to enable pattern-based operations on the configuration data, |
| 37 | + along the lines of duck typing |
| 38 | +* separates schemas for the configuration data from the data, and relies on |
| 39 | + schema information for strongly typed operations and to disambiguate data |
| 40 | + structures and other variations within the model |
| 41 | +* decouples abstractions of configuration from collections of configuration data |
| 42 | +* represents abstractions of configuration generators as data with schemas, like |
| 43 | + other configuration data |
| 44 | +* finds, filters / queries / selects, and/or validates configuration data that |
| 45 | + can be operated on by given code (functions) |
| 46 | +* finds and/or filters / queries / selects code (functions) that can operate on |
| 47 | + resource types contained within a body of configuration data |
| 48 | +* *actuation* (reconciliation of configuration data with live state) is separate |
| 49 | + from transformation of configuration data, and is driven by the declarative |
| 50 | + data model |
| 51 | +* transformations, particularly value propagation, are preferable to wholesale |
| 52 | + configuration generation except when the expansion is dramatic (say, >10x) |
| 53 | +* transformation input generation should usually be decoupled from propagation |
| 54 | +* deployment context inputs should be taken from well defined “provider context” |
| 55 | + objects |
| 56 | +* identifiers and references should be declarative |
| 57 | +* live state should be linked back to sources of truth (configuration) |
| 58 | + |
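The pattern-based selection described in the principles above (finding configuration data that a given piece of code can operate on by relying on uniform type metadata) can be sketched as follows. This is a minimal illustration, assuming resources are plain dictionaries that carry KRM-style `apiVersion` and `kind` fields; the `select` helper and the sample resources are hypothetical and not part of any tool named in this document.

```python
def select(resources, api_version=None, kind=None):
    """Return the resources whose type metadata matches the given pattern.

    A criterion left as None matches anything, so callers select only on
    the fields they care about (duck-typing style).
    """
    return [
        r for r in resources
        if (api_version is None or r.get("apiVersion") == api_version)
        and (kind is None or r.get("kind") == kind)
    ]


# Hypothetical sample data: three resources in a uniform format.
resources = [
    {"apiVersion": "v1", "kind": "ConfigMap", "metadata": {"name": "settings"}},
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web"}},
    {"apiVersion": "v1", "kind": "Service", "metadata": {"name": "web"}},
]

deployments = select(resources, kind="Deployment")
core_v1 = select(resources, api_version="v1")
```

Because every resource exposes the same type metadata, the same helper works on any body of configuration data without knowing the rest of each resource's structure.
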
| 59 | +## KRM CaD |
| 60 | + |
| 61 | +Our implementation of the Configuration as Data approach ( |
| 62 | +[kpt](https://kpt.dev), |
| 63 | +[Config Sync](https://cloud.google.com/anthos-config-management/docs/config-sync-overview), |
| 64 | +and [Package Orchestration](https://github.com/GoogleContainerTools/kpt/tree/main/porch)) |
| 65 | +builds on the foundation of the |
| 66 | +[Kubernetes Resource Model](https://github.com/kubernetes/design-proposals-archive/blob/main/architecture/resource-management.md) |
| 67 | +(KRM). |
| 68 | + |
| 69 | +**Note**: Even though KRM is not a requirement of Config as Data (just like |
| 70 | +Python or Go templates or Jinja are not specifically requirements for |
| 71 | +[IaC](https://en.wikipedia.org/wiki/Infrastructure_as_code)), the choice of |
| 72 | +another foundational config representation format would necessitate |
| 73 | +implementing adapters for all types of infrastructure and applications |
| 74 | +configured, including Kubernetes, CRDs, GCP resources and more. Likewise, choice |
| 75 | +of another configuration format would require redesign of a number of the |
| 76 | +configuration management mechanisms that have already been designed for KRM, |
| 77 | +such as 3-way merge, structural merge patch, schema descriptions, resource |
| 78 | +metadata, references, status conventions, etc. |
| 79 | + |
| 80 | +**KRM CaD** is therefore a specific approach to implementing *Configuration as |
| 81 | +Data* which: |
| 82 | +* uses [KRM](https://github.com/kubernetes/design-proposals-archive/blob/main/architecture/resource-management.md) |
| 83 | + as the configuration serialization data model |
| 84 | +* uses [Kptfile](https://kpt.dev/reference/schema/kptfile/) to store package |
| 85 | + metadata |
| 86 | +* uses [ResourceList](https://kpt.dev/reference/schema/resource-list/) as a |
| 87 | + serialized package wire-format |
| 88 | +* uses a function `ResourceList → ResultList` (`kpt` function) as the |
| 89 | + foundational, composable unit of package-manipulation code (note that other |
| 90 | + forms of code can manipulate packages as well, e.g., UIs and custom algorithms |
| 91 | + not necessarily packaged and used as kpt functions) |
| 92 | + |
| 93 | +and provides the following basic functionality: |
| 94 | + |
| 95 | +* load a serialized package from a repository (as `ResourceList`) (examples of |
| 96 | + repository may be one or more of: local HDD, Git repository, OCI, Cloud |
| 97 | + Storage, etc.) |
| 98 | +* save a serialized package (as `ResourceList`) to a package repository |
| 99 | +* evaluate a function on a serialized package (`ResourceList`) |
| 100 | +* [render](https://kpt.dev/book/04-using-functions/01-declarative-function-execution) |
| 101 | + a package (evaluate functions declared within the package itself) |
| 102 | +* create a new (empty) package |
| 103 | +* fork (or clone) an existing package from one package repository (called |
| 104 | + upstream) to another (called downstream) |
| 105 | +* delete a package from a repository |
| 106 | +* associate a version with the package; guarantee immutability of packages with |
| 107 | + an assigned version |
| 108 | +* incorporate changes from the new version of an upstream package into a new |
| 109 | + version of a downstream package |
| 110 | +* revert to a prior version of a package |
| 111 | + |
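A few of the operations listed above (save, load, clone, function evaluation, and immutability of versioned packages) can be sketched with an in-memory model. This is an illustrative sketch only, not the kpt or Package Orchestration API: `PackageRepository`, `set_label`, and `clone` are hypothetical names, a package is modeled as a `ResourceList`-shaped dictionary, and having the function return the transformed package together with a `results` list is a simplifying assumption.

```python
import copy


class PackageRepository:
    """Hypothetical in-memory repository of versioned packages."""

    def __init__(self):
        self._packages = {}  # (name, version) -> ResourceList-shaped dict

    def save(self, name, version, resource_list):
        key = (name, version)
        if key in self._packages:
            # A package with an assigned version is immutable.
            raise ValueError(f"{name}@{version} already exists")
        self._packages[key] = copy.deepcopy(resource_list)

    def load(self, name, version):
        # Return a copy so callers cannot mutate the stored version.
        return copy.deepcopy(self._packages[(name, version)])

    def delete(self, name, version):
        del self._packages[(name, version)]


def set_label(resource_list, key, value):
    """Example function: label every resource and report what was done."""
    out = copy.deepcopy(resource_list)
    results = out.setdefault("results", [])
    for item in out["items"]:
        item.setdefault("metadata", {}).setdefault("labels", {})[key] = value
        results.append({"severity": "info", "message": f"labeled {item['kind']}"})
    return out


def clone(upstream, downstream, name, version, new_name, new_version):
    """Fork a package version from an upstream to a downstream repository."""
    downstream.save(new_name, new_version, upstream.load(name, version))


upstream, downstream = PackageRepository(), PackageRepository()
upstream.save("base", "v1", {
    "apiVersion": "config.kubernetes.io/v1",
    "kind": "ResourceList",
    "items": [{"apiVersion": "v1", "kind": "ConfigMap",
               "metadata": {"name": "settings"}}],
})
clone(upstream, downstream, "base", "v1", "team-a", "v1")

# Evaluate a function on the downstream package and save it as a new version.
updated = set_label(downstream.load("team-a", "v1"), "team", "a")
downstream.save("team-a", "v2", updated)
```

In this sketch a change never overwrites an existing version: the function operates on a loaded copy, and the result is saved under a new version, which keeps earlier versions available to revert to.
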
| 112 | +## Value |
| 113 | + |
| 114 | +The Config as Data approach delivers benefits that other configuration |
| 115 | +management approaches provide only to a lesser extent, or not |
| 116 | +at all. |
| 117 | + |
| 118 | +The *CaD* approach enables: |
| 119 | + |
| 120 | +* simplified authoring of configuration using a variety of methods and sources |
| 121 | +* WYSIWYG interaction with configuration using a simple data serialization |
| 122 | + format rather than a code-like format |
| 123 | +* layering of interoperable interface surfaces (notably GUI) over declarative |
| 124 | + configuration mechanisms rather than forcing choices between exclusive |
| 125 | + alternatives (exclusively UI/CLI or IaC initially followed by exclusively |
| 126 | + UI/CLI or exclusively IaC) |
| 127 | +* the ability to apply UX techniques to simplify configuration authoring and |
| 128 | + viewing |
| 129 | +* compared to imperative tools (e.g., UI, CLI) that directly modify the live |
| 130 | + state via APIs, CaD enables versioning, undo, audits of configuration history, |
| 131 | + review/approval, pre-deployment preview, validation, safety checks, |
| 132 | + constraint-based policy enforcement, and disaster recovery |
| 133 | +* bulk changes to configuration data in their sources of truth |
| 134 | +* injection of configuration to address horizontal concerns |
| 135 | +* merging of multiple sources of truth |
| 136 | +* state export to reusable blueprints without manual templatization |
| 137 | +* cooperative editing of configuration by humans and automation, such as for |
| 138 | + security remediation (which is usually implemented against live-state APIs) |
| 139 | +* reusability of configuration transformation code across multiple bodies of |
| 140 | + configuration data containing the same resource types, amortizing the effort |
| 141 | + of writing, testing, and documenting the code |
| 142 | +* combination of independent configuration transformations |
| 143 | +* implementation of config transformations using the languages of choice, |
| 144 | + including both programming and scripting approaches |
| 145 | +* reducing the frequency of changes to existing transformation code |
| 146 | +* separation of roles between developer and non-developer configuration users |
| 147 | +* defragmenting the configuration transformation ecosystem |
| 148 | +* admission control and invariant enforcement on sources of truth |
| 149 | +* maintaining variants of configuration blueprints without one-size-fits-all |
| 150 | + full struct-constructor-style parameterization and without manually |
| 151 | + constructing and maintaining patches |
| 152 | +* drift detection and remediation for most of the desired state via continuous |
| 153 | + reconciliation using apply and/or for specific attributes via targeted |
| 154 | + mutation of the sources of truth |
| 155 | + |
| 156 | +## Related Articles |
| 157 | + |
| 158 | +For more information about Configuration as Data and Kubernetes Resource Model, |
| 159 | +visit the following links: |
| 160 | + |
| 161 | +* [Rationale for kpt](https://kpt.dev/guides/rationale) |
| 162 | +* [Understanding Configuration as Data](https://cloud.google.com/blog/products/containers-kubernetes/understanding-configuration-as-data-in-kubernetes) |
| 163 | + blog post |
| 164 | +* [Kubernetes Resource Model](https://cloud.google.com/blog/topics/developers-practitioners/build-platform-krm-part-1-whats-platform) |
| 165 | + blog post series |