
Commit 2b2e097

Config as Data Overview (#3058)
The first of several documents describing the foundation and implementation of Package Orchestration.


2 files changed

+166
-0
lines changed


docs/design-docs/06-config-as-data.md

Lines changed: 165 additions & 0 deletions
# Configuration as Data

* Author(s): Martin Maly, @martinmaly
* Approver: @bgrant0607

## Why

This document provides background context for Package Orchestration, which is
further elaborated in a dedicated [document](07-package-orchestration.md).

## Configuration as Data

*Configuration as Data* is an approach to managing configuration (including
configuration of infrastructure, policy, services, applications, etc.) that:

* makes configuration data the source of truth, stored separately from the live
state
* uses a uniform, serializable data model to represent configuration
* separates code that acts on the configuration from the data and from packages
/ bundles of the data
* abstracts configuration file structure and storage from operations that act
upon the configuration data; clients manipulating configuration data don’t
need to directly interact with storage (git, container images)

![CaD Overview](./CaD%20Overview.svg)

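As a rough illustration of the separation described in the definition above, the following minimal sketch keeps configuration as plain serializable data (a list of KRM-style dictionaries), keeps the code that transforms it in a pure function, and hides storage behind an interface so that client code never touches Git or an OCI image. `PackageRepository`, `InMemoryRepository`, `set_namespace`, and `update_package` are hypothetical names invented for this example; a real backend would be Git- or OCI-based rather than an in-memory map.

```python
import copy
import json
from abc import ABC, abstractmethod
from typing import Any

# Hypothetical alias: a KRM-style resource as plain, serializable data.
Resource = dict[str, Any]


class PackageRepository(ABC):
    """Hypothetical storage abstraction: clients see a package as a list of
    resources, never the Git repository or OCI image behind it."""

    @abstractmethod
    def load(self, package: str) -> list[Resource]:
        ...

    @abstractmethod
    def save(self, package: str, resources: list[Resource]) -> None:
        ...


class InMemoryRepository(PackageRepository):
    """Stand-in backend that stores each package as serialized JSON text."""

    def __init__(self) -> None:
        self._blobs: dict[str, str] = {}

    def load(self, package: str) -> list[Resource]:
        # Deserializing always yields a fresh copy of the stored data.
        return json.loads(self._blobs[package])

    def save(self, package: str, resources: list[Resource]) -> None:
        self._blobs[package] = json.dumps(resources, sort_keys=True)


def set_namespace(resources: list[Resource], namespace: str) -> list[Resource]:
    """Code that acts on the data: returns a modified copy, never mutates input.
    (A real function would skip cluster-scoped kinds.)"""
    out = copy.deepcopy(resources)
    for resource in out:
        resource.setdefault("metadata", {})["namespace"] = namespace
    return out


def update_package(repo: PackageRepository, package: str, namespace: str) -> None:
    """Client workflow: load, transform, save; no storage-specific code here."""
    repo.save(package, set_namespace(repo.load(package), namespace))


repo = InMemoryRepository()
repo.save("app", [{"apiVersion": "v1", "kind": "ConfigMap", "metadata": {"name": "cfg"}}])
update_package(repo, "app", "prod")
print(repo.load("app")[0]["metadata"]["namespace"])  # prints prod
```

Swapping `InMemoryRepository` for a Git- or OCI-backed implementation would leave `update_package` unchanged, which is the point of the abstraction.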
## Key Principles

A system based on CaD *should* observe the following key principles:

* secrets should be stored separately, in a secret-focused storage system
([example](https://cloud.google.com/secret-manager))
* stores a versioned history of configuration changes by change sets to bundles
of related configuration data
* relies on uniformity and consistency of the configuration format, including
type metadata, to enable pattern-based operations on the configuration data,
along the lines of duck typing
* separates schemas for the configuration data from the data, and relies on
schema information for strongly typed operations and to disambiguate data
structures and other variations within the model
* decouples abstractions of configuration from collections of configuration data
* represents abstractions of configuration generators as data with schemas, like
other configuration data
* finds, filters / queries / selects, and/or validates configuration data that
can be operated on by given code (functions)
* finds and/or filters / queries / selects code (functions) that can operate on
resource types contained within a body of configuration data
* *actuation* (reconciliation of configuration data with live state) is separate
from transformation of configuration data, and is driven by the declarative
data model
* transformations, particularly value propagation, are preferable to wholesale
configuration generation except when the expansion is dramatic (say, >10x)
* transformation input generation should usually be decoupled from propagation
* deployment context inputs should be taken from well-defined “provider context”
objects
* identifiers and references should be declarative
* live state should be linked back to sources of truth (configuration)

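The duck-typing and value-propagation principles can be illustrated together. The minimal sketch below selects resources by the *shape* of their data (anything with a pod template at `spec.template.spec.containers`, whatever its `kind`) and propagates a single value, here read from a ConfigMap standing in for a provider-context object, into every matching container. The helper names, the path constant, and the `REGION` variable are hypothetical, and the sketch matches on structure alone rather than consulting schemas as the principles above recommend.

```python
from collections.abc import Iterator
from typing import Any

Resource = dict[str, Any]

# Hypothetical structural pattern: where a pod template keeps its containers.
CONTAINERS_PATH = ["spec", "template", "spec", "containers"]


def get_path(obj: Any, path: list[str]) -> Any:
    """Return the value at `path`, or None if any segment is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def with_pod_template(resources: list[Resource]) -> Iterator[Resource]:
    """Duck typing: match on the data's shape, not on apiVersion/kind."""
    for resource in resources:
        if isinstance(get_path(resource, CONTAINERS_PATH), list):
            yield resource


def propagate_env(resources: list[Resource], name: str, value: str) -> None:
    """Value propagation: set one env var in every matching container, in place,
    replacing any existing entry with the same name."""
    for resource in with_pod_template(resources):
        for container in get_path(resource, CONTAINERS_PATH):
            env = [e for e in container.get("env", []) if e.get("name") != name]
            env.append({"name": name, "value": value})
            container["env"] = env


resources = [
    {"kind": "Deployment", "spec": {"template": {"spec": {"containers": [{"name": "web"}]}}}},
    {"kind": "Job", "spec": {"template": {"spec": {"containers": [{"name": "migrate"}]}}}},
    {"kind": "ConfigMap", "data": {"region": "us-east1"}},
]
# Propagate a value from the context-like ConfigMap into both workloads.
propagate_env(resources, "REGION", resources[2]["data"]["region"])
```

Because only the targeted field is rewritten, any other edits already made to the Deployment or Job survive, which is one reason propagation is preferred over regenerating whole resources.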
## KRM CaD

Our implementation of the Configuration as Data approach
([kpt](https://kpt.dev),
[Config Sync](https://cloud.google.com/anthos-config-management/docs/config-sync-overview),
and [Package Orchestration](https://github.com/GoogleContainerTools/kpt/tree/main/porch))
builds on the foundation of
[Kubernetes Resource Model](https://github.com/kubernetes/design-proposals-archive/blob/main/architecture/resource-management.md)
(KRM).

**Note**: Even though KRM is not a requirement of Config as Data (just as
Python or Go templates or Jinja are not specifically requirements for
[IaC](https://en.wikipedia.org/wiki/Infrastructure_as_code)), choosing
another foundational config representation format would necessitate
implementing adapters for all types of infrastructure and applications
configured, including Kubernetes, CRDs, GCP resources and more. Likewise, choosing
another configuration format would require redesigning a number of the
configuration management mechanisms that have already been designed for KRM,
such as 3-way merge, structural merge patch, schema descriptions, resource
metadata, references, status conventions, etc.

**KRM CaD** is therefore a specific approach to implementing *Configuration as
Data* that:

* uses [KRM](https://github.com/kubernetes/design-proposals-archive/blob/main/architecture/resource-management.md)
as the configuration serialization data model
* uses [Kptfile](https://kpt.dev/reference/schema/kptfile/) to store package
metadata
* uses [ResourceList](https://kpt.dev/reference/schema/resource-list/) as a
serialized package wire format
* uses a function `ResourceList → ResultList` (`kpt` function) as the
foundational, composable unit of package-manipulation code (note that other
forms of code can manipulate packages as well, e.g. UIs or custom algorithms
not necessarily packaged and used as kpt functions)

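To make the `ResourceList → ResultList` shape concrete, the following minimal sketch shows a validating function that receives a ResourceList (its `items` plus an optional `functionConfig`), passes the items through unchanged, and reports problems as structured results. The field names used here (`items`, `functionConfig`, `results`, `message`, `severity`, `resourceRef`) reflect our reading of the ResourceList schema linked above and should be checked against it; the `require_label` function and its `label` config key are hypothetical, and the stdin/stdout YAML plumbing a real kpt function needs is omitted.

```python
from typing import Any

ResourceList = dict[str, Any]


def require_label(resource_list: ResourceList) -> ResourceList:
    """Hypothetical validator: every item must carry the label named in
    functionConfig.data.label (default "owner"). Items pass through unchanged;
    findings are returned as results on a new ResourceList."""
    config = resource_list.get("functionConfig") or {}
    label = (config.get("data") or {}).get("label", "owner")

    results = []
    for item in resource_list.get("items", []):
        metadata = item.get("metadata") or {}
        if label not in (metadata.get("labels") or {}):
            results.append({
                "message": f"missing required label {label!r}",
                "severity": "error",
                "resourceRef": {
                    "apiVersion": item.get("apiVersion"),
                    "kind": item.get("kind"),
                    "name": metadata.get("name"),
                },
            })

    # New dict so the caller's ResourceList is not modified; results from
    # earlier functions in a pipeline are kept and new findings appended.
    return {**resource_list, "results": [*resource_list.get("results", []), *results]}


rl = {
    "apiVersion": "config.kubernetes.io/v1",
    "kind": "ResourceList",
    "items": [
        {"apiVersion": "v1", "kind": "ConfigMap",
         "metadata": {"name": "a", "labels": {"owner": "team-a"}}},
        {"apiVersion": "v1", "kind": "ConfigMap", "metadata": {"name": "b"}},
    ],
}
out = require_label(rl)
print([r["resourceRef"]["name"] for r in out["results"]])  # prints ['b']
```

Since the function only reads and returns data, its output can be fed directly into the next function, which is what makes this unit composable.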
and provides the following basic functionality:

* load a serialized package from a repository (as `ResourceList`) (a repository
may be, for example, the local file system, a Git repository, an OCI registry,
Cloud Storage, etc.)
* save a serialized package (as `ResourceList`) to a package repository
* evaluate a function on a serialized package (`ResourceList`)
* [render](https://kpt.dev/book/04-using-functions/01-declarative-function-execution)
a package (evaluate functions declared within the package itself)
* create a new (empty) package
* fork (or clone) an existing package from one package repository (called
upstream) to another (called downstream)
* delete a package from a repository
* associate a version with the package; guarantee immutability of packages with
an assigned version
* incorporate changes from a new version of an upstream package into a new
version of a downstream package
* revert to a prior version of a package

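Incorporating upstream changes into a customized downstream package is essentially a three-way merge: compare the upstream version the downstream was forked from (the base) with the new upstream version, and apply only what changed upstream to the downstream copy. The sketch below is a deliberately simplified, hypothetical `three_way_merge` over nested dictionaries in which the downstream value wins on conflicts and lists are treated as atomic values. It is not kpt's actual merge implementation, which also has to match list elements (for example, containers by `name`) and resources by identity.

```python
from typing import Any

_MISSING = object()  # marks a field absent from one of the three versions


def three_way_merge(base: Any, upstream: Any, local: Any) -> Any:
    """Apply the base -> upstream changes to `local` (the downstream copy).

    Dicts are merged field by field; any other value (including lists) is
    treated as atomic. If both sides changed a field differently, the local
    (downstream) value wins.
    """
    if all(isinstance(v, dict) for v in (base, upstream, local)):
        merged = {}
        # Keep local key order first, then keys that exist only elsewhere.
        for key in dict.fromkeys([*local, *upstream, *base]):
            value = three_way_merge(
                base.get(key, _MISSING),
                upstream.get(key, _MISSING),
                local.get(key, _MISSING),
            )
            if value is not _MISSING:  # _MISSING here means "field deleted"
                merged[key] = value
        return merged
    if upstream == base:  # unchanged upstream: keep the local value
        return local
    if local == base:     # changed only upstream: take the upstream value
        return upstream
    return local          # changed on both sides: downstream customization wins


base = {"metadata": {"name": "app"}, "spec": {"replicas": 1, "image": "app:1.0"}}
upstream = {"metadata": {"name": "app"}, "spec": {"replicas": 1, "image": "app:1.1"}}
local = {"metadata": {"name": "app", "namespace": "prod"},
         "spec": {"replicas": 5, "image": "app:1.0"}}
# The downstream namespace and replicas survive; the upstream image bump is applied.
merged = three_way_merge(base, upstream, local)
```

Letting the downstream win on conflicts is one possible policy; a real tool might instead surface conflicts for review.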
## Value

The Config as Data approach provides key benefits that other configuration
management approaches offer only to a lesser extent, or not at all.

The *CaD* approach enables:

* simplified authoring of configuration using a variety of methods and sources
* WYSIWYG interaction with configuration using a simple data serialization
format rather than a code-like format
* layering of interoperable interface surfaces (notably GUI) over declarative
configuration mechanisms rather than forcing choices between exclusive
alternatives (exclusively UI/CLI or IaC initially, followed by exclusively
UI/CLI or exclusively IaC)
* the ability to apply UX techniques to simplify configuration authoring and
viewing
* compared to imperative tools (e.g., UI, CLI) that directly modify the live
state via APIs, CaD enables versioning, undo, audits of configuration history,
review/approval, pre-deployment preview, validation, safety checks,
constraint-based policy enforcement, and disaster recovery
* bulk changes to configuration data in their sources of truth
* injection of configuration to address horizontal concerns
* merging of multiple sources of truth
* state export to reusable blueprints without manual templatization
* cooperative editing of configuration by humans and automation, such as for
security remediation (which is usually implemented against live-state APIs)
* reusability of configuration transformation code across multiple bodies of
configuration data containing the same resource types, amortizing the effort
of writing, testing, and documenting the code
* combination of independent configuration transformations
* implementation of config transformations using languages of choice,
including both programming and scripting approaches
* reducing the frequency of changes to existing transformation code
* separation of roles between developer and non-developer configuration users
* defragmenting the configuration transformation ecosystem
* admission control and invariant enforcement on sources of truth
* maintaining variants of configuration blueprints without one-size-fits-all
full struct-constructor-style parameterization and without manually
constructing and maintaining patches
* drift detection and remediation for most of the desired state via continuous
reconciliation using apply and/or for specific attributes via targeted
mutation of the sources of truth

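The drift-detection bullet rests on comparing the source of truth with live state. The minimal sketch below assumes both are available as nested dictionaries and compares only the fields that the desired configuration actually sets, so cluster-populated fields such as `status` are not reported as drift. The `find_drift` helper is hypothetical, compares lists as whole values, and is not how Config Sync or any other specific reconciler detects drift.

```python
from typing import Any


def find_drift(desired: Any, live: Any, path: str = "") -> list[str]:
    """Return dotted paths where `live` differs from `desired`.

    Only fields present in `desired` are checked; extra fields in `live`
    (defaults, status) are ignored. Non-dict values are compared as a whole.
    """
    if isinstance(desired, dict):
        if not isinstance(live, dict):
            return [path or "."]
        drifted = []
        for key, want in desired.items():
            child = f"{path}.{key}" if path else key
            if key not in live:
                drifted.append(child)
            else:
                drifted.extend(find_drift(want, live[key], child))
        return drifted
    return [] if desired == live else [path or "."]


desired = {"spec": {"replicas": 3, "template": {"image": "app:1.1"}}}
live = {"spec": {"replicas": 5, "template": {"image": "app:1.1"}},
        "status": {"readyReplicas": 5}}
print(find_drift(desired, live))  # prints ['spec.replicas']
```

A detected path can then be remediated either by re-applying the desired state, as continuous reconciliation does, or, if the live change was intentional, by mutating the corresponding field in the source of truth.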
## Related Articles

For more information about Configuration as Data and the Kubernetes Resource Model,
visit the following links:

* [Rationale for kpt](https://kpt.dev/guides/rationale)
* [Understanding Configuration as Data](https://cloud.google.com/blog/products/containers-kubernetes/understanding-configuration-as-data-in-kubernetes)
blog post
* [Kubernetes Resource Model](https://cloud.google.com/blog/topics/developers-practitioners/build-platform-krm-part-1-whats-platform)
blog post series
