
Commit 90b176c

add ability to download cached workspace (#520)
* create "stale" field on workspace state A provider that downloads its workspace state directly cannot assume that this state is a valid basis for a future incremental update, and should mark the downloaded workspace as stale. Signed-off-by: Will Murphy <[email protected]> * WIP add configs Signed-off-by: Will Murphy <[email protected]> * lint fix Signed-off-by: Will Murphy <[email protected]> * [wip] working on vunnel results db listing Signed-off-by: Alex Goodman <[email protected]> * update and tests for safe_extract_tar Now that we're using it for more than one thing, make an extractor that generally prevents path traversal. Signed-off-by: Will Murphy <[email protected]> * [wip] adding tests for fetching listing and archives Signed-off-by: Alex Goodman <[email protected]> * [wip] add more negative tests for provider tests Signed-off-by: Alex Goodman <[email protected]> * unit test for new workspace changes Signed-off-by: Will Murphy <[email protected]> * replace the workspace results instead of overlaying Signed-off-by: Will Murphy <[email protected]> * clean up hasher implementation Signed-off-by: Alex Goodman <[email protected]> * add tests for prep workspace from listing entry Signed-off-by: Will Murphy <[email protected]> * do not include inputs in tar test fixture Signed-off-by: Alex Goodman <[email protected]> * vunnel fetch existing workspace working Signed-off-by: Will Murphy <[email protected]> * add unit test for full update flow Signed-off-by: Will Murphy <[email protected]> * update existing unit tests for new config values Signed-off-by: Will Murphy <[email protected]> * add unit test for default behavior of new configs Signed-off-by: Will Murphy <[email protected]> * lint fix Signed-off-by: Will Murphy <[email protected]> * add missing annotations import Signed-off-by: Will Murphy <[email protected]> * Use 3.9 compatible annotations Relying on the from __future__ import annotations doesn't work with the mashumaro. 
Signed-off-by: Will Murphy <[email protected]> * validate that enabling import results requires host and path Signed-off-by: Will Murphy <[email protected]> * rename listing field and add schema Signed-off-by: Alex Goodman <[email protected]> * only require github token when downloading Signed-off-by: Alex Goodman <[email protected]> * add zstd support Signed-off-by: Alex Goodman <[email protected]> * add tests for zstd support Signed-off-by: Alex Goodman <[email protected]> * add tests for _has_newer_archive Signed-off-by: Will Murphy <[email protected]> * fix tests for zstd Signed-off-by: Alex Goodman <[email protected]> * show stderr to log when git commands fail Signed-off-by: Alex Goodman <[email protected]> * move import_results to common field on provider Signed-off-by: Will Murphy <[email protected]> * add concept for distribution version Signed-off-by: Alex Goodman <[email protected]> * single source of truth for provider schemas Signed-off-by: Alex Goodman <[email protected]> * add distribution-version to schema, provider state, and listing entry Signed-off-by: Alex Goodman <[email protected]> * clear workspace on different dist version Signed-off-by: Alex Goodman <[email protected]> * fix defaulting logic and update tests Signed-off-by: Will Murphy <[email protected]> * default distribution version and path Signed-off-by: Will Murphy <[email protected]> * make "" and None both use default path Signed-off-by: Will Murphy <[email protected]> --------- Signed-off-by: Will Murphy <[email protected]> Signed-off-by: Alex Goodman <[email protected]> Co-authored-by: Alex Goodman <[email protected]>
1 parent 6b4fa38 commit 90b176c


41 files changed (+1967, -127 lines)

poetry.lock

Lines changed: 148 additions & 2 deletions
(generated file; diff not rendered)

pyproject.toml

Lines changed: 2 additions & 0 deletions
@@ -57,6 +57,8 @@ importlib-metadata = "^7.0.1"
 xsdata = {extras = ["cli", "lxml", "soap"], version = ">=22.12,<25.0"}
 pytest-snapshot = "^0.9.0"
 mashumaro = "^3.10"
+iso8601 = "^2.1.0"
+zstandard = "^0.22.0"

 [tool.poetry.group.dev.dependencies]
 pytest = ">=7.2.2,<9.0.0"
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@

# `ProviderState` JSON Schema

This schema governs the `listing.json` file used when providers are configured to fetch pre-computed results (by using `import_results_enabled`). The listing file is how the provider knows what results are available, where to fetch them from, and how to validate them.

See `src/vunnel.distribution.Listing` for the root object that represents this schema.

## Updating the schema

Versioning the JSON schema is done manually: copy the existing JSON schema into a new `schema-x.y.z.json` file and make the necessary updates (or use an online tool such as https://www.liquid-technologies.com/online-json-to-schema-converter).

This schema is versioned according to the "SchemaVer" guidelines, which diverge slightly from Semantic Versioning to better suit data models.

Given a version number format `MODEL.REVISION.ADDITION`:

- `MODEL`: increment when you make a breaking schema change that prevents interaction with any historical data
- `REVISION`: increment when you make a schema change that may prevent interaction with some historical data
- `ADDITION`: increment when you make a schema change that is compatible with all historical data
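For instance (illustrative only, not taken from this commit): starting from 1.0.0, adding a new optional property is an ADDITION bump to 1.0.1, making an existing optional property required is a REVISION bump to 1.1.0, and removing or renaming a required property is a MODEL bump to 2.0.0.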
Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "schema": {
      "type": "object",
      "properties": {
        "version": {
          "type": "string"
        },
        "url": {
          "type": "string"
        }
      },
      "required": [
        "version",
        "url"
      ]
    },
    "provider": {
      "type": "string"
    },
    "available": {
      "type": "object",
      "properties": {
        "1": {
          "type": "array",
          "items": [
            {
              "type": "object",
              "properties": {
                "distribution_checksum": {
                  "type": "string"
                },
                "built": {
                  "type": "string"
                },
                "checksum": {
                  "type": "string"
                },
                "url": {
                  "type": "string"
                },
                "version": {
                  "type": "integer"
                }
              },
              "required": [
                "built",
                "checksum",
                "distribution_checksum",
                "url",
                "version"
              ]
            }
          ]
        }
      }
    }
  },
  "required": [
    "schema",
    "available",
    "provider"
  ]
}
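For orientation, here is a hypothetical listing.json document that satisfies this schema; every value (provider name, URLs, checksums, dates) is made up.

{
  "schema": {
    "version": "1.0.0",
    "url": "https://example.com/provider-archive-listing/schema-1.0.0.json"
  },
  "provider": "wolfi",
  "available": {
    "1": [
      {
        "built": "2024-01-10T08:00:00Z",
        "checksum": "xxh64:1234567890abcdef",
        "distribution_checksum": "sha256:1234567890abcdef",
        "url": "https://example.com/vunnel/wolfi/results-1.tar.zst",
        "version": 1
      }
    ]
  }
}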
Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "title": "provider-workspace-state",
  "description": "describes the filesystem state of a provider workspace directory",
  "properties": {
    "provider": {
      "type": "string"
    },
    "urls": {
      "type": "array",
      "items": [
        {
          "type": "string"
        }
      ]
    },
    "store": {
      "type": "string"
    },
    "timestamp": {
      "type": "string"
    },
    "listing": {
      "type": "object",
      "properties": {
        "digest": {
          "type": "string"
        },
        "path": {
          "type": "string"
        },
        "algorithm": {
          "type": "string"
        }
      },
      "required": [
        "digest",
        "path",
        "algorithm"
      ]
    },
    "version": {
      "type": "integer",
      "description": "version describing the result data shape + the provider processing behavior semantics"
    },
    "distribution_version": {
      "type": "integer",
      "description": "version describing purely the result data shape"
    },
    "schema": {
      "type": "object",
      "properties": {
        "version": {
          "type": "string"
        },
        "url": {
          "type": "string"
        }
      },
      "required": [
        "version",
        "url"
      ]
    },
    "stale": {
      "type": "boolean",
      "description": "set to true if the workspace is stale and cannot be used for an incremental update"
    }
  },
  "required": [
    "provider",
    "urls",
    "store",
    "timestamp",
    "listing",
    "version",
    "schema"
  ]
}
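Similarly, a hypothetical workspace state document that this schema would accept; all values are illustrative only.

{
  "provider": "wolfi",
  "urls": ["https://example.com/security.json"],
  "store": "flat-file",
  "timestamp": "2024-01-10T08:00:00+00:00",
  "listing": {
    "digest": "1234567890abcdef",
    "path": "checksums",
    "algorithm": "xxh64"
  },
  "version": 1,
  "distribution_version": 1,
  "schema": {
    "version": "1.0.2",
    "url": "https://example.com/provider-workspace-state/schema-1.0.2.json"
  },
  "stale": false
}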

src/vunnel/cli/config.py

Lines changed: 53 additions & 5 deletions
@@ -2,13 +2,41 @@
 
 import os
 from dataclasses import dataclass, field, fields
-from typing import Any
+from typing import TYPE_CHECKING, Any
+
+if TYPE_CHECKING:
+    from collections.abc import Generator
 
 import mergedeep
 import yaml
 from mashumaro.mixins.dict import DataClassDictMixin
 
-from vunnel import providers
+from vunnel import provider, providers
+
+
+@dataclass
+class ImportResults:
+    """
+    These are the defaults for all providers. Corresponding
+    fields on specific providers override these values.
+
+    If a path is "" or None, path will be set to "providers/{provider_name}/listing.json".
+    If an empty path is needed, specify "/".
+    """
+
+    __default_path__ = "providers/{provider_name}/listing.json"
+    host: str = ""
+    path: str = __default_path__
+    enabled: bool = False
+
+    def __post_init__(self) -> None:
+        if not self.path:
+            self.path = self.__default_path__
+
+
+@dataclass
+class CommonProviderConfig:
+    import_results: ImportResults = field(default_factory=ImportResults)
 
 
 @dataclass
@@ -26,12 +54,32 @@ class Providers:
     ubuntu: providers.ubuntu.Config = field(default_factory=providers.ubuntu.Config)
     wolfi: providers.wolfi.Config = field(default_factory=providers.wolfi.Config)
 
+    common: CommonProviderConfig = field(default_factory=CommonProviderConfig)
+
+    def __post_init__(self) -> None:
+        for name in self.provider_names():
+            runtime_cfg = getattr(self, name).runtime
+            if runtime_cfg and isinstance(runtime_cfg, provider.RuntimeConfig):
+                if runtime_cfg.import_results_enabled is None:
+                    runtime_cfg.import_results_enabled = self.common.import_results.enabled
+                if not runtime_cfg.import_results_host:
+                    runtime_cfg.import_results_host = self.common.import_results.host
+                if not runtime_cfg.import_results_path:
+                    runtime_cfg.import_results_path = self.common.import_results.path
+
     def get(self, name: str) -> Any | None:
-        for f in fields(Providers):
-            if self._normalize_name(f.name) == self._normalize_name(name):
-                return getattr(self, f.name)
+        for candidate in self.provider_names():
+            if self._normalize_name(candidate) == self._normalize_name(name):
+                return getattr(self, candidate)
         return None
 
+    @staticmethod
+    def provider_names() -> Generator[str, None, None]:
+        for f in fields(Providers):
+            if f.name == "common":
+                continue
+            yield f.name
+
     @staticmethod
     def _normalize_name(name: str) -> str:
         return name.lower().replace("-", "_")
src/vunnel/distribution.py

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@

from __future__ import annotations

import datetime
import os
from dataclasses import dataclass, field
from urllib.parse import urlparse

import iso8601
from mashumaro.mixins.dict import DataClassDictMixin

from vunnel import schema as schema_def

DB_SUFFIXES = {".tar.gz", ".tar.zst"}


@dataclass
class ListingEntry(DataClassDictMixin):
    # the date this archive was built relative to the data enclosed in the archive
    built: str

    # the URL where the vunnel provider archive is located
    url: str

    # the digest of the archive referenced at the URL.
    # Note: all checksums are labeled with "algorithm:value" (e.g. sha256:1234567890abcdef1234567890abcdef)
    distribution_checksum: str

    # the digest of the checksums file within the archive referenced at the URL
    # Note: all checksums are labeled with "algorithm:value" (e.g. xxhash64:1234567890abcdef)
    enclosed_checksum: str

    # the provider distribution version this archive was built with (different than the provider version)
    distribution_version: int = 1

    def basename(self) -> str:
        basename = os.path.basename(urlparse(self.url, allow_fragments=False).path)
        if not _has_suffix(basename, suffixes=DB_SUFFIXES):
            msg = f"entry url is not a db archive: {basename}"
            raise RuntimeError(msg)

        return basename

    def age_in_days(self, now: datetime.datetime | None = None) -> int:
        if not now:
            now = datetime.datetime.now(tz=datetime.timezone.utc)
        return (now - iso8601.parse_date(self.built)).days


@dataclass
class ListingDocument(DataClassDictMixin):
    # mapping of provider versions to a list of ListingEntry objects denoting archives available for download
    available: dict[int, list[ListingEntry]]

    # the provider name this document is associated with
    provider: str

    # the schema information for this document
    schema: schema_def.Schema = field(default_factory=schema_def.ProviderListingSchema)

    @classmethod
    def new(cls, provider: str) -> ListingDocument:
        return cls(available={}, provider=provider)

    def latest_entry(self, schema_version: int) -> ListingEntry | None:
        if schema_version not in self.available:
            return None

        if not self.available[schema_version]:
            return None

        return self.available[schema_version][0]

    def add(self, entry: ListingEntry) -> None:
        if not self.available.get(entry.distribution_version):
            self.available[entry.distribution_version] = []

        self.available[entry.distribution_version].append(entry)

        # keep listing entries sorted by date (rfc3339 formatted entries, which iso8601 is a superset of)
        self.available[entry.distribution_version].sort(
            key=lambda x: iso8601.parse_date(x.built),
            reverse=True,
        )


def _has_suffix(el: str, suffixes: set[str] | None) -> bool:
    if not suffixes:
        return True
    return any(el.endswith(s) for s in suffixes)
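A small usage sketch of the dataclasses above; the provider name, URL, and checksum values are placeholders.

from vunnel.distribution import ListingDocument, ListingEntry

doc = ListingDocument.new(provider="wolfi")
doc.add(
    ListingEntry(
        built="2024-01-10T08:00:00+00:00",
        url="https://example.com/vunnel/wolfi/results.tar.zst",
        distribution_checksum="sha256:1234567890abcdef",
        enclosed_checksum="xxhash64:1234567890abcdef",
        distribution_version=1,
    ),
)

# entries are kept newest-first per distribution version
latest = doc.latest_entry(schema_version=1)
print(latest.basename() if latest else "no archive available")  # results.tar.zst

# round-trip through a plain dict via the mashumaro mixin
data = doc.to_dict()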
