Releases: datahub-project/datahub
Release Candidate v0.8.28
Release Candidate for Version 0.8.28.
What's Changed
- doc(adoption): adds Adevinta as DataHub adopter by @sgomezvillamor in #4227
- fix(policies): Remove multiple privileges for GENERATE_PERSONAL_ACCESS_TOKEN by @jjoyce0510 in #4239
- Added Drawer to show the tag profile data by @Ankit-Keshari-Vituity in #4132
- fix(ui) Misc styling fixes (truncated filter values, updating tag color on clickaway) by @jjoyce0510 in #4246
- feat(ingest): switch telemetry endpoint to Mixpanel by @kevinhu in #4238
- docs: Add Udemy & Adevinta logos by @maggiehays in #4247
- fix(ingest) kafka-connect: Pass the env variable as part of making dataset by @arunvasudevan in #4244
- feat(lineage): Column level lineage model by @rslanka in #4248
- feat(docker): add multiplatform docker support for arm64 (m1) by @zhaofengnian18 in #4221
- fix(ingest): fix tableau sheets external url ingestion by @maaaikoool in #4231
- fix(docs): fix broken link by @anshbansal in #4242
- fix(docs): fix reference to the credential step in okta guide by @bskim45 in #4243
- feat(ingest): add ability to provide lineage described from within a file by @eddyv in #4116
- Add support for url_prefix in elastisearch source by @pppsunil in #4214
- feat(search): supporting chinese glossaryterm full text retrieval(#3914) by @Huyueeer in #3956
- feat(platform): Schema version history timeline. by @rslanka in #4252
- fix(mae-comsumer): wrong aspect name in usage event transformer by @zhoxie-cisco in #4249
- feat(ml): Add searchable annotation for features in feature table by @dexter-mh-lee in #4216
- fix(ingest): fix telemetry profile emission by @kevinhu in #4253
- bug(logo): add platform to chart relationship query by @RyanHolstien in #4255
- chore(docs): cleanup location of guide, gitignore generated from git by @anshbansal in #4256
- feat(ingest): Spark-free data lake ingestion by @kevinhu in #4131
- Openapi new auth by @vlavorini in #4086
- feat(docs) add fine-grained lineage docs and examples by @ksrinath in #4260
- feat(ingest): add option to copy URN, fix graphql docs by @anshbansal in #4209
- chore(managed ingestion): add variables for default val, update vals by @anshbansal in #4186
- docs(ui): Adding guide for adding users to DataHub. by @jjoyce0510 in #4262
- fix(docs): doc build failing by @anshbansal in #4267
- fix(doc) fix spelling mistake in dataset doc by @ksrinath in #4264
- feat(ingest): add lineage_client_project_id field to the BigQuery config by @vcs9 in #4138
- feat(graphql): Adding resolved users and groups to policies by @jjoyce0510 in #4272
- bug(schema_version_history): fix semantic version ordering by @RyanHolstien in #4271
- fix(recs ui): fixing tag color in recommendations by @jjoyce0510 in #4274
- fix(ingestion): Fix snowflake lineage + logging & reporting improvements. by @rslanka in #4276
- fix(docs): Error in running airflow locally by @buggythepirate in #4259
- feat: powerbi source plugin by @mohdsiddique in #4201
- test(timeline): fix smoke test by @RyanHolstien in #4285
- fix(ingest) - always display CLI version by @aditya-radhakrishnan in #4282
- feat(lineage) Bigquery: Supporting v2 audit metadata on Bigquery by @treff7es in #4233
- fix(recipe-parsing): fix recipe config parsing for $ by @MugdhaHardikar-GSLab in #4258
- docs: add details to update users when using helm by @anshbansal in #4268
- feat(model): adds PRE in the FabricType enum by @sgomezvillamor in #4226
- fix(ingestion): Fix snowflake view upstream lineages to eliminate false edges. by @rslanka in #4284
- fix(docs): fix frontend docs to replace port 9001 -> 9002 by @gabe-lyons in #4280
- fix(docker): use exec form to start container main process by @gmcoringa in #4245
- fix(ingestion): revert positional arg change by @anshbansal in #4266
- feat(profiling) - Bigquery: Ability to disable partition profiling by @treff7es in #4228
- fix(ingest): clarify s3/s3a requirements and platform defaults by @kevinhu in #4263
- Remove <> from add-users.md doc by @jjoyce0510 in #4293
- fix(ingestion): Fix bigquery stateful ingestion checkpoint reconstruction. by @rslanka in #4295
- fix(telemetry): telemetry fail should not cause the CLI to fail by @anshbansal in #4302
- fix(search): Update urn tokenizer to tokenize on periods and slashes by @dexter-mh-lee in #4085
- Fixed the UI issue: Height issue of editor and the spacing issue between logo and description by @Ankit-Keshari-Vituity in #4300
- fix(vulnerabilities): Fix new vulnerabilities by upgrading libraries by @dexter-mh-lee in #4297
- docs(readmes): Update module READMEs to reflect the current state of the world by @jjoyce0510 in #4294
- Resign the policy tab by @Ankit-Keshari-Vituity in #4232
- refactor(extractor): Move extractors to entity-registry by @dexter-mh-lee in #4307
- refactor(ui) Minor policies styling improvements. by @jjoyce0510 in #4309
- feat(ui): Introducing New group profile by @jjoyce0510 in #4308
- refactor(ui): Simplify process of adding user.props (w/ docs) by @jjoyce0510 in #4296
- feat (ingest) Kafka-connect: Adding Auth to Kafka Connect API by @arunvasudevan in #4298
- fix(doc): Add warning on using AWS glue schema registry by @dexter-mh-lee in #4306
- fix(ingestion) Removing python restriction by @treff7es in #4312
- fix(ingest) bigquery: Remove unneeded warning by @treff7es in #4317
- doc: improve doc on adding source by @anshbansal in #4316
- fix: revert changes to OpenApi casing by @anshbansal in #4291
- feat(assertions): Adding Assertions Entity & Great Expectations BETA by @jjoyce0510 in #4305
- feat(tableau): emit workbook as container entity in tableau source, some minor fixes in tableau source by @mayurinehate in #4261
- fix(ui) Misc UI fixes & styling improvements. by @jjoyce0510 in #4311
- fix(tags) - map tags to globalTags for entities by @aditya-radhakrishnan in #4310
- fix(quickstart): Pin actions pod + add volume mount for datahub-frontend by @jjoyce0510 in #4318
- fix(ui): Correct display name for users in UI by @jjoyce0510 in #4323
- feat(Impact Analysis): Support impact analysis to check all downstreams of given entity by @dexter-mh-lee in #4322
New Contributors
- @zhaofengnian18 made their first contribution in #4221
- @bskim45 made their first contribution in #4243
- @eddyv made their first contribution in #4116
- @Huyueeer made their first contribution in #3956
- @vcs9 made their first contribution in #4138
- @mohdsiddique made their first contribution in #4201
- @gmcoringa made their first contribution in #4245
Full Changelog: v0.8.27...RC-v0.8.28
DataHub v0.8.27
Release Highlights
Notable UI-Based Features
- The User Page has a new look! You can now quickly filter & search for entities owned by a User, update/edit the user profile, and see details of which Groups the User belongs to. See it in action here.
- Search for Entities by Owner - Easily filter search results by User/Group Owner
- Edit existing Glossary Terms - you can now edit/update Glossary Term descriptions via the UI. Future work will allow creating Terms from the UI as well - stay tuned!
- Improved Metadata Analytics - keep tabs on your DataHub entities across Domains, Platforms, Glossary Terms, Environments, & more. Check out the new & improved Analytics tab!
Notable Metadata Model & Ingestion-Based Features
- ClickHouse integration is now incubating! This is a 100% Community-led integration - huge shoutout to @ne1r0n & @havramar for pushing initial code & moving this work through!
- Kafka Stateful Ingestion - shoutout to @claudio-benfatto for building this out!
- Extract Airflow Task Description - big thanks to @guidoturtu for the contrib!
- BigQuery: profile latest Partition/Shard - We know that Data Profiling can be computationally expensive for partitioned/sharded BQ instances. We now support profiling only the latest partition/shard to minimize processing load.
Notable Docs Updates
- NEW! Tips for Searching within DataHub - Ever wondered how to make the most of Searching within DataHub? Check out this doc put together by @xiphl
- Improvements to Metadata Model Docs - This is a huge win for the Community - we’re taking a big step toward providing auto-generated & curated docs related to the Metadata Model - take a look here.
What's Changed
- feat(deprecation): Entity Deprecation Backend by @jjoyce0510 in #4073
- Fixed auto complete pr coments by @Ankit-Keshari-Vituity in #4072
- fix(ingestion): enforce correct behaviour for commit policy by @claudio-benfatto in #4092
- fix(aggregate): Fix NPE in aggregate api by @dexter-mh-lee in #4095
- add Haibo corp by @wangqinghuan in #4082
- fix(ingestion): Add psutil dependency required for stateful ingestion reporting. by @rslanka in #4099
- docs(kafka): add example for using domains, change for clarity by @anshbansal in #4100
- feat(ui): Add display name & title to editable corp user properties. by @jjoyce0510 in #4097
- fix(ingestion): Enhance BigQuery source logging. by @rslanka in #4101
- fix(glossary terms): fix add glossary term flow by @gabe-lyons in #4106
- (docs) Add Zynga & Tableau logos by @maggiehays in #4109
- fix(ingestion): Add sql lineage to redshift-usage plugin by @dexter-mh-lee in #4103
- feat(ui): Add svg datahub satellite loading logo by @eburairu in #4067
- fix(ingestion): resolve oracle issue with large view definitions by @hsheth2 in #4027
- fix(ingest): ignore Postgres information_schema tables by default by @kevinhu in #4069
- fix(ingest) - close event loops in Okta source and add additional debug logging by @aditya-radhakrishnan in #4077
- chore(ingest): remove unused groupby_unsorted utility by @hsheth2 in #4011
- fix(docs): fixing metadata model doc generation script and updating png by @swaroopjagadish in #4120
- fix(ci): fix formatting in doc generation action yaml by @swaroopjagadish in #4121
- fix(ci): fix formatting for action yaml by @swaroopjagadish in #4122
- feat(Tags/Terms): Backend support for tag & term mutations by @jjoyce0510 in #4096
- docs(backup): add doc for taking backup by @anshbansal in #3917
- fix(docs): make intro to metadata ingestion easier for beginners by @anshbansal in #4039
- fix(ingest) Athena: db filter was not applied by @treff7es in #4127
- fix(ui) - move book logo to right of glossary term by @aditya-radhakrishnan in #4125
- fix(docs) Fix doc on modelDocUpload by @daha in #4112
- fix(cypress): force clicks on tag mutation test by @gabe-lyons in #4102
- feat(ingest) Athena: Getting table properties for Athena datasets by @treff7es in #4123
- fix(logging): Fix Restli Logging Filter to print full stack trace on error by @dexter-mh-lee in #4136
- docs : markdown fixes for db retention table by @satyamkrishna in #4133
- docs : markdown fixes for db retention table by @satyamkrishna in #4148
- feat(ingestion): Kafka stateful ingestion by @claudio-benfatto in #4028
- fix(docs): update graphql docs to reference new graphql file by @gabe-lyons in #4139
- Feature/oss/update to v2 endpoints by @RyanHolstien in #4128
- fix(cli): add timeout for telemetry calls by @anshbansal in #4135
- chore(cli): update default cli version pinned in the UI based ingestion by @anshbansal in #4150
- fix(docs): fix example of delta lake by @anshbansal in #4149
- fix(ui): Fix cutoff profiling axis labels by @jjoyce0510 in #4154
- feat(ingest): Glue - Support for domains and containers by @treff7es in #4110
- feat(ui): Host platform images on datahub-web-react by @ngamanda in #4118
- bug(seedData): adds a key to the root user seed data and fixes corner case check for missing key aspects by @RyanHolstien in #4162
- UI Fix: Modal close on Enter press, autofocus on modal, added split panel, alignment of button by @Ankit-Keshari-Vituity in #4155
- feat(ui): Edit glossary term descriptions via UI by @jjoyce0510 in #4156
- Update querying-entities.md -> Documentation Error by @buggythepirate in #4157
- refactor(metadata-io/test): common ElasticsearchContainer and ability to override from environment. by @stephenp-gr in #4152
- feat(ingestion): Add support for snowflake view lineage. by @rslanka in #4163
- Update the doc to including options to include Views by @cuong-pham in #4164
- fix(ingest): Use lower-case dataset names in the dataset urns for all SQL-styled datasets. by @rslanka in #4140
- chore(ingestion): upgrade mypy by @hsheth2 in #4141
- ci(ingestion): fix airflow 1 deps for tox by @hsheth2 in #4083
- fix(ingest) Glue: Removing sqlalchemy dependency from glue by @treff7es in #4168
- fix(ingest) Athena: Generating propert containers for Athena by @treff7es in #4167
- Feature/users and groups UI updated as per new design by @ShubhamThakre in #4134
- chore(docs): various cleanup for docs-website by @hsheth2 in #4143
- bugfix(logging): reduce log noise from authentication chain by @RyanHolstien in #4173
- bug(glossaryTermLabels): fix glossary term labels missing and add cypress test by @RyanHolstien in #4171
- fixes(ui): Misc UI fixes + Adding Owners to Search Filters by @jjoyce0510 in #4175
- BugFixes/user-and-groups-minor-ui-fixes by @ShubhamThakre in #4181
- feat(groups): Adding editable group properties in the backend by @jjoyce0510 in #4166
- fix(python build): Pinning markupsafe by @treff7es in #4188
- feat(analytics): Improve analytics page by adding more charts regarding metadata ingested by @dexter-mh-lee in #4176
- docs(model): auto-generated docs and hand-written docs for the metada… by @swaroopjagadish in #4189
- minor fixes(ui): Small UI display fixes by @jjoyce0510 in #4190
- fix(ui): Return empty search response on invalid characters in search by @jjoyce0510 in #4193
- refactor(spark-line...
DataHub v0.8.26
This is a Bugfix release meant to address the issue with adding Glossary Terms to Dataset fields present in version 0.8.25.
Release Highlights
- Fixes a bug in the previous release (v0.8.25) where Glossary Terms could not be added to Dataset fields.
DataHub v0.8.25
Known Issues
- Adding Glossary Terms to schema fields does not work with this version due to a bug. Upgrade to v0.8.26 for the fix.
Release Highlights
Buckle up, folks! v0.8.25 brings some very exciting (and highly-requested!) updates.
Notable UI-Based Features
- UI-based Ingestion - as demoed in December Town Hall, we now support creating, configuring, scheduling, & executing batch metadata ingestion using the DataHub user interface. This makes getting metadata into DataHub easier by minimizing the overhead required to operate custom integration pipelines.
- Data Domains - DataHub now supports grouping data assets into logical collections called Domains. Domains are curated, top-level folders or categories where related assets can be explicitly grouped. Read the guide here!
- Data Containers are now supported! A Container is a physical grouping of entities; for example, a Schema is a container of 1 or more Datasets, and a Dashboard is a container of 1 or more Charts.
Notable Metadata Model & Ingestion-Based Features
- Data Quality test results are now supported in the DataHub metadata model. This is the first milestone toward surfacing Dataset & Column-level Data Quality results in the UI (read full scope of work here). Future releases will include a Great Expectations integration & UI support - we’re on track to complete this in Q1 as planned.
- Avro files are now supported in the Data Lake File ingestion source
- Ingest metadata from multiple instances of the same platform type. This has been a very common use case within the Community - you can now differentiate multiple instances of the same platform type! If you already have pre-existing entries, use the datahub migrate command to migrate them over to platform instances.
- Ignore users from Top Users calculation
- BigQuery - Data Profiling on only the latest partition/shard
- (feat)(Business Glossary) add tabular schema and new UI for business glossary by @saxo-lalrishav in #3813
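To make the platform-instance highlight above more concrete, here is a minimal sketch of how an instance name can keep two deployments of the same platform apart. The function name `dataset_urn` is hypothetical, and the URN layout (instance prefixed to the dataset name) is an assumption for illustration. Check the DataHub documentation for the authoritative format.

```python
from typing import Optional


def dataset_urn(
    platform: str,
    name: str,
    env: str = "PROD",
    platform_instance: Optional[str] = None,
) -> str:
    """Build a dataset URN string (hypothetical helper, assumed layout)."""
    # Assumption: the instance is prepended to the dataset name so that
    # identical table names on different instances do not collide.
    if platform_instance:
        name = f"{platform_instance}.{name}"
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Two MySQL deployments that both contain a table called "shop.orders".
east = dataset_urn("mysql", "shop.orders", platform_instance="east")
west = dataset_urn("mysql", "shop.orders", platform_instance="west")
print(east)
print(west)
```

Without an instance, both deployments would produce the same identifier, which is the collision the highlight describes.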
Notable Fixes
- Fix to support View in Looker
  - feat(looker): Adding optional Looker external url base url config by @jjoyce0510 in #3985
- fix(graphql): support group display name in ownership by @thomasplarsson in #3979
- fix(profiling): Enabling profiling for low cardinality number columns by @treff7es in #3990
- fix(ingestion): match default username for Azure OIDC and Azure ingestion source by @iasoon in #3926
DataHub Usage Guides
- docs(domains): Adding a User Guide for Domains by @jjoyce0510 in #4038
- docs(ingest): Adding UI ingestion guide by @jjoyce0510 in #4048
What's Changed
- fix(vulnerability): Upgrade gms base image by @dexter-mh-lee in #3962
- logging(frontend): Improve OIDC debug logs by @jjoyce0510 in #3967
- docs(delete): add curl request example to delete entity by @anshbansal in #3928
- fix(ingestion): match default username for Azure OIDC and Azure ingestion source by @iasoon in #3926
- Feature/dynamic platform icons by @RyanHolstien in #3968
- refactor(ingestion): remove duplicate aspect type by @hsheth2 in #3972
- fix(example): fix typo by @anshbansal in #3907
- fix(ingestion): Restrict python to <=3.9.9 by @treff7es in #3961
- feat(build): remove requirement for git directory for builds by @swaroopjagadish in #3977
- fix(ingestion): tighten conditions for restli json transformation by @hsheth2 in #3973
- fix(ingestion): don't dump variables for config errors by @hsheth2 in #3974
- Bugfix/increase socket timeout by @RyanHolstien in #3982
- feat(ingest): support for Avro data lake files by @kevinhu in #3913
- fix(build): exclude old log4j core by @RickardCardell in #3966
- fix(quickstart): Pin Quickstart version to v0.8.23. by @jjoyce0510 in #3983
- feat(looker): Adding optional Looker external url base url config by @jjoyce0510 in #3985
- fix(graphql): support group display name in ownership by @thomasplarsson in #3979
- fix(quickstart): Assign correct mysql-setup container for M1 and remove "head" default version. by @jjoyce0510 in #3987
- feat(embedded search results): support custom endpoints in embedded search result by @gabe-lyons in #3986
- fix(docker): datahub-gms - build in native, copy to target by @swaroopjagadish in #3992
- fix(ci): moving defaults back to head now that docker builds are green by @swaroopjagadish in #3993
- feat(ui): UI-based ingestion (as featured in Dec Townhall) by @jjoyce0510 in #3975
- quickstart: Adding UI ingestion to quickstart YAML by @jjoyce0510 in #3994
- feat(domains): Adding backend for Asset Domains (p1) by @jjoyce0510 in #3952
- Bug: a bug fix to bigquery_to_datahub.yml file by @dipeshmaurya in #3988
- fix(ingest): check if feature data type is present by @maaaikoool in #3932
- feat(platform-instance): a simple client-only change to support platf… by @swaroopjagadish in #3996
- docs(metadata-model): Adding to Metadata model docs by @jjoyce0510 in #3998
- Add Stash Logo & new Source Icons by @maggiehays in #4002
- feat(domains): UI for Asset Domains (p2) by @jjoyce0510 in #3995
- docs: add missing back tick for metadata-ingestion/README.md by @nickwu241 in #4003
- Bugfix/add missing classes by @RyanHolstien in #4000
- fix(superset): fix connection for redshift by @anshbansal in #3944
- fix(setup): fix setup for M1 by @anshbansal in #3958
- docs:add Optum logo by @maggiehays in #4005
- Refining Metadata Model docs further by @jjoyce0510 in #4001
- fix(docker): Alpine based multiplatform docker build for kafka-setup by @treff7es in #3991
- Bugfix/graph concurrency issue by @RyanHolstien in #4007
- feat(ingest): Add additional snowflake auth by @MikeSchlosser16 in #4009
- fix(ci): Reverting unnecessary domain test changes by @jjoyce0510 in #4013
- fix(metrics): Add metrics for mcl hooks by @dexter-mh-lee in #4008
- feat(platform) - Update FabricType enum to represent more fabrics by @aditya-radhakrishnan in #3997
- feat(ingest): emit flags and stats for profiling telemetry by @kevinhu in #3969
- fix(formatting): fix linting lib version requirement by @anshbansal in #3939
- fix(docs): fix business glossary docs by @anshbansal in #3916
- fix(profiling): Enabling profiling for low cardinality number columns by @treff7es in #3990
- fix(docs): update gms link by @lhvubtqn in #3927
- fix(ingest): lint fix a few files by @swaroopjagadish in #4016
- fix(ingest): adding platform instance urn to data platform instance aspects by @swaroopjagadish in #4015
- feat(ingest): use trino python client for sqlalchemy, supports python… by @mayurinehate in #3888
- fix(spark-lineage): select mock server port dynamically for unit test by @MugdhaHardikar-GSLab in #4018
- (feat)(Business Glossary) add tabular schema and new UI for business glossary by @saxo-lalrishav in #3813
- Test/add concurrency issue smoke test by @RyanHolstien in #4014
- feat(glossary-terms): Index glossary term custom properties by @jjoyce0510 in #3960
- feat(ingestion): Adding ability to ignore users from top users...
DataHub v0.8.24
Release Highlights
- Adding support for nested Glue schemas
- Adding Data Lake Files ingestion source to support data profiling for local files and files stored in AWS S3; supported file types are CSV, TSV, Parquet, and JSON
- Improvements to readability in UI to format large numbers, including: adding thousands separators & rounding large numbers to millions with raw value available via tooltip
- Miscellaneous bug fixes & improvements
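The number-formatting highlight above (thousands separators, rounding to millions, raw value in a tooltip) can be sketched roughly as follows. This is an illustration in Python, not the DataHub UI code. The function names and the one-decimal rounding are assumptions.

```python
def format_count(n: int) -> str:
    """Return a compact label for a non-negative count."""
    if n >= 1_000_000:
        # Round large values to millions with one decimal place (assumed precision).
        return f"{n / 1_000_000:.1f}M"
    # Smaller values keep full precision with thousands separators.
    return f"{n:,}"


def raw_tooltip(n: int) -> str:
    """Return the exact value, as it might appear in a tooltip."""
    return f"{n:,}"


for rows in (12_345, 2_500_000):
    print(format_count(rows), "| tooltip:", raw_tooltip(rows))
```

Keeping the exact value available separately means the compact label never hides information from the reader.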
What's Changed
- fix(workflow) docker-ingestion is failing bc of an invalid sed command by @dexter-mh-lee in #3896
- refactor(graphql): Migrating Datasets, Charts, Dashboards, Jobs, Flows to Entity V2 endpoint by @jjoyce0510 in #3897
- fix(ingest): populate system metadata for all metadata events (mcp, mcpw) by @swaroopjagadish in #3900
- perf: add/change scripts for tests by @anshbansal in #3840
- fix(glossary): owner should be optional as per docs by @anshbansal in #3858
- feat(ingestion): Support for nested glue schemas by @rslanka in #3895
- docs: change roadmap link by @jeffmerrick in #3904
- feat(kafka): support confluent references by @anshbansal in #3862
- docs (elasticsearch): config error by @JIWEI0 in #3901
- feat(ingestion): Data lake profiling by @kevinhu in #3656
- refactor(search): refactor NUM_RETRIES in esindexbuilder to be configurable by @senni0418 in #3870
- fix(ingest): nifi - replace hardcode password with config variable by @lhvubtqn in #3902
- feat(authentication): propagate expired token exceptions to end user by @gabe-lyons in #3894
- fix(docs): update data lake docs with path_spec details by @kevinhu in #3905
- ci(smoke-test): make tags&terms smoke test wait for ingestion to complete by @gabe-lyons in #3812
- Revert "fix(glossary): owner should be optional as per docs (#3858)" by @anshbansal in #3910
- fix(ingest): operational stats - check if optional fields are present by @aditya-radhakrishnan in #3911
- fix(typo): fix typo in docs by @anshbansal in #3908
- refactor(gql/ui): Misc refactorings by @jjoyce0510 in #3921
- feat(config): make check for frontend instead of gms more robust by @anshbansal in #3919
- feat(spark-lineage): simplified jars, config, auto publish to maven by @swaroopjagadish in #3924
- Bugfix/telemetry soft fail by @RyanHolstien in #3934
- fix(log): fix log levels and formats by @anshbansal in #3943
- docs(metadata-ingestion): fix command for running fast unit tests by @anshbansal in #3942
- fix(ui): update login title css to fit on one line by @aditya-radhakrishnan in #3922
- fix(docs): Clarify available no-code rendering formats in DataQualityRules.pdl by @gabe-lyons in #3912
- docs(links): add links to some recent case studies and blog posts by @anshbansal in #3941
- fix(docs): fix openapi docs by @anshbansal in #3940
- Adding Snappy Lib and JKS File by @arunvasudevan in #3898
- Feature/Issue resolved- Improve table stats readability in UI by @ShubhamThakre in #3889
- refactor(ui): Allow DocumentationTab to optionally use updateDescription mutation by @jjoyce0510 in #3935
- (docs)add moloco logo by @maggiehays in #3945
- refactor(bootstrap data): Add usage and profiles to bootstrap_mce.json by @jjoyce0510 in #3947
- docs(metadata): update relationship query in docs by @gabe-lyons in #3951
- fix(ingestion): Snowflake Usage should continue to emit usage workunits with include_operational_stats enabled. by @rslanka in #3949
- feat(ingestion): Add support for extracting S3->Snowflake and S3->Glue lineages. by @rslanka in #3946
- fix(graphQL): Fixing set ordering in batchGet of entities by @jjoyce0510 in #3950
- feat(elastic-search): changing default bulk index request batch to 1000 by @swaroopjagadish in #3957
- docs (metadata modeling): Fix broken links and doc fixes by @arunvasudevan in #3954
New Contributors
- @JIWEI0 made their first contribution in #3901
- @senni0418 made their first contribution in #3870
- @lhvubtqn made their first contribution in #3902
- @ShubhamThakre made their first contribution in #3889
Full Changelog: v0.8.23...v0.8.24
DataHub v0.8.23
Release Highlights
- Fix critical Dashboard / Charts bug from 0.8.22, where Chart inputs were not being ingested successfully.
- Adding currently deployed version to the UI (under top-right dropdown menu). Also available via the GMS /config endpoint.
- Robustness improvements to DataHub Java Client Package
- Introducing a new Elasticsearch ingestion connector!
- Misc bug fixes & improvements.
What's Changed
- build: include correct version in metadata-ingestion docker image by @hsheth2 in #3857
- fix(metabase): fix crashes on missing values by @iasoon in #3859
- fix(datahub-client): fix shadow jar build, correct spark-lineage url … by @swaroopjagadish in #3871
- feat(git-version): Add version to the UI and config endpoint by @dexter-mh-lee in #3866
- fix(build): fix shadow jar checker to allow new git.properties by @swaroopjagadish in #3875
- feat(metadata-ingestion): Make datahub-rest client more robust by configurable retries. (#3826) by @RickardCardell in #3860
- fix(github-workflow): Remove duplicate context in kafka setup workflow by @dexter-mh-lee in #3876
- docs(azure-ad): correct default value for username attr by @iasoon in #3861
- docs: fix endpoint URL by @anshbansal in #3852
- fix(cli): disable telemetry in CLI tests by @kevinhu in #3877
- feat(metabase): allow configuring how database engines get mapped to platforms by @iasoon in #3869
- doc(graphql): add some examples by @anshbansal in #3867
- fix(search): Fix issue with filters and autocomplete by @dexter-mh-lee in #3868
- fix(build): remove jcenter from gradle build by @aditya-radhakrishnan in #3882
- (docs)Roadmap, Townhall, & Feature Request link updates by @maggiehays in #3873
- doc(kafka): add permissions required for confluent cloud by @anshbansal in #3850
- feat(ingest): ingestion-specific telemetry by @kevinhu in #3881
- Add AWS MSK Iam Auth Jar to GMS by @arunvasudevan in #3872
- docs(ingestion) azure: specify required permission type by @iasoon in #3886
- feat(ingestion) dbt: support spark sql types by @iasoon in #3880
- update dependency for bigquery. by @varunbharill in #3874
- fix(field-extraction): Fix extraction for unions by @dexter-mh-lee in #3892
- fix(ingest): sqlparser - Not lowercasing looker source's special table name by @treff7es in #3891
- feat(ingest): Support for spectrum external array types by @treff7es in #3890
- feat(Ingestion): Add Elasticsearch Source by @rslanka in #3893
Full Changelog: v0.8.22...v0.8.23
DataHub v0.8.22
Disclaimers!
- Ingesting Chart Inputs was broken in a PR that got into this release. This will be fixed in v0.8.23. If you plan to ingest Charts / Dashboards, we recommend skipping this version and upgrading to v0.8.23 directly.
Release Highlights
- Support for mapping dbt meta properties of a dataset to metadata operations, such as add_owner, add_term, add_tag, etc.
- Java REST emitter library to programmatically generate metadata events from Java-based clients such as from Spark jobs.
- Data freshness indication via Last Updated Timestamp.
- Improvements to data profiling performance and lineage extraction
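The dbt meta mapping highlight above names the operations add_owner, add_term, and add_tag. The sketch below shows the general idea of turning meta properties into such operations. The rule table, the meta keys, and `plan_operations` are all hypothetical and do not reflect the actual DataHub recipe schema.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical rules: meta key -> (regex the value must match, operation,
# fixed payload or None to reuse the meta value itself).
RULES: Dict[str, Tuple[str, str, Optional[str]]] = {
    "owner": (r".+", "add_owner", None),
    "glossary_term": (r".+", "add_term", None),
    "has_pii": (r"(?i)true", "add_tag", "pii"),
}


def plan_operations(meta: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (operation, payload) pairs for every rule the meta block satisfies."""
    operations = []
    for key, (pattern, operation, fixed) in RULES.items():
        value = meta.get(key)
        if value is not None and re.fullmatch(pattern, value):
            operations.append((operation, fixed if fixed is not None else value))
    return operations


print(plan_operations({"owner": "alice", "has_pii": "True", "team": "growth"}))
```

Keys without a matching rule (here `team`) are ignored, so unrelated meta properties do not produce metadata changes.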
What's Changed
- feat(snowflake-usage): Generate email address if not exists by @treff7es in #3791
- feat(java datahub-client): add Java REST emitter by @MugdhaHardikar-GSLab in #3781
- fix(docker): Fix path to elastic definition in dev docker compose by @MikeSchlosser16 in #3808
- feat(nocode): Add get entities v2 endpoint that can get without snapshot by @dexter-mh-lee in #3738
- docs(modeling): Add a link to MXE page inside the Metadata Modeling page by @pramodbiligiri in #3765
- docs(fix): fix broken reference by @RyanHolstien in #3814
- feat(java-emitter): improvements to builder API-s, moving spark-linea… by @swaroopjagadish in #3819
- fix(ingestion): Make url an optional field of the DefaultConfig for business glossary by @rslanka in #3817
- fix(ingest): Handle string redshift type by @treff7es in #3811
- feat(gms): add schema registry support for tls in gms by @MikeSchlosser16 in #3804
- Add table, changed formatting and wording by @dannylee8 in #3802
- feat(mae/mcl): Make ingestAspect produce both MCLs and MAEs by @dexter-mh-lee in #3737
- docs(confluent): Add new topic names by @anshbansal in #3825
- (feat)(glossary): Increase number of autocomplete results shown to 25 by @aditya-radhakrishnan in #3821
- feat(sql-parser): Replacing sqlmetadata sql parser lib with sqlineage parser lib by @treff7es in #3806
- feat(profiler): using approximate queries for profiling by @treff7es in #3752
- docs: improve docs for kafka configuration by @abiwill in #3828
- test(fixEbeanEntityServiceTest): fix bug on verification for EbeanEntityService by @RyanHolstien in #3829
- fix(ingest): ignore custom connectors for Glue ingestion by @kevinhu in #3805
- fix(java-emitter): check for null callback by @swaroopjagadish in #3830
- feat(dbt-meta): add support for dbt meta mapping by @swaroopjagadish in #3832
- fix(ingestion): Fix the datetime parsing issue in the metabase source. by @rslanka in #3831
- feat(removeGMA): remove all dependencies on gma libraries by @RyanHolstien in #3835
- perf(ingest): changes to improve ingest performance a bit by @anshbansal in #3837
- fix(azure AD): fix problem with missing key causing failures in ingestion by @anshbansal in #3824
- docs: fix typo by @anshbansal in #3848
- docs(cli): fix wrong heading, add link to release notes by @anshbansal in #3700
- feat(ci): split metadata-ingestion ci to streamline build by @swaroopjagadish in #3854
- fix(dbt): fix warning due to struct type not being mapped by @anshbansal in #3846
- fix(ingest): bigquery-usage - fix remove_extras to remove all partitions by @gfalcone in #3842
- fix(ingestion): handle database=None for dbt ingestion by @iasoon in #3851
- feat(ingest): last updated - show last updated for sql usage sources by @aditya-radhakrishnan in #3845
- feat(lineage): allow for expanding of lineage node titles in the lineage explorer by @gabe-lyons in #3856
New Contributors
- @MikeSchlosser16 made their first contribution in #3808
- @pramodbiligiri made their first contribution in #3765
- @aditya-radhakrishnan made their first contribution in #3821
- @abiwill made their first contribution in #3828
- @gfalcone made their first contribution in #3842
- @iasoon made their first contribution in #3851
Full Changelog: v0.8.21...v0.8.22
v0.8.21
This release includes a fix for timeouts in reindexing of large indices that occurs when new fields are added to an index.
Release Highlights
- Getting Started Modal + Empty State: Improves the empty-state experience by showing a "Getting Started" guide when no data has been ingested into DataHub yet.
- Provide BigQuery credentials via recipe config: Previously BigQuery credentials were provided via environment variable. Going forward they can be provided directly inside the Recipe config.
- Increase re-indexing 30s timeout: Previously, Elasticsearch re-indexing was capped at a 30-second synchronous timeout, which caused some GMS upgrades to fail. This release raises that timeout to one hour.
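The effect of the timeout change above can be illustrated with a generic polling loop. This is a sketch of the pattern only, not the GMS implementation. `wait_until`, `FakeClock`, and the simulated 45-second task are assumptions made for the example.

```python
import time
from typing import Callable


def wait_until(
    is_done: Callable[[], bool],
    timeout_s: float,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_done() until it returns True or timeout_s seconds have passed."""
    deadline = clock() + timeout_s
    while True:
        if is_done():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


class FakeClock:
    """Simulated clock so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def simulate(timeout_s: float) -> bool:
    clock = FakeClock()
    # Pretend the re-index task needs 45 simulated seconds to finish.
    return wait_until(lambda: clock.now >= 45, timeout_s, poll_s=5,
                      clock=clock.time, sleep=clock.sleep)


print(simulate(30))    # the task outlives a 30-second budget
print(simulate(3600))  # the same task fits within a one-hour budget
```

A task that takes longer than the budget is reported as failed even though it would have finished, which is why a longer limit helps large indices.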
What's Changed
- fix(lkml): bump lkml version up to 1.1.2 to support sql_preamble expression by @hyunminch in #3757
- fix(react-ui): fix header min height by @gabe-lyons in #3784
- docs(auth): add Microsoft Azure as an SSO provider (#3779) by @cccs-eric in #3780
- Add azure OIDC doc to sidebar by @jjoyce0510 in #3785
- feat(UI): Add "Getting Started" Modal on fresh deployment by @jjoyce0510 in #3773
- feat(transform): adds simple add dataset properties transform by @sgomezvillamor in #3778
- Update troubleshooting steps for local development with docker by @RyanHolstien in #3788
- docs(redshift): Updating Redshift permission prerequisites in doc by @treff7es in #3777
- fix(superset): fix Superset chart ingestion with an empty metric label by @cccs-eric in #3793
- doc(transforms): adds doc for simple_add_dataset_properties transformer by @sgomezvillamor in #3790
- feat(ingest): Add config option to set Bigquery credential in source config by @treff7es in #3786
- fix(elastic): allow more time for re-indexing tasks by @gabe-lyons in #3794
- docs(kafka): add example for ingestion from confluent cloud by @anshbansal in #3789
New Contributors
- @cccs-eric made their first contribution in #3780
Full Changelog: v0.8.20...v0.8.21
v0.8.20
This release includes the patch for CVE-2021-44228, pinning log4j to 2.17.0. Otherwise, it contains small bug fixes and improvements.
Release Highlights
- Configurable aspect retention in application.yml (disabled by default)
- Metabase Ingestion Source connector
- Constrain log4j to version 2.17.0
- Upgrade logback to 1.2.9
What's Changed
- feat(spark-lineage): add ability to push data lineage from spark to d… by @MugdhaHardikar-GSLab in #3664
- feat(cli): allow to nuke without deleting data in quickstart by @anshbansal in #3655
- feat(Dgraph): Make Dgraph a proper Neo4j alternative by @EnricoMi in #3578
- feat(retention): Add retention to Local DB by @dexter-mh-lee in #3715
- feat(ingest): cleanup deprecated `datahub.integrations.airflow.*` imports by @hsheth2 in #3732
- feat(ingestion) : Add Metabase Source Connector by @jawadqu in #3602
- fix(ingest): count profiled tables separately in report by @hsheth2 in #3731
- feat(perf-test): changes for perf testing by @anshbansal in #3728
- ci(cypress): adding the foundation for cypress integration tests & some starter coverage for login, search & updates by @gabe-lyons in #3672
- (fix) Elastic search container log4j CVE-2021-44228 vulnerability by @nsbala-tw in #3733
- Revert "feat(Dgraph): Make Dgraph a proper Neo4j alternative" by @gabe-lyons in #3740
- fix(CI): Regenerate Docker Quickstart by @jjoyce0510 in #3741
- fix(DataHubGraph): changing datahub-graph to use underlying session connection. by @varunbharill in #3743
- fix(ingest): Remove unecessary isalpha check for data platforms + warnings by @jjoyce0510 in #3742
- feat(snowflake-usage): add knob for direct objects accesssed vs base objects accessed by @gabe-lyons in #3744
- fix(snowflake): support snowflake allow/deny pattern for lineage and usage by @varunbharill in #3748
- refactor(gms auth): Remove base64 decoding of token service signing key by @jjoyce0510 in #3747
- test(ingest): fix pytest warning for class starting with `Test` by @hsheth2 in #3745
- feat: enables dbt metadata files to be loaded from URIs by @sgomezvillamor in #3739
- fix(ingestion): Skipping duplicate tables from ingestion by @treff7es in #3753
- feat(Stateful Ingestion): 1/3 Stateful ingestion server changes by @rslanka in #3749
- Fix CVE-2021-44228 continued: log4j constraints to version 2.16.0 by @jjoyce0510 in #3755
- build(ingest): restrict latest mypy version by @hsheth2 in #3756
- doc: Add IOMED as a DataHub adopter by @merqurio in #3758
- docs(spark-lineage): update artifact name and version by @MugdhaHardikar-GSLab in #3760
- feat(profiler): add upper bound on combined query size by @hsheth2 in #3762
- feat(ingestion): Mode retry wait logic to avoid hitting Mode API rate limit by @jawadqu in #3761
- feat(Stateful Ingestion-2/3): Client side changes for checkpointing a source job state. by @rslanka in #3763
- refactor(test): replace `CliRunner` with `run_datahub_cmd` method by @hsheth2 in #3746
- feat(bigquery): add support for parsing exported bigquery audit logs by @hyunminch in #3680
- feat(ingest): Adding support for Elasticsearch and Clickhouse by @sudotty in #3227
- Upgrade to logback 1.2.9 to address CVE-2021-42550 by @jjoyce0510 in #3771
- fix(profiling): Disabling expensive profilers by default by @treff7es in #3759
- docs(ingestion): Add details of sensitive info handling by @anshbansal in #3767
- docs(snowflake): Adding documentation about required Snowflake Privileges by @jjoyce0510 in #3770
- Upgrade to 3rd Apache patch for log4j by @xiphl in #3772
- fix(ingestion): Fix for same schema foreign key reference by @treff7es in #3769
- fix(ingest): fix compatibility with google composer by @anshbansal in #3774
Known Issues
We've been made aware that in large deployments the re-indexing step required at boot-up time exceeds the 30 second timeout. We've since made changes to loosen this timeout limit, with these changes coming in 0.8.21.
New Contributors
- @MugdhaHardikar-GSLab made their first contribution in #3664
- @jawadqu made their first contribution in #3602
- @nsbala-tw made their first contribution in #3733
- @merqurio made their first contribution in #3758
- @hyunminch made their first contribution in #3680
- @sudotty made their first contribution in #3227
- @xiphl made their first contribution in #3772
Full Changelog: v0.8.19...v0.8.20
v0.8.19
This release is a fast follow-up to the more substantial 0.8.18 release, addressing bugs that a few folks in the Community are facing.
Release Highlights
- Fix `base64` CLI command issue where some systems do not have it.
- Fix usage user extraction where the email domain was repeated twice.
What's Changed
- fix(recommendations): don't show a `0` character when there are no suggestions by @gabe-lyons in #3720
- fix(mode): support definitions in mode query by @gabe-lyons in #3721
- fix(doc): fixing doc in datahub cli for corpuser urn. by @varunbharill in #3717
- docs(redshift): Adding svv_table privilege requirement to redshift source doc by @treff7es in #3708
- fix(profiler): Fixing division by zero in pct_unique calculation by @treff7es in #3727
- fix(ingest): get mysql geotypes properly by @treff7es in #3726
- fix(ingest): update trino source error handling in get_table_comment by @mayurinehate in #3712
- feat(ingest) Trim long sql queries in usage by @treff7es in #3725
- fix(ingestion): adds missing port to the connection bootstrap by @sgomezvillamor in #3706
- fix(ingest): add source.config.connection.schema_registry_config to SchemaRegistryClient creation by @lvicentesanchez in #3702
- fix(docker): Fix issues with base64 not working on some platforms by @dexter-mh-lee in #3723
- feat(DataHubGraph): Adding utilities methods to DataHubGraph class. by @varunbharill in #3729
- fix(superset): handle dashboards without charts (#3713) by @grumbler in #3714
New Contributors
- @lvicentesanchez made their first contribution in #3702
- @grumbler made their first contribution in #3714
Full Changelog: v0.8.18...v0.8.19