Releases: datahub-project/datahub

Release Candidate v0.8.28

05 Mar 00:14
18dd5b6
Pre-release

Release Candidate for Version 0.8.28.

Full Changelog: v0.8.27...RC-v0.8.28

DataHub v0.8.27

23 Feb 19:44
49a8ece

Release Highlights

Notable UI-Based Features

  • The User Page has a new look! You can now quickly filter & search for entities owned by a User, update/edit the user profile, and see details of which Groups the User belongs to. See it in action here.

  • Search for Entities by Owner - Easily filter search results by User/Group Owner

  • Edit existing Glossary Terms - you can now edit/update Glossary Term descriptions via the UI. Future work will allow creating Terms from the UI as well - stay tuned!

  • Improved Metadata Analytics - keep tabs on your DataHub entities across Domains, Platforms, Glossary Terms, Environments, & more. Check out the new & improved Analytics tab!

Notable Metadata Model & Ingestion-Based Features

  • ClickHouse integration is now incubating! This is a 100% Community-led integration - huge shoutout to @ne1r0n & @havramar for pushing initial code & moving this work through!

  • Kafka Stateful Ingestion - shoutout to @claudio-benfatto for building this out!

  • Extract Airflow Task Description - big thanks to @guidoturtu for the contrib!

  • BigQuery: profile latest Partition/Shard - We know that Data Profiling can be computationally expensive for partitioned/sharded BQ instances. We now support profiling only the latest partition/shard to minimize processing load.

Notable Docs Updates

  • NEW! Tips for Searching within DataHub - Ever wondered how to make the most of Searching within DataHub? Check out this doc put together by @xiphl

  • Improvements to Metadata Model Docs - This is a huge win for the Community - we’re taking a big step toward providing auto-generated & curated docs related to the Metadata Model - take a look here.

DataHub v0.8.26

08 Feb 23:22
3668de8

This is a bugfix release that addresses the v0.8.25 issue with adding Glossary Terms to Dataset fields.

Release Highlights

  • Fixes the bug in v0.8.25 that prevented Glossary Terms from being added to Dataset fields.

DataHub v0.8.25

07 Feb 22:32
ec062b6

Known Issues

  • Adding Glossary Terms to schema fields does not work with this version due to a bug. Upgrade to v0.8.26 for the fix.

Release Highlights

Buckle up, folks! v0.8.25 brings some very exciting (and highly-requested!) updates.

Notable UI-Based Features

  • UI-based Ingestion - as demoed in the December Town Hall, we now support creating, configuring, scheduling, & executing batch metadata ingestion using the DataHub user interface. This makes getting metadata into DataHub easier by minimizing the overhead required to operate custom integration pipelines.
  • Data Domains - DataHub now supports grouping data assets into logical collections called Domains. Domains are curated, top-level folders or categories where related assets can be explicitly grouped. Read the guide here!
  • Data Containers are now supported! A Container is a physical grouping of entities, e.g. a Schema is a container of 1 or more Datasets; a Dashboard is a container of 1 or more Charts.

Notable Metadata Model & Ingestion-Based Features

  • Data Quality test results are now supported in the DataHub metadata model. This is the first milestone toward surfacing Dataset & Column-level Data Quality results in the UI (read full scope of work here). Future releases will include a Great Expectations integration & UI support - we’re on track to complete this in Q1 as planned.
  • Avro files are now supported in the Data Lake File ingestion source
  • Ingest metadata from multiple instances of the same platform type. This has been a very common use case within the Community - you can now differentiate multiple instances of the same platform type! If you already have pre-existing entries, use the datahub migrate command to migrate them over to platform instances.
  • Ignore users from Top Users calculation
    • feat(ingestion): Adding ability to ignore users from top users calculation by @treff7es in #3735
  • BigQuery - Data Profiling on only the latest partition/shard
    • feat(ingestion) bigquery: Profiling only the latest partition/shard on bigquery by @treff7es in #3930
  • (feat)(Business Glossary) add tabular schema and new UI for business glossary by @saxo-lalrishav in #3813
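
As a rough illustration of what a platform instance changes, the sketch below builds dataset URNs with and without an instance prefix. The helper name `dataset_urn` and the sample values are hypothetical, and the layout (instance prepended to the dataset name) is an assumption based on DataHub's usual dataset URN convention, so check the platform instance docs before relying on it.

```python
from typing import Optional


def dataset_urn(
    platform: str,
    name: str,
    platform_instance: Optional[str] = None,
    env: str = "PROD",
) -> str:
    """Build a dataset URN string (illustrative helper, not DataHub's own API)."""
    # Assumption: the instance name is prepended to the dataset name with a dot.
    full_name = f"{platform_instance}.{name}" if platform_instance else name
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{full_name},{env})"


# Two MySQL deployments holding a table with the same name no longer collide.
legacy = dataset_urn("mysql", "db.users")
warehouse_a = dataset_urn("mysql", "db.users", platform_instance="warehouse_a")
warehouse_b = dataset_urn("mysql", "db.users", platform_instance="warehouse_b")
```

Under this assumption, entries ingested before platform instances existed keep the un-prefixed form, which is why a migration step is needed to move them over.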

Notable Fixes

  • Fix to support View in Looker
    • feat(looker): Adding optional Looker external url base url config by @jjoyce0510 in #3985
  • fix(graphql): support group display name in ownership by @thomasplarsson in #3979
  • fix(profiling): Enabling profiling for low cardinality number columns by @treff7es in #3990
  • fix(ingestion): match default username for Azure OIDC and Azure ingestion source by @iasoon in #3926

DataHub v0.8.24

24 Jan 21:42
f2e2a4d

Release Highlights

  • Adding support for nested Glue schemas
  • Adding Data Lake Files ingestion source to support data profiling for local files and files stored in AWS S3; supported file types are CSV, TSV, Parquet, and JSON
  • Improved readability of large numbers in the UI: thousands separators are added, and large numbers are rounded to millions with the raw value available via tooltip
  • Miscellaneous bug fixes & improvements
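
The number formatting described above can be sketched roughly as follows. This is not the UI's actual code (the DataHub front end is not written in Python); the function name `format_count`, the one-million threshold, the single decimal place, and the "M" suffix are all assumptions made for illustration.

```python
from typing import Tuple


def format_count(value: int) -> Tuple[str, str]:
    """Return (display_text, tooltip_text) for a count shown in a UI."""
    # The tooltip always carries the raw value with thousands separators.
    raw = f"{value:,}"
    # Assumption: values of one million or more are abbreviated to millions.
    if abs(value) >= 1_000_000:
        return f"{value / 1_000_000:.1f}M", raw
    return raw, raw


small_display, small_tooltip = format_count(12345)
large_display, large_tooltip = format_count(2_500_000)
```

Keeping the raw value alongside the rounded one means no precision is lost to the reader; it is only moved out of the main display.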

Full Changelog: v0.8.23...v0.8.24

DataHub v0.8.23

14 Jan 23:06
a44b48a

Release Highlights

  • Fix critical Dashboard / Charts bug from 0.8.22, where Chart inputs were not being ingested successfully.
  • Adding currently deployed version to the UI (under top-right dropdown menu). Also available via the GMS /config endpoint.
  • Robustness improvements to DataHub Java Client Package
  • Introducing a new Elasticsearch ingestion connector!
  • Misc bug fixes & improvements.
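
A minimal sketch of reading the deployed version from the GMS /config endpoint mentioned above. It assumes GMS is reachable at the given base URL (for example `http://localhost:8080`), that the endpoint needs no authentication, and that it returns JSON; the exact keys in the response are not stated in these notes, so the sketch returns the whole payload rather than guessing at a field name.

```python
import json
from urllib.request import urlopen


def config_url(gms_base_url: str) -> str:
    """Join the GMS base URL and the /config path without doubling slashes."""
    return gms_base_url.rstrip("/") + "/config"


def fetch_gms_config(gms_base_url: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the /config payload (assumed to be a JSON object)."""
    with urlopen(config_url(gms_base_url), timeout=timeout) as response:
        return json.load(response)


# Example (requires a running GMS; the host and port are assumptions):
#   print(json.dumps(fetch_gms_config("http://localhost:8080"), indent=2))
```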

Full Changelog: v0.8.22...v0.8.23

DataHub v0.8.22

09 Jan 00:59
bb0943f

Disclaimers!

  • Ingesting Chart Inputs was broken in a PR that got into this release. This will be fixed in v0.8.23. If you plan to ingest Charts / Dashboards, we recommend skipping this version and upgrading to v0.8.23 directly.

Release Highlights:

  • Support for mapping dbt meta properties of a dataset to metadata operations such as add_owner, add_term, and add_tag.
  • Java REST emitter library to programmatically generate metadata events from Java-based clients such as Spark jobs.
  • Data freshness indication via Last Updated Timestamp.
  • Improvements to data profiling performance and lineage extraction
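
To make the dbt meta mapping idea concrete, here is a toy sketch of how meta properties could be resolved into metadata operations. It is not DataHub's implementation: the rule shape (`match`, `operation`, `config`), the sample property names, and the function `resolve_operations` are assumptions for illustration; only the operation names add_owner and add_tag come from the notes above.

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical mapping: meta property name -> rule describing what to do with it.
META_MAPPING: Dict[str, Dict[str, Any]] = {
    "business_owner": {
        "match": ".*",
        "operation": "add_owner",
        "config": {"owner_type": "user"},
    },
    "has_pii": {
        "match": "True",
        "operation": "add_tag",
        "config": {"tag": "pii"},
    },
}


def resolve_operations(
    meta: Dict[str, Any], mapping: Dict[str, Dict[str, Any]]
) -> List[Tuple[str, str, Dict[str, Any]]]:
    """Return (operation, matched_value, config) for every rule that matches."""
    operations = []
    for key, rule in mapping.items():
        if key not in meta:
            continue
        value = str(meta[key])
        # A rule applies only when its regex matches the whole property value.
        if re.fullmatch(rule["match"], value):
            operations.append((rule["operation"], value, rule["config"]))
    return operations


# Meta block as it might appear on a dbt model (sample values).
ops = resolve_operations({"business_owner": "jdoe", "has_pii": False}, META_MAPPING)
```

Here only the owner rule fires, because the `has_pii` value is `False` and does not match the `True` pattern.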

Full Changelog: v0.8.21...v0.8.22

v0.8.21

28 Dec 19:37
895af09

This release includes a fix for timeouts in reindexing of large indices, which occur when new fields are added to an index.

Release Highlights

  • Getting Started Modal + Empty State: Improve the empty-state experience by showing a "Getting Started" guide when no data has been ingested into DataHub yet.
  • Provide BigQuery credentials via recipe config: Previously BigQuery credentials were provided via environment variable. Going forward they can be provided directly inside the Recipe config.
  • Increase re-indexing 30s timeout: Previously, Elasticsearch reindexing was capped at a 30-second synchronous timeout, which caused some GMS upgrades to fail. This release raises that timeout to one hour.

What's Changed

  • fix(lkml): bump lkml version up to 1.1.2 to support sql_preamble expression by @hyunminch in #3757
  • fix(react-ui): fix header min height by @gabe-lyons in #3784
  • docs(auth): add Microsoft Azure as an SSO provider (#3779) by @cccs-eric in #3780
  • Add azure OIDC doc to sidebar by @jjoyce0510 in #3785
  • feat(UI): Add "Getting Started" Modal on fresh deployment by @jjoyce0510 in #3773
  • feat(transform): adds simple add dataset properties transform by @sgomezvillamor in #3778
  • Update troubleshooting steps for local development with docker by @RyanHolstien in #3788
  • docs(redshift): Updating Redshift permission prerequisites in doc by @treff7es in #3777
  • fix(superset): fix Superset chart ingestion with an empty metric label by @cccs-eric in #3793
  • doc(transforms): adds doc for simple_add_dataset_properties transformer by @sgomezvillamor in #3790
  • feat(ingest): Add config option to set Bigquery credential in source config by @treff7es in #3786
  • fix(elastic): allow more time for re-indexing tasks by @gabe-lyons in #3794
  • docs(kafka): add example for ingestion from confluent cloud by @anshbansal in #3789

Full Changelog: v0.8.20...v0.8.21

v0.8.20

20 Dec 22:35
77e3641

This release includes the patch for CVE-2021-44228, pinning log4j to 2.17.0. Otherwise, it contains small bug fixes & improvements.

Release Highlights

  • Configurable aspect retention in application.yml (disabled by default)
  • Metabase Ingestion Source connector
  • Constrain log4j to version 2.17.0
  • Upgrade logback to 1.2.9

What's Changed

  • feat(spark-lineage): add ability to push data lineage from spark to d… by @MugdhaHardikar-GSLab in #3664
  • feat(cli): allow to nuke without deleting data in quickstart by @anshbansal in #3655
  • feat(Dgraph): Make Dgraph a proper Neo4j alternative by @EnricoMi in #3578
  • feat(retention): Add retention to Local DB by @dexter-mh-lee in #3715
  • feat(ingest): cleanup deprecated datahub.integrations.airflow.* imports by @hsheth2 in #3732
  • feat(ingestion) : Add Metabase Source Connector by @jawadqu in #3602
  • fix(ingest): count profiled tables separately in report by @hsheth2 in #3731
  • feat(perf-test): changes for perf testing by @anshbansal in #3728
  • ci(cypress): adding the foundation for cypress integration tests & some starter coverage for login, search & updates by @gabe-lyons in #3672
  • (fix) Elastic search container log4j CVE-2021-44228 vulnerability by @nsbala-tw in #3733
  • Revert "feat(Dgraph): Make Dgraph a proper Neo4j alternative" by @gabe-lyons in #3740
  • fix(CI): Regenerate Docker Quickstart by @jjoyce0510 in #3741
  • fix(DataHubGraph): changing datahub-graph to use underlying session connection. by @varunbharill in #3743
  • fix(ingest): Remove unecessary isalpha check for data platforms + warnings by @jjoyce0510 in #3742
  • feat(snowflake-usage): add knob for direct objects accesssed vs base objects accessed by @gabe-lyons in #3744
  • fix(snowflake): support snowflake allow/deny pattern for lineage and usage by @varunbharill in #3748
  • refactor(gms auth): Remove base64 decoding of token service signing key by @jjoyce0510 in #3747
  • test(ingest): fix pytest warning for class starting with Test by @hsheth2 in #3745
  • feat: enables dbt metadata files to be loaded from URIs by @sgomezvillamor in #3739
  • fix(ingestion): Skipping duplicate tables from ingestion by @treff7es in #3753
  • feat(Stateful Ingestion): 1/3 Stateful ingestion server changes by @rslanka in #3749
  • Fix CVE-2021-44228 continued: log4j constraints to version 2.16.0 by @jjoyce0510 in #3755
  • build(ingest): restrict latest mypy version by @hsheth2 in #3756
  • doc: Add IOMED as a DataHub adopter by @merqurio in #3758
  • docs(spark-lineage): update artifact name and version by @MugdhaHardikar-GSLab in #3760
  • feat(profiler): add upper bound on combined query size by @hsheth2 in #3762
  • feat(ingestion): Mode retry wait logic to avoid hitting Mode API rate limit by @jawadqu in #3761
  • feat(Stateful Ingestion-2/3): Client side changes for checkpointing a source job state. by @rslanka in #3763
  • refactor(test): replace CliRunner with run_datahub_cmd method by @hsheth2 in #3746
  • feat(bigquery): add support for parsing exported bigquery audit logs by @hyunminch in #3680
  • feat(ingest): Adding support for Elasticsearch and Clickhouse by @sudotty in #3227
  • Upgrade to logback 1.2.9 to address CVE-2021-42550 by @jjoyce0510 in #3771
  • fix(profiling): Disabling expensive profilers by default by @treff7es in #3759
  • docs(ingestion): Add details of sensitive info handling by @anshbansal in #3767
  • docs(snowflake): Adding documentation about required Snowflake Privileges by @jjoyce0510 in #3770
  • Upgrade to 3rd Apache patch for log4j by @xiphl in #3772
  • fix(ingestion): Fix for same schema foreign key reference by @treff7es in #3769
  • fix(ingest): fix compatibility with google composer by @anshbansal in #3774

Known Issues

We've been made aware that in large deployments the re-indexing step required at boot-up time exceeds the 30 second timeout. We've since made changes to loosen this timeout limit, with these changes coming in 0.8.21.

Full Changelog: v0.8.19...v0.8.20

v0.8.19

13 Dec 19:13
83207b3

This release is a fast follow-up to the more substantial 0.8.18 release; it addresses bugs that a few folks in the Community are facing.

Release Highlights

  • Fix an issue with the base64 CLI command, which is not available on some systems.
  • Fix usage user extraction, where the email domain was repeated twice.

What's Changed

  • fix(recommendations): don't show a 0 character when there are no suggestions by @gabe-lyons in #3720
  • fix(mode): support definitions in mode query by @gabe-lyons in #3721
  • fix(doc): fixing doc in datahub cli for corpuser urn. by @varunbharill in #3717
  • docs(redshift): Adding svv_table privilege requirement to redshift source doc by @treff7es in #3708
  • fix(profiler): Fixing division by zero in pct_unique calculation by @treff7es in #3727
  • fix(ingest): get mysql geotypes properly by @treff7es in #3726
  • fix(ingest): update trino source error handling in get_table_comment by @mayurinehate in #3712
  • feat(ingest) Trim long sql queries in usage by @treff7es in #3725
  • fix(ingestion): adds missing port to the connection bootstrap by @sgomezvillamor in #3706
  • fix(ingest): add source.config.connection.schema_registry_config to SchemaRegistryClient creation by @lvicentesanchez in #3702
  • fix(docker): Fix issues with base64 not working on some platforms by @dexter-mh-lee in #3723
  • feat(DataHubGraph): Adding utilities methods to DataHubGraph class. by @varunbharill in #3729
  • fix(superset): handle dashboards without charts (#3713) by @grumbler in #3714

Full Changelog: v0.8.18...v0.8.19