Core Functionality Updates for the LakeSoul Lake Warehouse Framework
1. LakeSoul NativeIO performance has again been substantially optimized. The changes include tuned compression and dictionary encoding algorithms for file writes and optimized key Merge on Read code paths, doubling both read and write performance compared with version 2.6.
2. LakeSoul NativeIO adds a local hot data cache. Files in remote object storage can be cached on local disk, significantly improving the performance of MPP and other queries. Local caching is supported for all types of remote storage.
3. Partition filter pushdown in LakeSoul queries is now much faster. Equality partition filter conditions are pushed down through metadata index queries. In testing, partition filtering on a single table with millions of partitions took only 50 ms.
4. Flink has been upgraded to version 1.20.
5. LakeSoul natively supports the Spark + Gluten vectorization engine, significantly improving batch computing performance.
6. LakeSoul natively supports the Presto + Velox vectorization engine, providing high-performance MPP analytics and queries on the lake. The Presto engine now supports RBAC permissions.
7. Arrow Flight SQL RPC Service: Provides a high-performance columnar data read and write gateway service based on the Arrow Flight protocol, supporting load balancing, elastic scaling, and RBAC permission verification.
8. Python packages are now published on PyPI; the LakeSoul Python package can be installed directly with pip install lakesoul.
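
To illustrate why the index-based equality partition filtering in item 3 can stay fast even with a very large number of partitions, here is a minimal conceptual sketch in Python. It is not LakeSoul's implementation: the function names, the in-memory dictionary index, and the sample partitions are all assumptions made for illustration. The idea shown is only that an index keyed by (column, value) turns an equality filter into a few lookups and set intersections instead of a scan over every partition.

```python
from collections import defaultdict


def build_partition_index(partitions):
    """Map each (column, value) pair to the ids of partitions that contain it.

    `partitions` is a list of dicts such as {"date": "2024-01-01", "region": "eu"}.
    This in-memory index is a hypothetical stand-in for a metadata index.
    """
    index = defaultdict(set)
    for pid, desc in enumerate(partitions):
        for column, value in desc.items():
            index[(column, value)].add(pid)
    return index


def filter_partitions(index, equals):
    """Return sorted ids of partitions matching every column == value condition."""
    if not equals:
        raise ValueError("at least one equality condition is required")
    matches = None
    for column, value in equals.items():
        ids = index.get((column, value), set())
        # Intersect with the candidates found so far.
        matches = set(ids) if matches is None else matches & ids
        if not matches:
            break  # no partition can satisfy the remaining conditions
    return sorted(matches)


# Hypothetical sample data.
partitions = [
    {"date": "2024-01-01", "region": "eu"},
    {"date": "2024-01-01", "region": "us"},
    {"date": "2024-01-02", "region": "eu"},
]
index = build_partition_index(partitions)
print(filter_partitions(index, {"date": "2024-01-01", "region": "eu"}))  # [0]
```

The cost of a lookup depends on the number of conditions and the size of the matching sets, not on the total number of partitions in the table.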
LakeSoul Lake Warehouse Maintenance Service
1. A new-generation size-tiered automatic background compaction service significantly improves compaction performance and greatly reduces write amplification, lowering compaction resource overhead.
2. A new-generation automatic asynchronous cleanup service cleans up redundant and expired data asynchronously by consuming metadata change logs.
3. Asset Statistics Service: Automatically generates lake warehouse asset statistics by consuming metadata change logs, providing real-time statistics on storage resource consumption across multiple dimensions, including space, namespace, table, partition, and user.
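
As a rough illustration of the size-tiered idea behind the compaction service in item 1, the sketch below groups files into tiers by size and only merges a tier once it has collected enough files. This is a generic size-tiered planning sketch, not LakeSoul's actual algorithm: the tier base of 1 MiB, the growth factor of 4, the `min_files` threshold, and the file names are all assumed values chosen for the example.

```python
from collections import defaultdict

MiB = 1 << 20


def size_tier(size_bytes, base=MiB, factor=4):
    """Tier 0 holds files up to `base` bytes; each next tier's limit is `factor` times larger."""
    tier, limit = 0, base
    while size_bytes > limit:
        limit *= factor
        tier += 1
    return tier


def plan_compaction(files, min_files=4):
    """Return groups of file names to merge, one group per sufficiently full tier.

    `files` maps a file name to its size in bytes. A tier is compacted only
    when it holds at least `min_files` files, so large files that sit alone
    in a high tier are not rewritten.
    """
    tiers = defaultdict(list)
    for name, size in files.items():
        tiers[size_tier(size)].append(name)
    return [sorted(names) for _, names in sorted(tiers.items()) if len(names) >= min_files]


# Hypothetical files: four small ones, one medium, one large.
files = {
    "a": MiB // 2,
    "b": MiB // 2,
    "c": MiB,
    "d": 700_000,
    "e": 3 * MiB,
    "f": 200 * MiB,
}
print(plan_compaction(files))  # [['a', 'b', 'c', 'd']]
```

Only the four small files are merged here; the 200 MiB file is left untouched, which is how this style of planning keeps write amplification low compared with rewriting everything on each compaction.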

LakeSoul 3.0.0 Released

After nearly a year of iterative optimization, LakeSoul 3.0.0 has been officially released. Its major updates are the core functionality and maintenance service changes listed above.

05 Sep 09:44

xuchen-plus

py-v1.0.0

42085ee

Python v1.0.0

Released LakeSoul Python package 1.0.0 to PyPI.

07 Aug 07:32

xuchen-plus

v2.6.2

95a9444

v2.6.2

Full Changelog: v2.6.1...v2.6.2

22 Jul 05:23

xuchen-plus

v2.6.1

7972072

v2.6.1

Full Changelog: v2.6.0...v2.6.1

17 Jul 05:49

xuchen-plus

v2.6.0

fd62b92

v2.6.0

What's Changed

[Rust] Apply clippy and fix typos; by @mag1c1an1 in #404
[Docs] Add Spark Getting Started Guide by @Ceng23333 in #403
[Docs] Add Flink Getting Started Guide by @moresun in #405
[Docs] Modify Getting Started Env Guide by @F-PHantam in #406
[Docs] Update docs format by @xuchen-plus in #408
[Docs] Fix Docs Page Show Errors and Update LakeSoul Version by @F-PHantam in #409
[Docs]Fine check usage cases of spark guide by @Ceng23333 in #411
[Website] fix website zh-Hans Homepage docs link by @mag1c1an1 in #413
[Docs] add pyspark in spark-guide by @moresun in #414
[Spark/Rust/Test] Fix MergeOperatorSuite && Disable 3 cases by @Ceng23333 in #417
[Spark] Implement columnar write for compaction by @xuchen-plus in #415
[Spark] Add debug print for compaction tests by @xuchen-plus in #418
[Docs] Update docs to 2.5.1 by @xuchen-plus in #419
[Spark/Rust] Fix Unicode column name at native io by @Ceng23333 in #420
Support sqlserver CDC by @ChenYunHey in #421
[Docs] Fix python doc typos by @Ceng23333 in #425
[Spark/Rust] Support filter on nesting column name by @Ceng23333 in #422
[Docs] Add docs and recent blogs by @xuchen-plus in #423
[Flink/Rust] Adjust rolling file logic to reduce memory usage during write by @xuchen-plus in #426
[Rust] Enable metadata max retries by @Ceng23333 in #431
[Flink] Fix CDC entry db name by @ChenYunHey in #430
[Rust] Keep only push down rules in datafusion by @xuchen-plus in #432
[Rust] Datafusion Catalog Support by @mag1c1an1 in #429
[Flink] Fix non-primary key table's sink parallelism by @ChenYunHey in #433
[Python] Fix python host build by @xuchen-plus in #434
[Spark] Add sleep 1s for compaction tests by @xuchen-plus in #435
[Docs] Add deployment docs by @xuchen-plus in #437
[Rust] fix catalog unittest by @mag1c1an1 in #438
[Flink]Fix create table options by @ChenYunHey in #436
[Rust] fix panic by @mag1c1an1 in #440
[Rust/CI] Add consistency-ci by @Ceng23333 in #441
[Spark] Compaction bugfix by @Ceng23333 in #442
[Rust]Fix Consistency CI by @Ceng23333 in #443
[Flink] Shade guava for flink package by @xuchen-plus in #444
Bump org.postgresql:postgresql from 42.5.1 to 42.5.5 in /lakesoul-common by @dependabot in #445
[Project] Bump version by @xuchen-plus in #446
[Spark] Fix spark rbac test by @xuchen-plus in #447
[Rust]DataFusion connector supports partition column by @Ceng23333 in #449
[Rust] add create split logic in rust by @mag1c1an1 in #448
[Rust/BugFix]fix escape path error by @Ceng23333 in #450
[Rust] (metadata) move metadataclient to rawclient by @mag1c1an1 in #451
[NativeIO] Shade packages into lakesoul-io-java by @moresun in #453
Bump mio from 0.8.10 to 0.8.11 in /rust by @dependabot in #456
[Project] Add shaded jar for common and io by @xuchen-plus in #455
[Project] Refine shade pacakges by @xuchen-plus in #457
[Flink] Fix LakeSoul table export with timestamp local timezone type by @ChenYunHey in #427
[Flink]fix readPartitionInfo on UpdateCommit by @Ceng23333 in #458
[Project] Adjust pom and version by @xuchen-plus in #462
[Flink]support Delete statement on partition column by @Ceng23333 in #459
[Project] Fix pom flattern issue by @xuchen-plus in #467
[Flink] Support MongoDB CDC Import/Export by @ChenYunHey in #460
[Rust] add substrait for flink and be compatible for other engines by @mag1c1an1 in #454
[Rust/Flink]Flink repartition pushdown by @Ceng23333 in #463
[Flink]update global committer for bounded case by @Ceng23333 in #469
[Flink]support flink watermark and computed column by @Ceng23333 in #472
[Flink] Fix time type for flink. fix hdfs dir permission in cdc sync by @xuchen-plus in #477
[Flink] Verify primary keys and partition keys during create table by @xuchen-plus in #475
[Flink]fix flink update statement for non pk table by @Ceng23333 in #478
[Flink] Add dependencies to shaded jar for flink by @xuchen-plus in #479
Bump webpack-dev-middleware from 5.3.3 to 5.3.4 in /website by @dependabot in #482
Bump whoami from 1.4.1 to 1.5.0 in /rust by @dependabot in #480
Bump rustls from 0.21.10 to 0.21.12 in /rust by @dependabot in #481
Bump tar from 6.2.0 to 6.2.1 in /website by @dependabot in #483
Bump express from 4.18.2 to 4.19.2 in /website by @dependabot in #484
Bump follow-redirects from 1.15.5 to 1.15.6 in /website by @dependabot in #485
Bump h2 from 0.3.22 to 0.3.26 in /rust by @dependabot in #486
[Flink] Throw exception when create table dir failed by @xuchen-plus in #487
[Flink]Support dynamic partition pushdown for streaming source by @Ceng23333 in #489
[NativeIO]support chrono partition column by @Ceng23333 in #490
[Flink]Add Flink DataStream Sink for Arrow RecordBatch by @Ceng23333 in #491
[Flink] Fix sql submit main entry by @xuchen-plus in #494
[Flink] Fix flink package by @xuchen-plus in #495
[Flink] add arrow datastream source by @Ceng23333 in #496
[Spark] Fix compaction for cdc table by @xuchen-plus in #498
Bump braces from 3.0.2 to 3.0.3 in /website by @dependabot in #497
[Fix]fix LakeSoulArrowSource serialization by @Ceng23333 in #499
[Native] Use mimalloc in native libs by @xuchen-plus in #500
Revert "[Native] Use mimalloc in native libs" by @xuchen-plus in #501
[Flink] Fix flink select only partition column by @xuchen-plus in #502
[Python/Native] Support Python read pk table by @xuchen-plus in #503
[Flink] Fix hdfs ns dir permission by @xuchen-plus in #504
[Fix/Spark] Fix spark type compatibility with flink's time and timestamp types by @Ceng23333 in #505
[Native] Update hdfs-sys to use 3.3 libhdfs version by @xuchen-plus in #506
[NativeIO]Doris filter support by @Ceng23333 in #507
[Docs] Update version in docs by @xuchen-plus in #508