[Flink] Support withShard read of Flink #5650
Closed
Purpose
In our company, we use a Flink session cluster to read Paimon tables for OLAP queries. As query QPS rises, several problems have become apparent:
1. The JobManager's IO is high, because every job's plan reads metadata from the FileSystem.
2. To accelerate queries we scale up scan.manifest.parallelism, which drives the JobManager's memory usage up.
3. Queries become slow: when QPS is high the JobManager is busy, which slows down planning and other logic.
4. When QPS is high, memory pressure stays heavy, even though we apply DropStats after planning.
We found the root cause: every job's plan runs on the JobManager. So that the system can accept a higher QPS and queries can run faster, we want to add an option that lets users move the plan to the TaskManagers.
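For illustration, the option could be toggled per query with Flink's dynamic table options hint. This is only a sketch: the option key scan.shard-read.enabled and the table name paimon_table below are hypothetical, since this description does not spell out the final key.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ShardReadToggleSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());
        // 'scan.shard-read.enabled' is a HYPOTHETICAL option key used only
        // for illustration. Flink's dynamic table options hint scopes the
        // setting to this single query.
        tEnv.executeSql(
                        "SELECT * FROM paimon_table "
                                + "/*+ OPTIONS('scan.shard-read.enabled' = 'true') */")
                .print();
    }
}
```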
In more detail: when the option is enabled, ShardStaticFileStoreSource creates as many splits as the source parallelism, and these splits are assigned to the readers. In ShardSourceReader#addSplits we use the TableScan withShard feature to compute the splits that belong to this reader (see the sketch below).
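A minimal sketch of the shard-local planning idea, written against Paimon's public ReadBuilder API; the helper class and the reader wiring around it are simplified and hypothetical.

```java
import java.util.List;

import org.apache.paimon.table.Table;
import org.apache.paimon.table.source.Split;

// Hypothetical helper showing the core of what ShardSourceReader#addSplits
// does: each reader plans only its own shard, so the expensive manifest scan
// runs on the TaskManager instead of the JobManager.
public class ShardPlanSketch {

    public static List<Split> planForShard(Table table, int subtaskIndex, int parallelism) {
        return table.newReadBuilder()
                // Restrict the scan to the shard owned by this reader.
                .withShard(subtaskIndex, parallelism)
                .newScan()
                .plan()
                .splits();
    }
}
```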

When DynamicPartitionPruning is enabled, we put the DynamicPartitionPruning info into the MockSplit; after a reader receives it, the reader filters partitions after the withShard plan, as sketched below.
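A minimal sketch of that post-plan partition filter, assuming the pruning info arrives as a set of partition rows; the MockSplit plumbing is omitted and the helper class is hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

import org.apache.paimon.data.BinaryRow;
import org.apache.paimon.table.source.DataSplit;

// Hypothetical helper: after the shard-local plan, keep only the splits
// whose partition survives dynamic partition pruning.
public class PartitionPruneSketch {

    public static List<DataSplit> prune(
            List<DataSplit> splits, Set<BinaryRow> remainingPartitions) {
        return splits.stream()
                .filter(split -> remainingPartitions.contains(split.partition()))
                .collect(Collectors.toList());
    }
}
```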

Tests
1. UT cases
org.apache.paimon.flink.source.shardread.ShardReadAssignModeTest
org.apache.paimon.flink.source.shardread.ShardSourceReaderTest
2. IT cases
org.apache.paimon.flink.source.shardread.ShardReadITCase
3. Job tests

We ran jobs under the following configurations to verify data integrity; all results were correct:
(1) Without ShardRead.
(2) ShardRead, without failover, without speculative execution.
(3) ShardRead, without failover, with speculative execution.
(4) ShardRead, with failover, without speculative execution.
(5) ShardRead, with failover, with speculative execution.

4. Our production environment benefits

In our Flink session OLAP cluster (JobManager: 16 cores, 64 GB of memory), we tested by submitting 5 jobs (select * from paimon table;).
(1) Without this PR, the JobManager's memory is heavy and its CPU/IO are busy during the Paimon plan phase; some jobs' plans take 2-3 minutes.
(2) With this PR, planning moves from the JobManager to the TaskManagers; the JobManager's memory and CPU stay light, and the Flink session cluster can sustain a much higher QPS.



So in our production environment, we need this feature.
API and Format
Documentation