Skip to content

Conversation

@emilie-wang
Copy link
Contributor

Important Read

  • Please ensure the GitHub issue is mentioned at the beginning of the PR

What is the purpose of the pull request

Addresses #102

Brief change log

  • Update how to get the number of records from Delta source
  • Side effect, it also fixed the edge case when columnStats list is empty, the number of records becomes 0

Verify this pull request

This change added tests and can be verified as follows:

  • added a line of test

}
}

public long getNumRecords(AddFile addFile) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of parsing the stats twice, is there a way we can include this with the column stats output of getColumnStatsForFile? Maybe we can have some wrapper that wraps the count with the formatted stats?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I created a wrapper called FileStats, which contains List<ColumnStat> and numRecords for per file.

Copy link
Contributor

@the-other-tim-brown the-other-tim-brown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Can you squash down to a single commit for me? After that, I will merge the PR.

The change improved how to get the number of records from Delta source and fixed the edge case when columnStats list is empty, the number of records is 0.
A wrapper called FileStats is added to wrap the list of ColumnStat and num of records per file.
@emilie-wang emilie-wang force-pushed the fix_delta_stats_count branch from 06f9578 to cadc7d8 Compare January 31, 2025 18:16
@emilie-wang
Copy link
Contributor Author

@the-other-tim-brown Thank you for the review and I have squashed down to 1 commit.

@the-other-tim-brown the-other-tim-brown merged commit d3bbe5f into apache:main Jan 31, 2025
2 checks passed
@emilie-wang emilie-wang deleted the fix_delta_stats_count branch February 11, 2025 19:40
@vinishjail97 vinishjail97 mentioned this pull request Apr 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants