
ext/mbstring: Update to Unicode 16 #15898


Closed
wants to merge 1 commit into from

Member

@Ayesh Ayesh commented Sep 15, 2024

Updates UCD to Unicode 16.0 (released September 2024).

Previously: 0fdffc1, #7502, #14680

Unicode 16 adds several new scripts and characters, along with new case folding rules. The existing ucgendat script parses the updated data files without modification.

This also adds a couple of test cases to make sure the new rules for East Asian Wide characters and case folding work correctly. These tests fail on Unicode 15.1 and older because those versions do not contain those rules.
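
For readers who want to check which Unicode data their build carries, here is a minimal probe, not the test file from this PR. It assumes U+4DC0 (a Yijing hexagram symbol) is one of the code points whose East Asian Width became Wide in 16.0, and that U+10D50 is a Garay capital letter introduced in 16.0 with a case folding entry. The helper names `probe_width` and `probe_fold` are hypothetical.

```php
<?php
// Hypothetical probe, not the test file added in this PR.

/** Display width as mbstring computes it (East Asian Wide characters count as 2). */
function probe_width(string $s): int
{
    return mb_strwidth($s, 'UTF-8');
}

/** Full case folding of a string, rendered as a list of code points. */
function probe_fold(string $s): string
{
    $folded = mb_convert_case($s, MB_CASE_FOLD, 'UTF-8');
    $points = array_map(
        fn(string $c): string => sprintf('U+%04X', mb_ord($c, 'UTF-8')),
        mb_str_split($folded, 1, 'UTF-8')
    );
    return implode(' ', $points);
}

// Assumption: U+4DC0 is treated as Wide only from Unicode 16.0 onward.
echo probe_width("\u{4DC0}"), "\n";

// Assumption: U+10D50 is a cased letter that only has a folding in 16.0 data.
echo probe_fold("\u{10D50}"), "\n";
```

On a build with older Unicode data, the first line would be expected to print a narrower width and the second to print the input code point unchanged.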

@alexdowad
Contributor

Thanks very much!
Any comment from @youkidearitai?

Contributor

@youkidearitai youkidearitai left a comment


I reviewed the Unicode 16.0 changes. Looks good to me.
Thank you very much.

@Ayesh
Member Author

Ayesh commented Sep 16, 2024

Thank you for approving this @alexdowad @youkidearitai.
I adjusted our notes in the UPGRADING and NEWS files as well.

@alexdowad
Contributor

@youkidearitai Shall I merge?

@youkidearitai
Contributor

@alexdowad Yes, please!

@youkidearitai
Contributor

@alexdowad Did you find any problems? If not, shall I merge it for you?

@alexdowad
Contributor

Very sorry, I just got occupied with other things and didn't complete this task.

@alexdowad
Contributor

CI failure for WINDOWS_X64_ZTS is spurious.

@alexdowad
Contributor

Just to make very sure everything is OK, I downloaded the UCD files for Unicode 16.0.0 and re-ran ucgendat.php. Same results as this PR.

@alexdowad
Contributor

This is really odd... when I fetch this commit and merge it into master locally, I don't see the added entry in NEWS. 😕 Fixing that up manually...

@Ayesh
Member Author

Ayesh commented Sep 17, 2024

Thank you @alexdowad. You are right, the changes are simply the output of running the script. We had some issues with the script for Unicode 15.1, but 16.0 had no problems.

@alexdowad
Contributor

Thanks very much, @Ayesh... this has now landed on master.

@alexdowad alexdowad closed this Sep 17, 2024
@Ayesh
Member Author

Ayesh commented Sep 17, 2024

Thank you @youkidearitai @alexdowad 🙏.

@Ayesh Ayesh deleted the unicode-16 branch September 17, 2024 01:42