ext/mbstring: Update to Unicode 16 #15898
Thanks very much! |
I looked over the changes for Unicode 16.0. Looks good to me.
Thank you very much.
Thank you for approving this @alexdowad @youkidearitai. |
@youkidearitai Shall I merge? |
@alexdowad Yes, please! |
@alexdowad Did you find any problems? If not, shall I merge it on your behalf? |
Very sorry, I just got occupied with other things and didn't complete this task. |
CI failure for WINDOWS_X64_ZTS is spurious. |
Just to make very sure everything is OK, I downloaded the UCD files for Unicode 16.0.0 and re-ran |
This is really odd... when I fetch this comment and merge it into |
Thank you @alexdowad. You are right, the changes are merely the result of running the script. We had some issues in the script for Unicode 15.1, but 16.0 had no problems. |
Thanks very much, @Ayesh... this is now landed on |
Thank you @youkidearitai @alexdowad 🙏. |
Updates UCD to Unicode 16.0 (released September 2024).
Previously: 0fdffc1, #7502, #14680
Unicode 16 adds several new character sets and case folding rules. However, the existing ucgendat script can still parse them.
This also adds a couple of test cases to make sure the new rules for East Asian Wide characters and case folding work correctly. These tests fail on Unicode 15.1 and older because those versions do not contain those rules.
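To illustrate why such tests depend on the Unicode version, here is a minimal sketch in Python, using the standard `unicodedata` module rather than mbstring. The helper names (`unicode_version`, `is_wide`, `display_width`) are hypothetical, and the two sample code points (U+4DC0 and U+10D50) are my own assumptions about what changed in Unicode 16.0; they are not taken from the test cases in this PR. The printed results vary with the UCD version bundled in the interpreter, so no expected output is given.

```python
import unicodedata


def unicode_version() -> tuple[int, ...]:
    """Return the UCD version bundled with this interpreter, e.g. (15, 1, 0)."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


def is_wide(ch: str) -> bool:
    """True if the code point is East Asian Wide (W) or Fullwidth (F)."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def display_width(text: str) -> int:
    """Rough column width: 2 per wide/fullwidth code point, 1 otherwise.

    Simplified on purpose: combining marks and control characters are
    counted as width 1, which a real implementation would not do.
    """
    return sum(2 if is_wide(ch) else 1 for ch in text)


if __name__ == "__main__":
    print("Bundled UCD version:", unicodedata.unidata_version)

    # Assumption: U+4DC0 (a Yijing hexagram symbol) is treated as Wide
    # only from Unicode 16.0 on, so this width differs between versions.
    print("Width of U+4DC0:", display_width("\u4dc0"))

    # Assumption: U+10D50 (a Garay capital letter) folds to U+10D70 from
    # Unicode 16.0 on; older data leaves the character unchanged.
    print("U+10D50 folds to U+10D70:", "\U00010d50".casefold() == "\U00010d70")
```

Running the script under interpreters that bundle different UCD versions should show the same kind of divergence the description mentions: a check written against the 16.0 data cannot pass where only older data is available.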