New "pattern can match invalid UTF-8" error in 1.8.0 #989

Closed as not planned
@jplatte

What version of regex are you using?

1.8.1

Describe the bug at a high level.

In regex <1.8.0, it was possible to create a regex containing \W inside of a (?-u:) group. This no longer works; regex construction now fails instead. It looks like we should have been using regex::bytes::Regex all along, but it hadn't been a problem so far, and we were only searching within strs anyway.

What are the steps to reproduce the behavior?

Cargo.toml:

[package]
name = "regex-bug"
version = "0.0.1"
edition = "2021"

[dependencies]
regex = "1.8"

src/main.rs:

use regex::Regex;

fn main() {
    Regex::new(r"(?-u:\W)").unwrap();
}

What is the actual behavior?

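Crashes: Regex::new returns the "pattern can match invalid UTF-8" error from the title, so the unwrap() panics.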
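For context on why the pattern is rejected: with Unicode mode disabled, \W means "any byte that is not an ASCII word byte". That set includes bytes that are invalid on their own as UTF-8, as well as single bytes from the middle of a multi-byte character. The following std-only sketch illustrates this. The helper names `is_ascii_word_byte` and `is_non_word_byte` are hypothetical and not part of the regex crate.

```rust
/// Hypothetical helper mirroring what `\w` means with Unicode mode off:
/// one of [0-9A-Za-z_].
fn is_ascii_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// With Unicode mode off, `\W` is the complement over all 256 byte values.
fn is_non_word_byte(b: u8) -> bool {
    !is_ascii_word_byte(b)
}

fn main() {
    // 0xFF counts as a "non-word" byte, so `(?-u:\W)` could match it,
    // yet 0xFF on its own is not valid UTF-8.
    assert!(is_non_word_byte(0xFF));
    assert!(std::str::from_utf8(&[0xFF]).is_err());

    // "é" is encoded as the two bytes 0xC3 0xA9, both of them "non-word".
    // A one-byte match at offset 0 would end at offset 1, which is not
    // a character boundary, so it could not be handed back as a &str.
    let e_acute = "é";
    assert!(e_acute.bytes().all(is_non_word_byte));
    assert!(!e_acute.is_char_boundary(1));

    // Plain ASCII punctuation is unproblematic.
    assert!(is_non_word_byte(b'!'));
    assert!(!is_non_word_byte(b'_'));
}
```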

What is the expected behavior?

Doesn't crash. (Not because that's more sensible, but because it has been possible for a very long time and people might be relying on it accidentally, like we did in Ruma.)
