do simple literal prefix scanning in regex! #95

Closed
@mkpankov

Hi,

I saw the news that regex had been refactored and optimized, so I decided to rerun my old benchmark. I was very surprised to find that it now takes twice as long!

How to reproduce (using multirust to switch toolchains, as the older regex doesn't compile with newer nightly Rust):

git clone https://github.com/mkpankov/parse-rust.git
cd parse-rust
multirust override nightly-2015-06-24
git checkout 4076c404caf1560a466e9f0799817035089fe841
cargo build --release
time zcat mp3-logs-with-fake-ips.log.gz | ./target/release/parse-rust
# reports around 4s on my machine
multirust override nightly-2015-05-25
git checkout e33d410291fa7f134eef628b5591d605cd68b218
cargo clean
cargo build --release
time zcat mp3-logs-with-fake-ips.log.gz | ./target/release/parse-rust
# reports around 2s on my machine

I'm sorry I can't pinpoint it more accurately (maybe it's caused by Rust changes, not regex), but the recent major changes to regex might be the cause. A twofold slowdown is severe in my opinion and needs action.
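
One way to narrow this down might be to time only the matching step inside the program, so that decompression and I/O are excluded from the comparison. Below is a minimal sketch, not code from parse-rust: `time_parse` is a hypothetical helper, and the `contains` predicate is a stand-in for the real regex call. As far as I know, `std::time::Instant` is not available on the 2015 nightlies used above, so this assumes a newer toolchain.

```rust
use std::io::{self, BufRead};
use std::time::{Duration, Instant};

/// Runs `parse` on every line and returns
/// (number of lines for which `parse` returned true, total time spent inside `parse`).
/// Hypothetical helper for illustration only.
fn time_parse<I, F>(lines: I, mut parse: F) -> (usize, Duration)
where
    I: IntoIterator<Item = String>,
    F: FnMut(&str) -> bool,
{
    let mut matched = 0;
    let mut spent = Duration::new(0, 0);
    for line in lines {
        // Only the call to `parse` is timed; reading the line is not.
        let start = Instant::now();
        let ok = parse(&line);
        spent += start.elapsed();
        if ok {
            matched += 1;
        }
    }
    (matched, spent)
}

fn main() {
    let stdin = io::stdin();
    // Lines that fail to decode are skipped rather than aborting the run.
    let lines = stdin.lock().lines().filter_map(Result::ok);
    // Stand-in predicate (assumption): replace with the real regex match.
    let (matched, spent) = time_parse(lines, |l| l.contains("GET"));
    eprintln!("matched {} lines, {:?} spent matching", matched, spent);
}
```

If the time reported here doubles between the two builds, the regression is in the matching step. If it stays flat, the slowdown is more likely somewhere else in the program or the toolchain.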

regex versions:

  • new, degraded:
 "regex 0.1.38 (registry+https://github.com/rust-lang/crates.io-index)",
 "regex_macros 0.1.20 (registry+https://github.com/rust-lang/crates.io-index)",
  • old, fast:
 "regex 0.1.30 (registry+https://github.com/rust-lang/crates.io-index)",
 "regex_macros 0.1.18 (registry+https://github.com/rust-lang/crates.io-index)",

Some background: back when I wrote this, I compared the Rust version to a C++ version (an almost naive, direct translation), and Rust beat C++ by about 40% without using compile-time regex. This kind of degradation puts it back behind C++ 😞
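
For context on the issue title: literal prefix scanning means that when every match must begin with a fixed string, a plain substring search can jump to each candidate position, and the more expensive matching logic runs only there. The sketch below illustrates the idea with the standard library alone. It is not the regex crate's implementation; the function name `find_with_prefix_scan` and the hard-coded "prefix followed by digits" pattern are assumptions made for the example.

```rust
/// Finds the first match of "`prefix` followed by one or more ASCII digits",
/// a hand-written stand-in for a pattern like `<prefix>[0-9]+`.
/// Returns the byte range (start, end) of the match.
fn find_with_prefix_scan(haystack: &str, prefix: &str) -> Option<(usize, usize)> {
    // With an empty prefix there is nothing to scan for.
    let first_len = prefix.chars().next()?.len_utf8();
    let mut from = 0;
    // Fast path: a substring search skips straight to each candidate.
    while let Some(off) = haystack[from..].find(prefix) {
        let start = from + off;
        let after = start + prefix.len();
        // Slow path (here trivially cheap): verify the rest of the pattern.
        let digits = haystack[after..]
            .bytes()
            .take_while(|b| b.is_ascii_digit())
            .count();
        if digits > 0 {
            return Some((start, after + digits));
        }
        // Candidate rejected: resume the scan one character later.
        from = start + first_len;
    }
    None
}

fn main() {
    let line = "GET /idle HTTP/1.1 id=none id=4217 ok";
    if let Some((s, e)) = find_with_prefix_scan(line, "id=") {
        println!("{}", &line[s..e]); // prints "id=4217"
    }
}
```

The first `id=` candidate is rejected because no digit follows it, and the scan resumes from there instead of restarting the full matcher at every byte of the input.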
