Megular Expressions #263

Merged
merged 14 commits into from
Jun 5, 2025

Conversation

MarcoPolo
Contributor

@MarcoPolo MarcoPolo commented Jan 16, 2025

A very simple regular expression matcher for Multiaddr components. It supports capturing values and matches in linear time (no backtracking).

The core logic is about 100 LOC. The sugar to make it nicer to use is about another 100 LOC.
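
To illustrate what matching in linear time without backtracking can look like, here is a minimal sketch of a state-set (Thompson-style) simulation over a list of components. It is not the code from this PR: the `component`, `state`, `addState`, `match`, and `webTransportProg` names are hypothetical, the program is assembled by hand, and captures are left out to keep it short.

```go
package main

import "fmt"

// component is a hypothetical stand-in for one multiaddr component,
// e.g. {"ip4", "127.0.0.1"}. It is not the go-multiaddr type.
type component struct {
	proto string
	value string
}

type kind int

const (
	kindMatch kind = iota // consume one component with the given protocol
	kindSplit             // follow both next and alt without consuming input
	kindDone              // accepting state
)

type state struct {
	kind  kind
	proto string
	next  int
	alt   int
}

// addState adds state i to the set and follows splits without consuming input.
// The set check keeps loops (such as "zero or more") from recursing forever.
func addState(prog []state, set []bool, i int) {
	if set[i] {
		return
	}
	set[i] = true
	if prog[i].kind == kindSplit {
		addState(prog, set, prog[i].next)
		addState(prog, set, prog[i].alt)
	}
}

// match reports whether the whole input is accepted by prog.
// Every component is checked against each live state at most once, so the
// cost is O(len(input) * len(prog)) and nothing is ever re-scanned.
func match(prog []state, input []component) bool {
	cur := make([]bool, len(prog))
	addState(prog, cur, 0)
	for _, c := range input {
		next := make([]bool, len(prog))
		for i, live := range cur {
			if live && prog[i].kind == kindMatch && prog[i].proto == c.proto {
				addState(prog, next, prog[i].next)
			}
		}
		cur = next
	}
	for i, live := range cur {
		if live && prog[i].kind == kindDone {
			return true
		}
	}
	return false
}

// webTransportProg is hand-assembled for the simplified pattern
// (ip4|ip6) udp quic-v1 webtransport certhash*
func webTransportProg() []state {
	return []state{
		{kind: kindSplit, next: 1, alt: 2},                // 0: ip4 or ip6
		{kind: kindMatch, proto: "ip4", next: 3},          // 1
		{kind: kindMatch, proto: "ip6", next: 3},          // 2
		{kind: kindMatch, proto: "udp", next: 4},          // 3
		{kind: kindMatch, proto: "quic-v1", next: 5},      // 4
		{kind: kindMatch, proto: "webtransport", next: 6}, // 5
		{kind: kindSplit, next: 7, alt: 8},                // 6: zero or more certhash
		{kind: kindMatch, proto: "certhash", next: 6},     // 7
		{kind: kindDone},                                  // 8
	}
}

func main() {
	addr := []component{
		{"ip4", "127.0.0.1"}, {"udp", "4001"}, {"quic-v1", ""},
		{"webtransport", ""}, {"certhash", "hashA"}, {"certhash", "hashB"},
	}
	fmt.Println(match(webTransportProg(), addr))
}
```

The set of live states is bounded by the program size, which is why the running time stays linear in the number of components no matter how the alternatives are nested.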

Motivation

If we are going to treat Multiaddrs as an encoding, we need to make parsing them more ergonomic. Right now we have a lot of somewhat wrong manual parsers built on ForEach. This should be able to replace those, make the code cleaner, and, most importantly, make the parsers obviously correct.

In draft while I try this out in go-libp2p.

Example

Parsing a WebTransport Multiaddr m.

var dnsName string
var ip4Addr string
var ip6Addr string
var udpPort string
var certHashesStr []string
matched, err := m.Match(
  meg.Or(
    meg.CaptureVal(ma.P_IP4, &ip4Addr),
    meg.CaptureVal(ma.P_IP6, &ip6Addr),
    meg.CaptureVal(ma.P_DNS4, &dnsName),
    meg.CaptureVal(ma.P_DNS6, &dnsName),
    meg.CaptureVal(ma.P_DNS, &dnsName),
  ),
  meg.CaptureVal(ma.P_UDP, &udpPort),
  meg.Val(ma.P_QUIC_V1),
  meg.Optional(
    meg.CaptureVal(ma.P_SNI, &wtAddr.sni),
  ),
  meg.Val(ma.P_WEBTRANSPORT),
  meg.CaptureZeroOrMore(ma.P_CERTHASH, &certHashesStr),
)
if err != nil {
  return webtransportAddr{}, err
}
if !matched {
  return webtransportAddr{}, errNotQUICAddr
}

@MarcoPolo MarcoPolo requested a review from sukunrt January 16, 2025 21:57
@MarcoPolo MarcoPolo force-pushed the marco/match-and-capture branch 2 times, most recently from 3fbfcdd to 126f9ef Compare January 20, 2025 18:13
@MarcoPolo MarcoPolo marked this pull request as ready for review January 21, 2025 20:04
Member

@sukunrt sukunrt left a comment

This looks very useful to me, modulo comments. See how nice the IsWebRTCDirectMultiaddr method is now compared to master.

I think we should mark this package Experimental / Alpha and start using it in go-libp2p. After some experience we can remove the Experimental tag; until then we can keep iterating toward a satisfactory API.

@MarcoPolo MarcoPolo force-pushed the marco/multiaddr-refactor branch from 493f175 to 47c55fc Compare February 6, 2025 19:36
@MarcoPolo MarcoPolo force-pushed the marco/match-and-capture branch from 126f9ef to 6bbe24b Compare February 6, 2025 19:47
@sukunrt sukunrt changed the base branch from marco/multiaddr-refactor to master February 13, 2025 10:40
@MarcoPolo MarcoPolo force-pushed the marco/match-and-capture branch from 2a8b8af to be1c5ad Compare February 20, 2025 01:21
Member

@sukunrt sukunrt left a comment

Just one unaddressed comment:
#263 (comment)

MarcoPolo and others added 7 commits February 25, 2025 16:17
Support captures

export some things

wip thinking about public API

Think about exposing meg as a public API

doc comments

Finish rename

Add helper for meg and add test

add comment for devs
twice as fast without the copy
* much cheaper copies of captures

* Add a benchmark

* allocate to a slice. Use indexes as handles

* cleanup

* Add nocapture loop benchmark

It's really fast. No surprise

* cleanup

* nits
* Use Matchable interface

* Add Bytes to Matchable interface

* feat(x/meg): Support capturing bytes

* Export CaptureWithF

Can be used by more specific capturers (e.g capture net.AddrIP)

* Support Any match, RawValue, and multiple Concatenations

* Add CaptureAddrPort
@MarcoPolo MarcoPolo force-pushed the marco/match-and-capture branch from c14016b to ae47e22 Compare February 26, 2025 00:18
@p-shahi p-shahi mentioned this pull request Feb 26, 2025
@MarcoPolo MarcoPolo force-pushed the marco/match-and-capture branch from ad1932c to 0c5383d Compare February 26, 2025 02:44
"github.com/multiformats/go-multiaddr/x/meg"
)

func CaptureAddrPort(network *string, ipPort *netip.AddrPort) (capturePattern meg.Pattern) {
Member

Do we need this?
I'd prefer a method in manet that returns (network string, ipPort netip.AddrPort)

Contributor Author

This composes better. You can use this and capture other parts of the multiaddr in a single pass.

We don't need this, but I think it's helpful to demonstrate this pattern.

@sukunrt
Member

sukunrt commented May 15, 2025

This looks good except for this comment: #263 (comment)

I'll approve once we resolve that.

MarcoPolo added 3 commits June 3, 2025 16:05
The state machine handling was modified to deprioritize the `matchAny` path
when alternatives exist, resulting in less greedy matching behavior when
using Any patterns.
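
The difference between greedy and less greedy matching can be seen by analogy in Go's standard `regexp` package. This is only an analogy on strings and does not use meg or its `matchAny` handling; the `firstGroup` helper is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstGroup returns the first capture group of the leftmost match, or ""
// if the expression does not match.
func firstGroup(expr, s string) string {
	m := regexp.MustCompile(expr).FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	s := "/a/x/b/y/b"
	// Greedy: the wildcard is preferred, so it swallows as much as it can
	// before the literal "/b" is tried.
	fmt.Println(firstGroup(`^(.*)/b`, s)) // prints "/a/x/b/y"
	// Lazy: the literal alternative is preferred, so the wildcard stops at
	// the first place where "/b" can match.
	fmt.Println(firstGroup(`^(.*?)/b`, s)) // prints "/a/x"
}
```

Both expressions match the same input; they differ only in which path is preferred when the wildcard and a more specific alternative could both continue, which changes what ends up in the capture.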
@MarcoPolo MarcoPolo merged commit 5426748 into master Jun 5, 2025
12 of 20 checks passed