Matching beginning of line does not work as expected with repetition #229

@schneems

Description

Reproduction

Code:

require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'parslet', '2.0.0'
  gem 'minitest', '5.25.4'
end
require 'minitest'
require "parslet"
require "parslet/convenience"

module Problem
  class PegParser < Parslet::Parser
    rule(:code_fence) {
      match(/\A`/) >> str("``")
    }

    rule(:anything_but_code_fence) {
      (code_fence.absent? >> any).repeat(1, nil)
    }
  end
end

class ProblemDemo < Minitest::Test
  def test_parses_code_fence_as_expected
    parsed = Problem::PegParser.new.code_fence.parse_with_debug("```")
    assert_equal("```", parsed)
  end

  def test_slash_a_works_as_expected
    assert_raises(Parslet::ParseFailed) {
      _ = Problem::PegParser.new.code_fence.parse("NOT A CODE FENCE ```")
    }
  end

  def test_slash_a_any_repeat_does_not_work_as_expected
    input = "NOT A CODE FENCE ```"
    parsed = Problem::PegParser.new.anything_but_code_fence.parse_with_debug(input)
    assert_equal(input, parsed)
  end
end

Run:

$ gem install m
$ m parselet_problem_test.rb
Run options: -n "/^(test_parses_code_fence_as_expected|test_slash_a_works_as_expected|test_slash_a_any_repeat_does_not_work_as_expected)$/" --seed 16544

# Running:

Extra input after last repetition at line 1 char 18.
`- Failed to match sequence (!CODE_FENCE .) at line 1 char 18.
   `- Input should not start with CODE_FENCE at line 1 char 18.
F..

Finished in 0.000818s, 3667.4819 runs/s, 3667.4819 assertions/s.

  1) Failure:
ProblemDemo#test_slash_a_any_repeat_does_not_work_as_expected [parselet_problem_test.rb:40]:
Expected: "NOT A CODE FENCE ```"
  Actual: nil

3 runs, 3 assertions, 1 failures, 0 errors, 0 skips

Expected

I expect to be able to write a parser for a pattern that must begin at the start of a line, and then reuse that same parser to capture anything EXCEPT that exact match via absent? and repeat. I expect all three tests to pass.

Actual

The last test above fails:

Extra input after last repetition at line 1 char 18.
`- Failed to match sequence (!CODE_FENCE .) at line 1 char 18.
   `- Input should not start with CODE_FENCE at line 1 char 18.
F

Finished in 0.040330s, 570.2951 runs/s, 1785.2715 assertions/s.

  1) Failure:
ProblemDemo#test_slash_a_any_does_not_work_as_expected [test/rundoc/peg_parser_test.rb:33]:
Expected: "NOT A CODE FENCE ```"
  Actual: nil

This happens because each iteration of the repeat consumes one character via any, so "NOT A CODE FENCE ```" gets pared down to "OT A CODE FENCE ```" and the parser continues. However, when it reaches "```", it incorrectly treats the backticks as if they were at the beginning of a line, even though they are not in the original document.
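
A minimal sketch of the suspected mechanism, using only Ruby's standard-library StringScanner. It assumes (this is not confirmed anywhere in this issue) that Parslet matches regexes through a scanner positioned at the current parse offset, where \A anchors at the scan position rather than at the start of the original input. Offset 17 corresponds to "char 18" in the error output above.

```ruby
require "strscan"

input = "NOT A CODE FENCE ```"
fence_start = input.index("```") # 0-based offset 17, reported as "char 18"

# Assumption: Parslet's regex matching behaves like this default scanner,
# where \A anchors at the current scan position.
scanner = StringScanner.new(input)
at_start = scanner.match?(/\A`/)  # nil: the input starts with "N"
scanner.pos = fence_start
at_fence = scanner.match?(/\A`/)  # matches, although this is not the start of the input

# For contrast: with fixed_anchor: true (Ruby 2.7+), \A stays tied to the
# start of the whole string, so the same match fails at offset 17.
fixed = StringScanner.new(input, fixed_anchor: true)
fixed.pos = fence_start
fixed_at_fence = fixed.match?(/\A`/)

puts "default scanner, pos 0:  #{at_start.inspect}"
puts "default scanner, pos 17: #{at_fence.inspect}"
puts "fixed_anchor,    pos 17: #{fixed_at_fence.inspect}"
```

If the assumption holds, it would explain why code_fence.absent? fails at char 18: from the scanner's point of view, every position it tries is "the start".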

Considerations

I'm unsure whether this is actually a bug (i.e. it's unexpected to me, but perhaps it's by design). If it is expected, is there a workaround or pattern I can use to capture everything up to a parser that must match at the start of a line?
