
Commit eedd861

Reduce calls to StringScanner.new()
[Why]
StringScanner.new() instances can be reused within parse_attributes, reducing initialization costs.

## Benchmark

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.0/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     11.025      11.202        16.207       17.315 i/s -     100.000 times in 9.069926s 8.926851s 6.170348s 5.775288s
                 sax     30.084      30.519        45.220       47.814 i/s -     100.000 times in 3.324024s 3.276648s 2.211399s 2.091429s
                pull     34.782      35.849        53.867       56.851 i/s -     100.000 times in 2.875069s 2.789495s 1.856439s 1.758998s
              stream     32.546      33.541        46.362       47.775 i/s -     100.000 times in 3.072603s 2.981465s 2.156952s 2.093130s

Comparison:
                              dom
         after(YJIT):        17.3 i/s
        before(YJIT):        16.2 i/s - 1.07x  slower
               after:        11.2 i/s - 1.55x  slower
              before:        11.0 i/s - 1.57x  slower

                              sax
         after(YJIT):        47.8 i/s
        before(YJIT):        45.2 i/s - 1.06x  slower
               after:        30.5 i/s - 1.57x  slower
              before:        30.1 i/s - 1.59x  slower

                             pull
         after(YJIT):        56.9 i/s
        before(YJIT):        53.9 i/s - 1.06x  slower
               after:        35.8 i/s - 1.59x  slower
              before:        34.8 i/s - 1.63x  slower

                           stream
         after(YJIT):        47.8 i/s
        before(YJIT):        46.4 i/s - 1.03x  slower
               after:        33.5 i/s - 1.42x  slower
              before:        32.5 i/s - 1.47x  slower
```

- YJIT=ON : 1.03x - 1.07x faster
- YJIT=OFF : 1.01x - 1.03x faster
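The reuse described in the commit message relies on `StringScanner#string=`, which points an existing scanner at a new string and resets its scan position. The sketch below illustrates that idea in isolation; `KeyReader` and `leading_name` are hypothetical names chosen for this example and are not part of REXML.

```ruby
require "strscan"

# Hypothetical helper (not REXML code): keeps one scanner per instance and
# retargets it for every input instead of allocating a new StringScanner.
class KeyReader
  def initialize
    @scanner = StringScanner.new("")
  end

  # Returns the leading identifier of +text+, or nil if there is none.
  def leading_name(text)
    @scanner.string = text          # swap the target string; scan position goes back to 0
    @scanner.scan(/[A-Za-z_]\w*/)   # anchored match at the current position
  end
end

reader = KeyReader.new
reader.leading_name("id='1'")     # => "id"
reader.leading_name("class='x'")  # => "class"
reader.leading_name("='oops'")    # => nil
```

One trade-off worth keeping in mind: a scanner stored in an instance variable is shared state, so a method written this way should not be re-entered (or called from several threads on the same object) while a scan is in progress.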
1 parent 7712855 commit eedd861

1 file changed

+22
-21
lines changed

lib/rexml/parsers/baseparser.rb

Lines changed: 22 additions & 21 deletions
@@ -115,6 +115,7 @@ class BaseParser
       def initialize( source )
         self.stream = source
         @listeners = []
+        @attributes_scanner = StringScanner.new('')
       end
 
       def add_listener( listener )
@@ -601,36 +602,36 @@ def parse_attributes(prefixes, curr_ns)
         return attributes, closed if raw_attributes.nil?
         return attributes, closed if raw_attributes.empty?
 
-        scanner = StringScanner.new(raw_attributes)
-        until scanner.eos?
-          if scanner.scan(/\s+/)
-            break if scanner.eos?
+        @attributes_scanner.string = raw_attributes
+        until @attributes_scanner.eos?
+          if @attributes_scanner.scan(/\s+/)
+            break if @attributes_scanner.eos?
           end
 
-          pos = scanner.pos
+          pos = @attributes_scanner.pos
           loop do
-            break if scanner.scan(ATTRIBUTE_PATTERN)
-            unless scanner.scan(QNAME)
-              message = "Invalid attribute name: <#{scanner.rest}>"
+            break if @attributes_scanner.scan(ATTRIBUTE_PATTERN)
+            unless @attributes_scanner.scan(QNAME)
+              message = "Invalid attribute name: <#{@attributes_scanner.rest}>"
               raise REXML::ParseException.new(message, @source)
             end
-            name = scanner[0]
-            unless scanner.scan(/\s*=\s*/um)
+            name = @attributes_scanner[0]
+            unless @attributes_scanner.scan(/\s*=\s*/um)
               message = "Missing attribute equal: <#{name}>"
               raise REXML::ParseException.new(message, @source)
             end
-            quote = scanner.scan(/['"]/)
+            quote = @attributes_scanner.scan(/['"]/)
             unless quote
               message = "Missing attribute value start quote: <#{name}>"
               raise REXML::ParseException.new(message, @source)
             end
-            unless scanner.scan(/.*#{Regexp.escape(quote)}/um)
+            unless @attributes_scanner.scan(/.*#{Regexp.escape(quote)}/um)
               match_data = @source.match(/^(.*?)(\/)?>/um, true)
               if match_data
-                scanner << "/" if closed
-                scanner << ">"
-                scanner << match_data[1]
-                scanner.pos = pos
+                @attributes_scanner << "/" if closed
+                @attributes_scanner << ">"
+                @attributes_scanner << match_data[1]
+                @attributes_scanner.pos = pos
                 closed = !match_data[2].nil?
                 next
               end
@@ -639,11 +640,11 @@ def parse_attributes(prefixes, curr_ns)
               raise REXML::ParseException.new(message, @source)
             end
           end
-          name = scanner[1]
-          prefix = scanner[2]
-          local_part = scanner[3]
-          # quote = scanner[4]
-          value = scanner[5]
+          name = @attributes_scanner[1]
+          prefix = @attributes_scanner[2]
+          local_part = @attributes_scanner[3]
+          # quote = @attributes_scanner[4]
+          value = @attributes_scanner[5]
           if prefix == "xmlns"
             if local_part == "xml"
               if value != "http://www.w3.org/XML/1998/namespace"
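The diff keeps the existing recovery path for attribute values that contain `>`: when the closing quote is missing, more text is appended to the scanner with `<<`, the position is rewound with `pos=`, and the attribute is scanned again. The following is a minimal sketch of that append-and-rewind pattern only. `SIMPLE_ATTRIBUTE`, `scan_attributes`, and the `continuation` argument are assumptions made for illustration; REXML's real `ATTRIBUTE_PATTERN` and source handling are more involved.

```ruby
require "strscan"

# Hypothetical, simplified pattern (not REXML's ATTRIBUTE_PATTERN):
# name, "=", an opening quote, a lazy value, and the same closing quote.
SIMPLE_ATTRIBUTE = /(\w+)\s*=\s*(["'])(.*?)\2/m

# Scans name/value pairs from +raw+ with a reusable +scanner+. If a value is
# cut off at a ">", +continuation+ (the text up to the next ">") is appended
# and the attribute is retried from its starting position.
def scan_attributes(scanner, raw, continuation)
  scanner.string = raw.dup   # dup so that << never mutates the caller's string
  pairs = {}
  until scanner.eos?
    break if scanner.scan(/\s+/) && scanner.eos?

    pos = scanner.pos        # remember where this attribute starts
    unless scanner.scan(SIMPLE_ATTRIBUTE)
      raise ArgumentError, "unterminated attribute value" if continuation.nil?
      scanner << ">"         # restore the ">" that ended the raw chunk early
      scanner << continuation
      continuation = nil     # only one retry in this sketch
      scanner.pos = pos      # rewind and scan the same attribute again
      next
    end
    pairs[scanner[1]] = scanner[3]
  end
  pairs
end

# Attributes of <e a="1" b="x>y">, split at the first ">" as a naive reader would.
scan_attributes(StringScanner.new(""), %q( a="1" b="x), %q(y"))
# => {"a"=>"1", "b"=>"x>y"}
```

Because `<<` appends to the string the scanner currently holds, the sketch duplicates the input before scanning; whether that copy is needed in a given caller depends on who owns the string.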
