
Commit a26536d

fix: apply upstream patch for in-context parsing (#3116)
**What problem is this PR intended to solve?**

Apply upstream fix for #3112

- upstream bug report https://gitlab.gnome.org/GNOME/libxml2/-/issues/672
- upstream fix https://gitlab.gnome.org/GNOME/libxml2/-/commit/95f2a17440568694a6df6a326c5b411e77597be2

Fixes #3112

**Have you included adequate test coverage?**

Yes

**Does this change affect the behavior of either the C or the Java implementations?**

Only affects the C implementation.
2 parents 08810c7 + da300b4 commit a26536d


3 files changed, 50 additions, 0 deletions


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ Nokogiri follows [Semantic Versioning](https://semver.org/), please see the [REA
 
 * [CRuby] `XML::Reader` defaults the encoding to UTF-8 if it's not specified in either the document or as a method parameter. Previously non-ASCII characters were serialized as NCRs in this case. [#2891] (@flavorjones)
 * [CRuby] Restored support for compilation by GCC versions earlier than 4.6, which was broken in v1.15.0 (540e9aee). [#3090] (@adfoster-r7)
+* [CRuby] Patched upstream libxml2 to allow parsing HTML5 in the context of a namespaced node (e.g., foreign content like MathML). [#3112, #3116] (@flavorjones)
 
 
 ## v1.16.0 / 2023-12-27
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+From 95f2a17440568694a6df6a326c5b411e77597be2 Mon Sep 17 00:00:00 2001
+From: Nick Wellnhofer <[email protected]>
+Date: Tue, 30 Jan 2024 13:25:17 +0100
+Subject: [PATCH] parser: Fix crash in xmlParseInNodeContext with HTML
+ documents
+
+Ignore namespaces if we have an HTML document with namespaces added
+manually.
+
+Fixes #672.
+---
+ parser.c | 4 +++-
+ 1 file changed, 3 insertions(+), 1 deletion(-)
+
+diff --git a/parser.c b/parser.c
+index 1038d71b..f7842ed1 100644
+--- a/parser.c
++++ b/parser.c
+@@ -12415,8 +12415,10 @@ xmlParseInNodeContext(xmlNodePtr node, const char *data, int datalen,
+     }
+     xmlAddChild(node, fake);
+ 
+-    if (node->type == XML_ELEMENT_NODE) {
++    if (node->type == XML_ELEMENT_NODE)
+         nodePush(ctxt, node);
++
++    if ((ctxt->html == 0) && (node->type == XML_ELEMENT_NODE)) {
+         /*
+          * initialize the SAX2 namespaces stack
+          */
+--
+2.42.0
+
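The upstream change splits one guard into two. The sketch below is a hypothetical Ruby model of that control flow, not libxml2 code: the `html` flag stands in for `ctxt->html != 0`, `element` stands in for `node->type == XML_ELEMENT_NODE`, and the symbols `:node_push` and `:init_ns_stack` are illustrative labels for the `nodePush(ctxt, node)` call and the "initialize the SAX2 namespaces stack" block in the hunk above. It only models which of those two setup steps run, assuming nothing else in `xmlParseInNodeContext` matters for the comparison.

```ruby
# Hypothetical model of the guard change in xmlParseInNodeContext (parser.c).
# :node_push     ~ nodePush(ctxt, node)
# :init_ns_stack ~ the "initialize the SAX2 namespaces stack" block
# html           ~ ctxt->html != 0
# element        ~ node->type == XML_ELEMENT_NODE

# Before the patch: one guard on the node type covered both steps,
# so the namespace stack was set up even when the parser was in HTML mode.
def context_setup_before(html:, element:)
  steps = []
  if element
    steps << :node_push
    steps << :init_ns_stack
  end
  steps
end

# After the patch: the element is still pushed for any element context node,
# but namespace setup is skipped when the parser is in HTML mode.
def context_setup_after(html:, element:)
  steps = []
  steps << :node_push if element
  steps << :init_ns_stack if !html && element
  steps
end

p context_setup_before(html: true, element: true)  # => [:node_push, :init_ns_stack]
p context_setup_after(html: true, element: true)   # => [:node_push]
p context_setup_after(html: false, element: true)  # => [:node_push, :init_ns_stack]
```

The XML path is unchanged; only HTML parsing contexts, such as the namespaced MathML node in an HTML5 document that the CHANGELOG entry mentions, now skip namespace setup. This matches the commit message's "Ignore namespaces if we have an HTML document with namespaces added manually."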

test/html5/test_api.rb

Lines changed: 16 additions & 0 deletions
@@ -238,6 +238,22 @@ def test_node_wrap_uses_parent_node_as_parsing_context_node
     assert_equal("select", el.parent.parent.name)
   end
 
+  def test_parse_in_context_of_foreign_namespace
+    if Nokogiri.uses_libxml?("~> 2.12.0")
+      skip_unless_libxml2_patch("0012-parser-Fix-crash-in-xmlParseInNodeContext-with-HTML.patch")
+    end
+
+    # https://github.com/sparklemotion/nokogiri/issues/3112
+    # https://gitlab.gnome.org/GNOME/libxml2/-/issues/672
+    doc = Nokogiri::HTML5::Document.parse("<html><body><math>")
+    math = doc.at_css("math")
+
+    nodes = math.parse("mrow") # segfaults in libxml 2.12 before 95f2a174
+
+    assert_kind_of(Nokogiri::XML::NodeSet, nodes)
+    assert_equal(1, nodes.length)
+  end
+
   describe Nokogiri::HTML5::Document do
     describe "#fragment" do
       it "parses text nodes in a `body` context" do
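The new test only checks for the bundled patch when `Nokogiri.uses_libxml?("~> 2.12.0")` is true. `~>` is RubyGems' pessimistic version operator, so `"~> 2.12.0"` accepts 2.12.x releases and rejects 2.11.x and 2.13.x. The sketch below shows that behavior with `Gem::Requirement`. It assumes `uses_libxml?` evaluates its argument with the same semantics; the `~>` syntax suggests this, but this commit does not show it. The version strings are arbitrary examples.

```ruby
require "rubygems"

# "~> 2.12.0" means ">= 2.12.0" and "< 2.13" (pessimistic constraint).
requirement = Gem::Requirement.new("~> 2.12.0")

# Example libxml2 version strings, chosen only to illustrate the boundaries.
["2.11.9", "2.12.0", "2.12.4", "2.13.0"].each do |version|
  puts "#{version}: #{requirement.satisfied_by?(Gem::Version.new(version))}"
end
```

Under that assumption, the `skip_unless_libxml2_patch` guard is consulted only for libxml2 2.12.x, which is the release series named in the test's "segfaults in libxml 2.12" comment.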
