gh-123963: Expose GetCurrentByteCount from expat #123964

Open
wants to merge 1 commit into
base: main
9 changes: 9 additions & 0 deletions Doc/library/pyexpat.rst
@@ -316,6 +316,15 @@ just past the last parse event (regardless of whether there was an associated
 callback).
 
 
+.. attribute:: xmlparser.CurrentByteCount
+
+   Number of bytes in the current event. ``0`` if the event is for the end tag
+   event for *empty-element* tags or is inside a reference to an internal
+   entity.
+
+   .. versionadded:: 3.14
+
+
 .. attribute:: xmlparser.CurrentByteIndex
 
    Current byte index in the parser input.
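
As a rough illustration of the attribute documented above, the sketch below records the byte index and byte count of each start and end event. The helper names `byte_count` and `trace_events` are hypothetical and not part of the PR. `CurrentByteCount` only exists on interpreters that include this change, so the sketch falls back to `None` elsewhere.

```python
import xml.parsers.expat


def byte_count(parser):
    # CurrentByteCount is the attribute proposed in this PR; interpreters
    # without the change do not have it, so report None in that case.
    return getattr(parser, "CurrentByteCount", None)


def trace_events(xml_bytes):
    """Return (kind, name, byte index, byte count) for each element event."""
    events = []
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        events.append(("s", name, parser.CurrentByteIndex, byte_count(parser)))

    def end(name):
        events.append(("e", name, parser.CurrentByteIndex, byte_count(parser)))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(xml_bytes, True)
    return events


if __name__ == "__main__":
    for event in trace_events(b"<a>\n <b>\n  <c/>\n </b>\n</a>"):
        print(event)
```

With the change applied, the end event for `<c/>` should report a count of `0`, while the end events for `</b>` and `</a>` report the length of the end tag.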
5 changes: 3 additions & 2 deletions Lib/test/test_pyexpat.py
@@ -506,6 +506,7 @@ def EndElementHandler(self, name):
     def check_pos(self, event):
         pos = (event,
                self.parser.CurrentByteIndex,
+               self.parser.CurrentByteCount,
Member

Should we have a dedicated test as well for this one? The libexpat docs say:

Returns 0 if the event is inside a reference to an internal entity and for the end-tag event for empty element tags (the later can be used to distinguish empty-element tags from empty elements using separate start and end tags).

Author

@DelusionalLogic, Sep 11, 2024

The test case already includes an example of an empty-element tag as defined in the XML spec. What is missing is an example of an empty element in the long form (with separate start and end tags), but that seems redundant.

I'll gladly add it if you'd like me to explicitly include it.
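
For illustration, a dedicated test along the lines discussed above might look like the following sketch. The helper `end_tag_byte_counts` and the class `EmptyElementByteCountTest` are hypothetical names and not part of the PR. The test is skipped on interpreters that lack `CurrentByteCount`, and the expected values assume the libexpat behavior quoted above.

```python
import unittest
import xml.parsers.expat as expat


def end_tag_byte_counts(xml_bytes):
    """Map each element name to the CurrentByteCount of its end event."""
    parser = expat.ParserCreate()
    counts = {}

    def end(name):
        counts[name] = parser.CurrentByteCount

    parser.EndElementHandler = end
    parser.Parse(xml_bytes, True)
    return counts


# The attribute only exists once the change in this PR is available.
HAS_BYTE_COUNT = hasattr(expat.ParserCreate(), "CurrentByteCount")


@unittest.skipUnless(HAS_BYTE_COUNT, "requires xmlparser.CurrentByteCount")
class EmptyElementByteCountTest(unittest.TestCase):
    def test_empty_element_tag(self):
        # <c/> produces an end event that spans no bytes of its own.
        self.assertEqual(end_tag_byte_counts(b"<a><c/></a>")["c"], 0)

    def test_separate_start_and_end_tags(self):
        # <c></c> produces an end event that spans the end tag itself.
        counts = end_tag_byte_counts(b"<a><c></c></a>")
        self.assertEqual(counts["c"], len(b"</c>"))


if __name__ == "__main__":
    unittest.main()
```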

                self.parser.CurrentLineNumber,
                self.parser.CurrentColumnNumber)
         self.assertTrue(self.upto < len(self.expected_list),
@@ -520,8 +521,8 @@ def test(self):
         self.parser.StartElementHandler = self.StartElementHandler
         self.parser.EndElementHandler = self.EndElementHandler
         self.upto = 0
-        self.expected_list = [('s', 0, 1, 0), ('s', 5, 2, 1), ('s', 11, 3, 2),
-                              ('e', 15, 3, 6), ('e', 17, 4, 1), ('e', 22, 5, 0)]
+        self.expected_list = [('s', 0, 3, 1, 0), ('s', 5, 3, 2, 1), ('s', 11, 4, 3, 2),
+                              ('e', 15, 0, 3, 6), ('e', 17, 4, 4, 1), ('e', 22, 4, 5, 0)]
 
         xml = b'<a>\n <b>\n  <c/>\n </b>\n</a>'
         self.parser.Parse(xml, True)
1 change: 1 addition & 0 deletions Misc/ACKS
@@ -882,6 +882,7 @@ Muhammad Jehanzeb
 Drew Jenkins
 Flemming Kjær Jensen
 Philip H. Jensen
+Jesper Jensen
 Philip Jenvey
 MunSic Jeong
 Chris Jerdonek
@@ -0,0 +1,3 @@
Expose the :attr:`xmlparser.CurrentByteCount` field for :mod:`Expat XML
<xml.parsers.expat>` parsers.
Patch by Jesper Jensen.
2 changes: 2 additions & 0 deletions Modules/pyexpat.c
@@ -1349,6 +1349,7 @@ INT_GETTER(ErrorByteIndex)
 INT_GETTER(CurrentLineNumber)
 INT_GETTER(CurrentColumnNumber)
 INT_GETTER(CurrentByteIndex)
+INT_GETTER(CurrentByteCount)
 
 #undef INT_GETTER
 
@@ -1529,6 +1530,7 @@ static PyGetSetDef xmlparse_getsetlist[] = {
     XMLPARSE_GETTER_DEF(CurrentLineNumber)
     XMLPARSE_GETTER_DEF(CurrentColumnNumber)
     XMLPARSE_GETTER_DEF(CurrentByteIndex)
+    XMLPARSE_GETTER_DEF(CurrentByteCount)
     XMLPARSE_GETTER_SETTER_DEF(buffer_size)
     XMLPARSE_GETTER_SETTER_DEF(buffer_text)
     XMLPARSE_GETTER_DEF(buffer_used)