gh-104306: Fix incorrect comment handling in the netrc module, minor refactor #104511

Open: wants to merge 19 commits into base: main
Changes from 15 commits
63 changes: 31 additions & 32 deletions Lib/netrc.py
@@ -2,7 +2,8 @@

# Module and documentation by Eric S. Raymond, 21 Dec 1998

import os, stat
import os
import stat
Contributor commented:

Let's undo formatting to make sure only relevant changes are visible in the diff.

Suggested change
- import os
- import stat
+ import os, stat


__all__ = ["netrc", "NetrcParseError"]

@@ -22,6 +23,7 @@ def __str__(self):
class _netrclex:
def __init__(self, fp):
self.lineno = 1
self.dontskip = False
self.instream = fp
self.whitespace = "\n\t\r "
self.pushback = []
@@ -33,30 +35,29 @@ def _read_char(self):
return ch

def get_token(self):
self.dontskip = False
if self.pushback:
return self.pushback.pop(0)
token = ""
fiter = iter(self._read_char, "")
for ch in fiter:
if ch in self.whitespace:
enquoted = False
while ch := self._read_char():
Contributor commented:

Was this refactoring necessary to fix the bug? The diff is rather difficult to read with so many lines reshuffled. If I were to guess, this might be why people are postponing reviews of this PR.

if ch == '\\':
ch = self._read_char()
token += ch
continue
if ch in self.whitespace and not enquoted:
if token == "":
continue
if ch == '\n':
Contributor commented:

What about \r line endings? And \r\n?
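
One detail that bears on this question: when a file is opened in text mode with the default `newline=None`, Python translates `\r\n` and a lone `\r` to `\n` before any character-level reader sees them. The sketch below illustrates that behaviour. `read_text_mode` is a hypothetical helper, not part of the PR, and the example assumes the lexer is fed such a text-mode file; a stream opened with `newline=''`, or another file-like object, could still deliver a raw `\r`.

```python
import os
import tempfile


def read_text_mode(raw: bytes) -> str:
    """Write raw bytes to a temporary file and read them back in text mode."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "sample")
        with open(path, "wb") as f:
            f.write(raw)
        # newline=None is the default for open(): "\r\n" and a lone "\r"
        # are both translated to "\n" on reading.
        with open(path, encoding="utf-8") as f:
            return f.read()


# Mixed line endings collapse to "\n" in text mode.
print(repr(read_text_mode(b"# a\r\nmachine x\rlogin y\n")))
```

Under that assumption, a check such as `ch == '\n'` would also fire for `\r` and `\r\n` terminated lines.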

self.dontskip = True
return token
if ch == '"':
for ch in fiter:
if ch == '"':
return token
elif ch == "\\":
ch = self._read_char()
token += ch
if enquoted:
return token
enquoted = True
continue
else:
if ch == "\\":
ch = self._read_char()
token += ch
for ch in fiter:
if ch in self.whitespace:
return token
elif ch == "\\":
ch = self._read_char()
token += ch
return token

def push_token(self, token):
@@ -66,7 +67,7 @@ def push_token(self, token):
class netrc:
def __init__(self, file=None):
default_netrc = file is None
if file is None:
if default_netrc:
file = os.path.join(os.path.expanduser("~"), ".netrc")
self.hosts = {}
self.macros = {}
@@ -81,13 +82,15 @@ def _parse(self, file, fp, default_netrc):
lexer = _netrclex(fp)
while 1:
# Look for a machine, default, or macdef top-level keyword
saved_lineno = lexer.lineno
toplevel = tt = lexer.get_token()
tt = lexer.get_token()
if not tt:
break
elif tt[0] == '#':
if lexer.lineno == saved_lineno and len(tt) == 1:
# For top level tokens, we skip line if the # is followed
# by a space / newline. Otherwise, we only skip the token.
if tt == '#' and not lexer.dontskip:
Contributor commented:

I'm not sure I understand why the entirety of tt is being compared to '#'. Is this semantically “the entire token consists of just #”?
This could be:

Suggested change
- if tt == '#' and not lexer.dontskip:
+ if len(tt) == 1 and not lexer.dontskip:

But then, why does it matter whether it's # thing vs #thing? Typically, comment parsers just disregard whatever's after the hash and don't interpret it in any way…
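
A small sketch of the equivalence behind this suggestion, assuming the check sits under the `tt[0] == '#'` guard shown in the diff. The helper names are hypothetical and only illustrate the two spellings.

```python
def bare_hash_by_equality(tt: str) -> bool:
    """True when the token is exactly '#'."""
    return tt == '#'


def bare_hash_by_length(tt: str) -> bool:
    """Same check via length; only meaningful once tt is known to start with '#'."""
    return tt[0] == '#' and len(tt) == 1


# Every token here starts with '#', mirroring the surrounding guard.
tokens = ['#', '#thing', '##', '#!']
for tt in tokens:
    assert bare_hash_by_equality(tt) == bare_hash_by_length(tt)

print([tt for tt in tokens if bare_hash_by_equality(tt)])  # prints ['#']
```

Either spelling separates a bare `#` (skip the rest of the line) from `#thing` (skip only that token), which is the distinction the code comment in the diff describes.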

lexer.instream.readline()
lexer.lineno += 1
continue
elif tt == 'machine':
entryname = lexer.get_token()
@@ -98,6 +101,7 @@ def _parse(self, file, fp, default_netrc):
self.macros[entryname] = []
while 1:
line = lexer.instream.readline()
lexer.lineno += 1
if not line:
raise NetrcParseError(
"Macro definition missing null line terminator.",
@@ -114,17 +118,18 @@ def _parse(self, file, fp, default_netrc):
"bad toplevel token %r" % tt, file, lexer.lineno)

if not entryname:
raise NetrcParseError("missing %r name" % tt, file, lexer.lineno)
raise NetrcParseError(
"missing %r name" % tt, file, lexer.lineno)
Contributor commented:

This looks like an irrelevant formatting change.

Suggested change
- raise NetrcParseError(
-     "missing %r name" % tt, file, lexer.lineno)
+ raise NetrcParseError("missing %r name" % tt, file, lexer.lineno)


# We're looking at start of an entry for a named machine or default.
login = account = password = ''
self.hosts[entryname] = {}
while 1:
prev_lineno = lexer.lineno
tt = lexer.get_token()
if tt.startswith('#'):
if lexer.lineno == prev_lineno:
if not lexer.dontskip:
lexer.instream.readline()
lexer.lineno += 1
continue
if tt in {'', 'machine', 'default', 'macdef'}:
self.hosts[entryname] = (login, account, password)
@@ -165,12 +170,7 @@ def _security_check(self, fp, default_netrc, login):

def authenticators(self, host):
"""Return a (user, account, password) tuple for given host."""
if host in self.hosts:
return self.hosts[host]
elif 'default' in self.hosts:
return self.hosts['default']
else:
return None
return self.hosts.get(host, self.hosts.get('default'))
Contributor commented:

I have a feeling that this might not be relevant to the fix and would be better in a separate refactoring PR.
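
For readers comparing the two forms of `authenticators` in this hunk, here is a minimal sketch using a plain dict in place of `self.hosts`. The function names and sample data are hypothetical; the point is only that the nested `dict.get` call returns the same result as the explicit branches.

```python
def lookup_explicit(hosts, host):
    """Branching form, as in the lines being removed."""
    if host in hosts:
        return hosts[host]
    elif 'default' in hosts:
        return hosts['default']
    else:
        return None


def lookup_get(hosts, host):
    """One-line form, as in the line being added."""
    return hosts.get(host, hosts.get('default'))


with_default = {
    'foo.domain.com': ('bar', '', 'pass'),
    'default': ('anonymous', '', 'guest'),
}
without_default = {'foo.domain.com': ('bar', '', 'pass')}

for hosts in (with_default, without_default):
    for host in ('foo.domain.com', 'unknown.example'):
        assert lookup_explicit(hosts, host) == lookup_get(hosts, host)
```

The behaviour is the same, which supports the reviewer's point that this is a refactor rather than part of the fix.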


def __repr__(self):
"""Dump the class data in the format of a .netrc file."""
@@ -188,5 +188,4 @@ def __repr__(self):
rep += "\n"
return rep
Contributor commented:

Revert deleting the CLI entry point logic:

Suggested change
- return rep
+ return rep
+ if __name__ == '__main__':
+     print(netrc())


if __name__ == '__main__':
print(netrc())
15 changes: 15 additions & 0 deletions Lib/test/test_netrc.py
@@ -215,6 +215,14 @@ def test_comment_before_machine_line_hash_only(self):
machine bar.domain.com login foo password pass
""")

def test_comment_after_new_line(self):
self._test_comment("""\
machine foo.domain.com login bar password pass

# TEST
machine bar.domain.com login foo password pass
""")

def test_comment_after_machine_line(self):
self._test_comment("""\
machine foo.domain.com login bar password pass
@@ -251,6 +259,13 @@ def test_comment_after_machine_line_hash_only(self):
#
""")

def test_comment_at_first_line(self):
self._test_comment("""
# TEST
machine foo.domain.com login bar password pass
machine bar.domain.com login foo password pass
""")

def test_comment_at_end_of_machine_line(self):
self._test_comment("""\
machine foo.domain.com login bar password pass # comment
@@ -0,0 +1 @@
Fix incorrect comment parsing in the :mod:`netrc` module.
Contributor commented:

Could you expand on this and include context that would give an arbitrary changelog reader an idea of how the change might impact them?
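
As one way to give a changelog reader that context, the sketch below feeds the input from `test_comment_after_new_line` (a comment preceded by a blank line) to the public `netrc.netrc` class. `try_parse` is a hypothetical helper. Whether the second call returns credentials or a `NetrcParseError` depends on the Python version in use, so the example reports the outcome rather than assuming one.

```python
import netrc
import os
import tempfile


def try_parse(text, host):
    """Parse netrc text from a temporary file.

    Returns the authenticators tuple (or None) on success,
    or the NetrcParseError instance if parsing fails.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "netrc")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        try:
            return netrc.netrc(path).authenticators(host)
        except netrc.NetrcParseError as exc:
            return exc


plain = "machine foo.domain.com login bar password pass\n"
with_comment = (
    "machine foo.domain.com login bar password pass\n"
    "\n"
    "# TEST\n"
    "machine bar.domain.com login foo password pass\n"
)

print(try_parse(plain, "foo.domain.com"))
print(try_parse(with_comment, "bar.domain.com"))
```

A user whose `.netrc` contains comments after blank lines could run something like this to see whether their interpreter is affected.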
