Description
Describe the bug
I get the following error when trying to obtain the modified files of a commit:
  File "2_commit_processing.py", line 50, in process_commit
    "files": list(map(get_file, commit.modifications))
  File "/home/oscarch/.local/lib/python3.8/site-packages/pydriller/domain/commit.py", line 567, in modifications
    self._modifications = self._get_modifications()
  File "/home/oscarch/.local/lib/python3.8/site-packages/pydriller/domain/commit.py", line 582, in _get_modifications
    diff_index = self._c_object.parents[0].diff(self._c_object,
  File "/home/oscarch/.local/lib/python3.8/site-packages/git/diff.py", line 145, in diff
    index = diff_method(self.repo, proc)
  File "/home/oscarch/.local/lib/python3.8/site-packages/git/diff.py", line 451, in _index_from_patch_format
    index.append(Diff(repo,
  File "/home/oscarch/.local/lib/python3.8/site-packages/git/diff.py", line 278, in __init__
    if submodule.path == a_rawpath.decode("utf-8"):
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 54: invalid continuation byte
It happens with this commit:
adobe/brackets@8b3ae04
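I suspect it is caused by the name of a file in the commit, HighASCII_été.css: the byte 0xe9 in the error is "é" in Latin-1 (ISO-8859-1), so the path does not appear to be stored as UTF-8.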
This issue may be related to this one: #58
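The suspicion about the file name can be checked in isolation, without PyDriller or GitPython. This is only a minimal sketch: the bytes below are an assumed Latin-1 encoding of the bare file name, not the full path from the commit, so the reported position differs from the 54 in the traceback.

```python
# Assumed raw bytes: "HighASCII_été.css" encoded as Latin-1 (0xe9 is "é").
raw_name = b"HighASCII_\xe9t\xe9.css"

try:
    # Same call as in git/diff.py: a strict UTF-8 decode of the raw path.
    raw_name.decode("utf-8")
    utf8_ok = True
    utf8_error = ""
except UnicodeDecodeError as exc:
    utf8_ok = False
    utf8_error = str(exc)

# Decoding with the encoding the bytes were written in recovers the name.
latin1_name = raw_name.decode("latin-1")

# A lossy decode never raises, but replaces each bad byte with U+FFFD.
lossy_name = raw_name.decode("utf-8", errors="replace")

print(utf8_ok, utf8_error)
print(latin1_name)
print(lossy_name)
```

The strict decode fails with the same "invalid continuation byte" reason as in the traceback, which is consistent with the file name being the cause.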
To Reproduce
My code is doing this:
- Use RepositoryMining to get commits
- Iterate over the commits
- When the commit above is processed (commit.modifications), the exception is thrown
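The steps above can be sketched roughly as follows. This is not the original 2_commit_processing.py: the helper name find_undecodable_commits and the repo_path argument are hypothetical, and the sketch assumes the PyDriller 1.x API (RepositoryMining, commit.modifications) that appears in the traceback.

```python
def find_undecodable_commits(repo_path):
    """Return (hash, error message) pairs for commits whose modifications fail to load."""
    # Imported lazily so the helper can be defined without PyDriller installed.
    from pydriller import RepositoryMining  # PyDriller 1.x name, as in the report

    failures = []
    for commit in RepositoryMining(repo_path).traverse_commits():
        try:
            # Accessing the property triggers the diff that raises in the traceback.
            list(commit.modifications)
        except UnicodeDecodeError as exc:
            failures.append((commit.hash, str(exc)))
    return failures
```

Calling it on a local clone of adobe/brackets should list the commit mentioned above, assuming the error reproduces on that checkout.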
OS Version:
Ubuntu 20.04 LTS