gh-121999: Change default tarfile filter to 'data' #122002

Merged
merged 9 commits into from
Jul 26, 2024
9 changes: 5 additions & 4 deletions Doc/library/shutil.rst
@@ -706,17 +706,18 @@ provided. They rely on the :mod:`zipfile` and :mod:`tarfile` modules.

The keyword-only *filter* argument is passed to the underlying unpacking
function. For zip files, *filter* is not accepted.
For tar files, it is recommended to set it to ``'data'``,
For tar files, it is recommended to use the default, ``'data'``,
unless using features specific to tar and UNIX-like filesystems.
(See :ref:`tarfile-extraction-filter` for details.)
The ``'data'`` filter will become the default for tar files
in Python 3.14.

.. audit-event:: shutil.unpack_archive filename,extract_dir,format shutil.unpack_archive

.. warning::

Never extract archives from untrusted sources without prior inspection.
Never extract archives from untrusted sources without prior inspection,
even when using the ``'data'`` filter, but especially if using the
``'tar'`` or ``'fully_trusted'`` filters.

It is possible that files are created outside of the path specified in
the *extract_dir* argument, e.g. members that have absolute filenames
starting with "/" or filenames with two dots "..".
41 changes: 19 additions & 22 deletions Doc/library/tarfile.rst
@@ -40,9 +40,11 @@ Some facts and figures:
Archives are extracted using a :ref:`filter <tarfile-extraction-filter>`,
which makes it possible to either limit surprising/dangerous features,
or to acknowledge that they are expected and the archive is fully trusted.
By default, archives are fully trusted, but this default is deprecated
and slated to change in Python 3.14.

.. versionchanged:: 3.14
The default extraction filter was 'fully trusted' but is now 'data',
which disallows dangerous features like links to absolute paths or paths
outside the destination.

.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)

@@ -495,19 +497,23 @@ be finalized; only the internally used file object will be closed. See the
The *filter* argument specifies how ``members`` are modified or rejected
before extraction.
See :ref:`tarfile-extraction-filter` for details.
It is recommended to set this explicitly depending on which *tar* features
you need to support.
It is recommended to set this explicitly only if unusual *tar* features
are required.

.. warning::

Never extract archives from untrusted sources without prior inspection.
The default filter is set to ``filter='data'`` to prevent the most
dangerous security issues; read the :ref:`tarfile-extraction-filter`
section for details.

Never extract archives from untrusted sources without prior inspection,
even when using the ``'data'`` filter, but especially if using the
``'tar'`` or ``'fully_trusted'`` filters.

It is possible that files are created outside of *path*, e.g. members
that have absolute filenames starting with ``"/"`` or filenames with two
dots ``".."``.

Set ``filter='data'`` to prevent the most dangerous security issues,
and read the :ref:`tarfile-extraction-filter` section for details.

.. versionchanged:: 3.5
Added the *numeric_owner* parameter.

@@ -538,8 +544,9 @@ be finalized; only the internally used file object will be closed. See the

See the warning for :meth:`extractall`.

Set ``filter='data'`` to prevent the most dangerous security issues,
and read the :ref:`tarfile-extraction-filter` section for details.
The default filter is set to ``filter='data'`` to prevent the most
dangerous security issues; read the :ref:`tarfile-extraction-filter`
section for details.

.. versionchanged:: 3.2
Added the *set_attrs* parameter.
@@ -603,12 +610,7 @@ be finalized; only the internally used file object will be closed. See the
argument to :meth:`~TarFile.extract`.

If ``extraction_filter`` is ``None`` (the default),
calling an extraction method without a *filter* argument will raise a
``DeprecationWarning``,
and fall back to the :func:`fully_trusted <fully_trusted_filter>` filter,
whose dangerous behavior matches previous versions of Python.

In Python 3.14+, leaving ``extraction_filter=None`` will cause
calling an extraction method without a *filter* argument will cause
extraction methods to use the :func:`data <data_filter>` filter by default.

The attribute may be set on instances or overridden in subclasses.
@@ -992,12 +994,7 @@ can be:

* ``None`` (default): Use :attr:`TarFile.extraction_filter`.

If that is also ``None`` (the default), raise a ``DeprecationWarning``,
and fall back to the ``'fully_trusted'`` filter, whose dangerous behavior
matches previous versions of Python.

In Python 3.14, the ``'data'`` filter will become the default instead.
It's possible to switch earlier; see :attr:`TarFile.extraction_filter`.
If that is also ``None`` (the default), the ``'data'`` filter will be used.

* A callable which will be called for each extracted member with a
:ref:`TarInfo <tarinfo-objects>` describing the member and the destination
8 changes: 1 addition & 7 deletions Lib/tarfile.py
@@ -2248,13 +2248,7 @@ def _get_filter_function(self, filter):
if filter is None:
filter = self.extraction_filter
if filter is None:
import warnings
warnings.warn(
'Python 3.14 will, by default, filter extracted tar '
+ 'archives and reject files or modify their metadata. '
+ 'Use the filter argument to control this behavior.',
DeprecationWarning, stacklevel=3)
return fully_trusted_filter
return data_filter
if isinstance(filter, str):
raise TypeError(
'String names are not supported for '
3 changes: 0 additions & 3 deletions Lib/test/test_shutil.py
@@ -2145,9 +2145,6 @@ def check_unpack_archive_with_converter(self, format, converter, **kwargs):
def check_unpack_tarball(self, format):
self.check_unpack_archive(format, filter='fully_trusted')
self.check_unpack_archive(format, filter='data')
with warnings_helper.check_warnings(
('Python 3.14', DeprecationWarning)):
self.check_unpack_archive(format)

def test_unpack_archive_tar(self):
self.check_unpack_tarball('tar')
34 changes: 0 additions & 34 deletions Lib/test/test_tarfile.py
@@ -738,31 +738,6 @@ def test_extract_directory(self):
finally:
os_helper.rmtree(DIR)

def test_deprecation_if_no_filter_passed_to_extractall(self):
DIR = pathlib.Path(TEMPDIR) / "extractall"
with (
os_helper.temp_dir(DIR),
tarfile.open(tarname, encoding="iso8859-1") as tar
):
directories = [t for t in tar if t.isdir()]
with self.assertWarnsRegex(DeprecationWarning, "Use the filter argument") as cm:
tar.extractall(DIR, directories)
# check that the stacklevel of the deprecation warning is correct:
self.assertEqual(cm.filename, __file__)

def test_deprecation_if_no_filter_passed_to_extract(self):
dirtype = "ustar/dirtype"
DIR = pathlib.Path(TEMPDIR) / "extractall"
with (
os_helper.temp_dir(DIR),
tarfile.open(tarname, encoding="iso8859-1") as tar
):
tarinfo = tar.getmember(dirtype)
with self.assertWarnsRegex(DeprecationWarning, "Use the filter argument") as cm:
tar.extract(tarinfo, path=DIR)
# check that the stacklevel of the deprecation warning is correct:
self.assertEqual(cm.filename, __file__)

def test_extractall_pathlike_dir(self):
DIR = os.path.join(TEMPDIR, "extractall")
with os_helper.temp_dir(DIR), \
@@ -4011,15 +3986,6 @@ def test_data_filter(self):
self.assertIs(filtered.name, tarinfo.name)
self.assertIs(filtered.type, tarinfo.type)

def test_default_filter_warns(self):
"""Ensure the default filter warns"""
with ArchiveMaker() as arc:
arc.add('foo')
with warnings_helper.check_warnings(
('Python 3.14', DeprecationWarning)):
with self.check_context(arc.open(), None):
self.expect_file('foo')

def test_change_default_filter_on_instance(self):
tar = tarfile.TarFile(tarname, 'r')
def strict_filter(tarinfo, path):
@@ -0,0 +1 @@
Update tarfile library to use 'data' filter by default when extracting