
bpo-33234 Improve list() pre-sizing for inputs with known lengths (no __length_hint__) #9846

Merged: 10 commits, Oct 28, 2018
9 changes: 9 additions & 0 deletions Lib/test/test_list.py
@@ -1,5 +1,6 @@
import sys
from test import list_tests
from test.support import cpython_only
import pickle
import unittest

@@ -157,5 +158,13 @@ class L(list): pass
        with self.assertRaises(TypeError):
            (3,) + L([1,2])

    @cpython_only
    def test_preallocation(self):
        iterable = [0] * 10
        iter_size = sys.getsizeof(iterable)

        self.assertEqual(iter_size, sys.getsizeof(list([0] * 10)))
        self.assertEqual(iter_size, sys.getsizeof(list(range(10))))

if __name__ == "__main__":
    unittest.main()
@@ -0,0 +1,2 @@
The list constructor will pre-size and not over-allocate when
the input length is known.
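The effect described in this entry can be observed from Python with `sys.getsizeof()`, much as the new test does. The following is a minimal sketch, not part of the PR; the helper name `list_sizes` is hypothetical. On a CPython build that includes this change the three sizes would be expected to match, but the exact numbers depend on the interpreter version and platform, so none are shown here.

```python
import sys


def list_sizes(n=10):
    """Return sys.getsizeof() for lists of n items built three different ways.

    `list_sizes` is a hypothetical helper for illustration only.
    """
    return {
        # Built by repetition: the baseline the PR's test compares against.
        "literal": sys.getsizeof([0] * n),
        # Built by the list constructor from an input with a known length.
        "from_list": sys.getsizeof(list([0] * n)),
        "from_range": sys.getsizeof(list(range(n))),
    }


if __name__ == "__main__":
    for how, size in list_sizes().items():
        print(f"{how:>10}: {size} bytes")
```

If `from_list` or `from_range` comes out larger than `literal`, the constructor over-allocated its backing array for that input.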
40 changes: 40 additions & 0 deletions Objects/listobject.c
@@ -76,6 +76,33 @@ list_resize(PyListObject *self, Py_ssize_t newsize)
    return 0;
}

static int
list_preallocate_exact(PyListObject *self, Py_ssize_t size)
{
    assert(self->ob_item == NULL);

    PyObject **items;
    size_t allocated;

    allocated = (size_t)size;
    if (allocated > (size_t)PY_SSIZE_T_MAX / sizeof(PyObject *)) {
        PyErr_NoMemory();
        return -1;
    }

    if (size == 0) {
        allocated = 0;
    }
    items = (PyObject **)PyMem_New(PyObject*, allocated);
    if (items == NULL) {
        PyErr_NoMemory();
        return -1;
    }
    self->ob_item = items;
    self->allocated = allocated;
    return 0;
}

/* Debug statistic to compare allocations with reuse through the free list */
#undef SHOW_ALLOC_COUNT
#ifdef SHOW_ALLOC_COUNT
@@ -2649,6 +2676,19 @@ list___init___impl(PyListObject *self, PyObject *iterable)
        (void)_list_clear(self);
    }
    if (iterable != NULL) {
        if (_PyObject_HasLen(iterable)) {
            Py_ssize_t iter_len = PyObject_Size(iterable);
            if (iter_len == -1) {
                if (!PyErr_ExceptionMatches(PyExc_TypeError)) {
                    return -1;
                }
                PyErr_Clear();
Member:

@pablogsal, @serhiy-storchaka: In a previous comment, I proposed adding a helper function to "probe" an object's size, that is, moving this code into a private helper function. Since PyObject_LengthHint() uses the same code, wouldn't that make sense now?

See also iter_len() ... which is different.

Member:

This code is used in just two places and is not complex. Adding another intermediate function would add the cost of a function call and of checking its result, and would complicate the code. If this code ends up being used in more places, it can be refactored then.

            }
            if (iter_len > 0 && self->ob_item == NULL
                && list_preallocate_exact(self, iter_len)) {
                return -1;
            }
        }
        PyObject *rv = list_extend(self, iterable);
        if (rv == NULL)
            return -1;
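The "probe" logic discussed in the review comments (ask for the length only if the type supports it, tolerate a TypeError, let any other error propagate) can be modelled in Python. This is only an illustrative sketch under stated assumptions: `probe_len` is a hypothetical name, not a CPython function, and `hasattr(type(obj), "__len__")` merely approximates the C-level `_PyObject_HasLen()` check.

```python
def probe_len(obj):
    """Return len(obj) if it can be determined, otherwise None.

    Hypothetical Python model of the C code in the hunk above:
    - types without __len__ are skipped (approximates _PyObject_HasLen),
    - a TypeError from len() is swallowed (like PyErr_Clear()),
    - any other exception propagates (like `return -1` in C).
    """
    if not hasattr(type(obj), "__len__"):
        return None
    try:
        return len(obj)
    except TypeError:
        return None


if __name__ == "__main__":
    print(probe_len([1, 2, 3]))    # a list has a known length
    print(probe_len(iter([1, 2]))) # a list iterator has no __len__
```

In the C code, a positive probed length is then used to pre-size the empty list before `list_extend()` fills it; this sketch covers only the probing step.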