Commit 832bfa1

GH-46652: [Python][Docs] Update language for row_group_size parameter (#46653)
### Rationale for this change

The docstrings for `row_group_size` could be clearer both in terms of (1) whether the value is a number of rows rather than a byte size and (2) the use of unit prefixes. See #46652.

My idea here was that just saying "64 * 1024 * 1024" is probably more easily understood than using Mi (mebi). The existing text may be just fine, so I'm happy to close this if others like how it reads now.

### What changes are included in this PR?

- Updated language in docstrings for `row_group_size`
- Added missing `, default None` to the docstring for top-level `write_table`

### Are these changes tested?

No.

### Are there any user-facing changes?

No.

* GitHub Issue: #46652

Authored-by: Bryce Mecum <[email protected]>
Signed-off-by: AlenkaF <[email protected]>
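The point about unit prefixes comes down to simple arithmetic. A quick Python check (the variable names are illustrative only, not part of pyarrow) confirms that "64Mi" and "64 * 1024 * 1024" denote the same count, and that, because `row_group_size` is measured in rows, both limits are row counts rather than byte sizes:

```python
# Mi (mebi) is the binary prefix for 2**20, i.e. 1024 * 1024.
MEBI = 1024 * 1024

default_limit = 1024 * 1024      # "1024 * 1024" in the docstrings: 1 Mi rows
upper_cap = 64 * 1024 * 1024     # "64 * 1024 * 1024", previously written "64Mi"

assert MEBI == 2 ** 20
assert upper_cap == 64 * MEBI == 2 ** 26

print(f"{default_limit:,} rows")  # 1,048,576 rows
print(f"{upper_cap:,} rows")      # 67,108,864 rows
```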
1 parent 76bd326 commit 832bfa1

1 file changed: +12 -11 lines changed

python/pyarrow/parquet/core.py

Lines changed: 12 additions & 11 deletions
@@ -1102,9 +1102,9 @@ def write(self, table_or_batch, row_group_size=None):
 ----------
 table_or_batch : {RecordBatch, Table}
 row_group_size : int, default None
-    Maximum number of rows in each written row group. If None,
-    the row group size will be the minimum of the input
-    table or batch length and 1024 * 1024.
+    Maximum number of rows in each written row group. If None, the row
+    group size will be the minimum of the number of rows in the
+    Table/RecordBatch and 1024 * 1024.
 """
 if isinstance(table_or_batch, pa.RecordBatch):
     self.write_batch(table_or_batch, row_group_size)
@@ -1123,8 +1123,8 @@ def write_batch(self, batch, row_group_size=None):
 row_group_size : int, default None
     Maximum number of rows in written row group. If None, the
     row group size will be the minimum of the RecordBatch
-    size and 1024 * 1024. If set larger than 64Mi then 64Mi
-    will be used instead.
+    size (in rows) and 1024 * 1024. If set larger than 64 * 1024 * 1024
+    then 64 * 1024 * 1024 will be used instead.
 """
 table = pa.Table.from_batches([batch], batch.schema)
 self.write_table(table, row_group_size)
@@ -1138,9 +1138,9 @@ def write_table(self, table, row_group_size=None):
 table : Table
 row_group_size : int, default None
     Maximum number of rows in each written row group. If None,
-    the row group size will be the minimum of the Table size
-    and 1024 * 1024. If set larger than 64Mi then 64Mi will
-    be used instead.
+    the row group size will be the minimum of the Table size (in rows)
+    and 1024 * 1024. If set larger than 64 * 1024 * 1024 then
+    64 * 1024 * 1024 will be used instead.
 
 """
 if self.schema_changed:
@@ -2017,10 +2017,11 @@ def write_table(table, where, row_group_size=None, version='2.6',
 ----------
 table : pyarrow.Table
 where : string or pyarrow.NativeFile
-row_group_size : int
+row_group_size : int, default None
     Maximum number of rows in each written row group. If None, the
-    row group size will be the minimum of the Table size and
-    1024 * 1024.
+    row group size will be the minimum of the Table size (in rows)
+    and 1024 * 1024. If set larger than 64 * 1024 * 1024 then
+    64 * 1024 * 1024 will be used instead.
 {_parquet_writer_arg_docs}
 **kwargs : optional
     Additional options for ParquetWriter
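A minimal pure-Python sketch of the rule as the updated docstrings state it, useful for reasoning about how many row groups a write would produce. `documented_row_group_size` and `split_into_row_groups` are hypothetical helpers, not pyarrow API; they model only the docstring text, while the actual splitting happens inside Arrow's C++ Parquet writer and may differ in details (for example, how empty tables or a size of 0 are handled).

```python
from typing import List, Optional

# Limits quoted from the updated docstrings; both are counts of rows, not bytes.
DEFAULT_LIMIT = 1024 * 1024      # used when row_group_size is None
UPPER_CAP = 64 * 1024 * 1024     # explicit values above this are clamped


def documented_row_group_size(num_rows: int, row_group_size: Optional[int] = None) -> int:
    """Hypothetical helper: the per-row-group row limit as the docstrings state it."""
    if row_group_size is None:
        # "the minimum of the Table size (in rows) and 1024 * 1024"
        return min(num_rows, DEFAULT_LIMIT)
    if row_group_size <= 0:
        # This sketch rejects non-positive sizes instead of guessing pyarrow's behavior.
        raise ValueError("row_group_size must be a positive number of rows")
    # "If set larger than 64 * 1024 * 1024 then 64 * 1024 * 1024 will be used instead."
    return min(row_group_size, UPPER_CAP)


def split_into_row_groups(num_rows: int, row_group_size: Optional[int] = None) -> List[int]:
    """Hypothetical helper: row counts of the groups a table of num_rows would be cut into."""
    limit = documented_row_group_size(num_rows, row_group_size)
    lengths: List[int] = []
    remaining = num_rows
    while remaining > 0:
        take = min(limit, remaining)
        lengths.append(take)
        remaining -= take
    return lengths  # empty for an empty table in this sketch


print(split_into_row_groups(10))                # [10]
print(split_into_row_groups(10, 4))             # [4, 4, 2]
print(split_into_row_groups(2_500_000))         # [1048576, 1048576, 402848]
print(documented_row_group_size(10**9, 10**8))  # 67108864
```

The last call shows the clamping clause: a requested 100,000,000 rows exceeds 64 * 1024 * 1024, so the sketch uses 67,108,864 instead.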
