
Commit 8983782

More docs to from_dict to mention that the result lives in RAM (#7316)
docs from dict
1 parent 661d7ba commit 8983782


1 file changed (+18, -0 lines)


src/datasets/arrow_dataset.py

Lines changed: 18 additions & 0 deletions
@@ -804,6 +804,12 @@ def from_pandas(
 contains `None/nan` objects, the type is set to `null`. This behavior can be avoided by constructing explicit
 features and passing it to this function.
 
+Important: a dataset created with from_pandas() lives in memory
+and therefore doesn't have an associated cache directory.
+This may change in the future, but in the meantime if you
+want to reduce memory usage you should write it to disk
+and reload it using e.g. save_to_disk / load_from_disk.
+
 Args:
 df (`pandas.DataFrame`):
 Dataframe that contains the dataset.
@@ -898,6 +904,12 @@ def from_dict(
 """
 Convert `dict` to a `pyarrow.Table` to create a [`Dataset`].
 
+Important: a dataset created with from_dict() lives in memory
+and therefore doesn't have an associated cache directory.
+This may change in the future, but in the meantime if you
+want to reduce memory usage you should write it to disk
+and reload it using e.g. save_to_disk / load_from_disk.
+
 Args:
 mapping (`Mapping`):
 Mapping of strings to Arrays or Python lists.
@@ -957,6 +969,12 @@ def from_list(
 Note that the keys of the first entry will be used to determine the dataset columns,
 regardless of what is passed to features.
 
+Important: a dataset created with from_list() lives in memory
+and therefore doesn't have an associated cache directory.
+This may change in the future, but in the meantime if you
+want to reduce memory usage you should write it to disk
+and reload it using e.g. save_to_disk / load_from_disk.
+
 Args:
 mapping (`List[dict]`): A list of mappings of strings to row values.
 features (`Features`, optional): Dataset features.
