Description
Describe the bug, including details regarding any error messages, version, and platform.
First off, this might not be a problem, so let me know if that's the case! It's hard for me to wrap my head around the consequences of Arrow's NA model for interchanging 😅
So, if one interchanges a `pandas.DataFrame` containing NaNs into a `pyarrow.Table`, one gets nulls in place of the NaNs.
```python
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({"foo": pd.Series([float("nan")], dtype=np.float64)})
>>> from pyarrow.interchange import from_dataframe
>>> from_dataframe(df)
pyarrow.Table
foo: double
----
foo: [[null]]  # expect NaN?
```
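To make the consequence concrete: the interchange protocol lets a producer declare that missing values in a float column are encoded with a NaN sentinel (`ColumnNullType.USE_NAN`). Assuming pandas describes its float64 columns this way, a consumer that follows the protocol has to read every NaN as a null, so a "real" NaN cannot survive the trip. The sketch below models only that step with the standard library; `NullKind` and `to_nullable` are hypothetical names, not pyarrow's actual implementation.

```python
import math
from enum import IntEnum
from typing import List, Optional


class NullKind(IntEnum):
    # Hypothetical mirror of two of the interchange protocol's ColumnNullType values.
    NON_NULLABLE = 0
    USE_NAN = 1


def to_nullable(values: List[float], null_kind: NullKind) -> List[Optional[float]]:
    """Hypothetical consumer step: turn a float buffer into nullable values."""
    if null_kind is NullKind.USE_NAN:
        # Under a NaN sentinel, every NaN is read as "missing", so a
        # genuine NaN is indistinguishable from a null.
        return [None if math.isnan(v) else v for v in values]
    # No null encoding declared: values (including NaN) pass through unchanged.
    return list(values)
```

For example, `to_nullable([float("nan"), 1.5], NullKind.USE_NAN)` returns `[None, 1.5]`, while the same input with `NullKind.NON_NULLABLE` keeps the NaN.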
We get the same result with `modin`, which also implements the interchange protocol.
```python
>>> import modin
>>> import ray
>>> ray.init(local_mode=True)
>>> from modin.config import Engine
>>> Engine.put("ray")
>>> from modin import pandas as mpd
>>> df = mpd.DataFrame({"foo": mpd.Series([float("nan")], dtype=np.float64)})
>>> from_dataframe(df)
pyarrow.Table
foo: double
----
foo: [[null]]
```
Interchanging another `pa.Table` with NaNs works fine, presumably because `from_dataframe()` short-circuits when it gets a `pa.Table`/`pa.RecordBatch`.
```python
>>> import pyarrow as pa
>>> table = pa.Table.from_pydict({"foo": pa.array([float("nan")], type=pa.float64())})
>>> from_dataframe(table)
pyarrow.Table
foo: double
----
foo: [[nan]]
```
Using the arrow nightly from https://pypi.fury.io/arrow-nightlies/
cc @AlenkaF
Component(s)
Python