[Python] DataFrame interchange protocol: NaNs are interchanged as null #35535

@honno

Description

Describe the bug, including details regarding any error messages, version, and platform.

First off, this might not be a problem; let me know if that's the case! It's hard for me to wrap my head around the consequences of Arrow's NA model when interchanging 😅

So, if one were to interchange a pandas.DataFrame containing NaNs to a pyarrow.Table, one gets nulls in place of NaNs.

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({"foo": pd.Series([float("nan")], dtype=np.float64)})
>>> from pyarrow.interchange import from_dataframe
>>> from_dataframe(df)
pyarrow.Table
foo: double
----
foo: [[null]]    # expect NaN?

We get the same with modin, which also adopts the interchange protocol.

>>> import modin
>>> import ray
>>> ray.init(local_mode=True)
>>> from modin.config import Engine
>>> Engine.put("ray")
>>> from modin import pandas as mpd
>>> df = mpd.DataFrame({"foo": mpd.Series([float("nan")], dtype=np.float64)})
>>> from_dataframe(df)
pyarrow.Table
foo: double
----
foo: [[null]]
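A minimal sketch for checking what the producer side advertises through the protocol, in case that helps narrow down where the nulls come from. It assumes pandas describes missing values in float columns via `describe_null` with NaN as the sentinel (the `USE_NAN` kind in the protocol spec), which would explain a consumer turning every NaN into a null; I have not checked whether modin reports the same thing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"foo": pd.Series([float("nan")], dtype=np.float64)})

# Interchange-protocol view of the column, as a consumer would see it.
col = df.__dataframe__().get_column_by_name("foo")

# describe_null is a (kind, value) pair defined by the protocol spec.
# Assumption: for float64 columns pandas reports the USE_NAN kind.
null_kind, null_value = col.describe_null
print(null_kind, null_value)

# Number of values the producer itself counts as missing.
null_count = col.null_count
print(null_count)
```

If the kind really is `USE_NAN`, the producer is telling the consumer that NaN means "missing", so the protocol alone gives no way to tell a genuine NaN apart from a missing value.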

I see that interchanging another pa.Table with NaNs works fine, presumably because from_dataframe() short-circuits when it gets a pa.Table/pa.RecordBatch.

>>> import pyarrow as pa
>>> table = pa.Table.from_pydict({"foo": pa.array([float("nan")], type=pa.float64())})
>>> from_dataframe(table)
pyarrow.Table
foo: double
----
foo: [[nan]]

Using the arrow nightly from https://pypi.fury.io/arrow-nightlies/

cc @AlenkaF

Component(s)

Python
