Commit 8e6a6b1

niklasvm authored and adchia committed
fix: Fix Spark offline store type conversion to arrow (#3071)
* Fix unit tests related to empty list types

Signed-off-by: niklasvm <[email protected]>

* formatting

Signed-off-by: niklasvm <[email protected]>

Signed-off-by: niklasvm <[email protected]>
1 parent 1ac2186 commit 8e6a6b1

1 file changed: +7 -2 lines changed

sdk/python/feast/infra/offline_stores/contrib/spark_offline_store/spark.py

Lines changed: 7 additions & 2 deletions
@@ -1,3 +1,4 @@
+import tempfile
 import warnings
 from datetime import datetime
 from typing import Dict, List, Optional, Tuple, Union
@@ -6,6 +7,7 @@
 import pandas
 import pandas as pd
 import pyarrow
+import pyarrow.parquet as pq
 import pyspark
 from pydantic import StrictStr
 from pyspark import SparkConf
@@ -264,8 +266,11 @@ def _to_df_internal(self) -> pd.DataFrame:
 
     def _to_arrow_internal(self) -> pyarrow.Table:
         """Return dataset as pyarrow Table synchronously"""
-        df = self.to_df()
-        return pyarrow.Table.from_pandas(df)  # noqa
+
+        # write to temp parquet and then load it as pyarrow table from disk
+        with tempfile.TemporaryDirectory() as temp_dir:
+            self.to_spark_df().write.parquet(temp_dir, mode="overwrite")
+            return pq.read_table(temp_dir)
 
     def persist(self, storage: SavedDatasetStorage):
         """
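
The new `_to_arrow_internal` writes part files into a temporary directory, reads them back eagerly, and returns from inside the `with` block. The sketch below illustrates only that control flow, using the standard library: JSON files stand in for Parquet part files, and `write_parts`, `read_parts`, and `round_trip` are hypothetical helpers, not Feast, Spark, or pyarrow APIs. It assumes the reader loads everything into memory before returning, which is what makes deleting the directory afterwards safe.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def write_parts(rows: List[Dict], out_dir: str, rows_per_part: int = 2) -> None:
    """Hypothetical stand-in for a Spark write: one file per chunk of rows."""
    for start in range(0, len(rows), rows_per_part):
        part = Path(out_dir) / f"part-{start // rows_per_part:05d}.json"
        part.write_text(json.dumps(rows[start : start + rows_per_part]))


def read_parts(in_dir: str) -> List[Dict]:
    """Hypothetical stand-in for an eager table read: load every part file."""
    rows: List[Dict] = []
    for part in sorted(Path(in_dir).glob("part-*.json")):
        rows.extend(json.loads(part.read_text()))
    return rows


def round_trip(rows: List[Dict]) -> Tuple[List[Dict], str]:
    """Write to a temp directory, read back, and return inside the with block."""
    with tempfile.TemporaryDirectory() as temp_dir:
        write_parts(rows, temp_dir)
        # read_parts finishes loading before the with block exits,
        # so the directory can be removed without losing data.
        return read_parts(temp_dir), temp_dir


rows = [{"id": 1, "tags": []}, {"id": 2, "tags": ["a"]}, {"id": 3, "tags": []}]
result, used_dir = round_trip(rows)
```

After the call, `result` holds the same rows (including the empty lists) and the directory named by `used_dir` no longer exists. A lazy reader would not be safe here, because the files are gone once the `with` block exits.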
