It is a rather specific corner case, but there has been a change in behaviour when merging an empty frame:
In [1]: pd.__version__
Out[1]: '0.19.2'
In [2]: left = pd.DataFrame(columns=['key', 'col_left'])
In [3]: left
Out[3]:
Empty DataFrame
Columns: [key, col_left]
Index: []
In [4]: right = pd.DataFrame({'col_right': ['a', 'b', 'c']})
In [5]: right
Out[5]:
  col_right
0         a
1         b
2         c
In [6]: left.merge(right, left_on='key', right_index=True, how="right")
Out[6]:
   key col_left col_right
0    0      NaN         a
1    1      NaN         b
2    2      NaN         c
vs
In [10]: pd.__version__
Out[10]: u'0.18.1'
In [11]: left = pd.DataFrame(columns=['key', 'col_left'])
In [12]: left
Out[12]:
Empty DataFrame
Columns: [key, col_left]
Index: []
In [13]: right = pd.DataFrame({'col_right': ['a', 'b', 'c']})
In [14]: right
Out[14]:
  col_right
0         a
1         b
2         c
In [15]: left.merge(right, left_on='key', right_index=True, how="right")
Out[15]:
   key col_left col_right
0  NaN      NaN         a
1  NaN      NaN         b
2  NaN      NaN         c
So with 0.19 the 'key' column has values, whereas in 0.18 it holds NaNs. The 'key' column comes from the empty left frame (so it had no values; how can it have values now?), but it is merged against the index of the right frame (which of course has values; should these end up in the 'key' column of the resulting frame?).
It is such a strange case that I am actually not sure which of the two is the expected behaviour (and I am also not sure whether this was an intentional change in behaviour).
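For anyone who wants to check which behaviour their installed version shows, here is a minimal sketch of the same reproduction as a script. The function name `merge_empty_left_on_right_index` and the variable `key_is_filled` are just illustrative names of mine, and I deliberately do not state an expected result, since that is exactly what differs between versions:

```python
import pandas as pd


def merge_empty_left_on_right_index():
    """Merge an empty left frame on its 'key' column against the right frame's index."""
    # Empty frame: it has the columns 'key' and 'col_left', but no rows.
    left = pd.DataFrame(columns=["key", "col_left"])
    # Right frame with a default integer index (0, 1, 2).
    right = pd.DataFrame({"col_right": ["a", "b", "c"]})
    return left.merge(right, left_on="key", right_index=True, how="right")


result = merge_empty_left_on_right_index()

# True  -> 'key' was filled with values from the right index (as seen in 0.19.2)
# False -> 'key' is all NaN (as seen in 0.18.1)
key_is_filled = bool(result["key"].notna().any())

print("pandas", pd.__version__, "-> 'key' filled from right index:", key_is_filled)
print(result)
```

Either way, 'col_left' should be all NaN and 'col_right' should keep its three values; only the content of 'key' is in question.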
Encountered here: geopandas/geopandas#422