Please add force_suffixes to pandas.merge() #17834

sorenwacker · 2017-10-10T09:03:54Z

Code Sample, a copy-pastable example if possible

import pandas as pd

A = pd.DataFrame({'colname': [1, 2]})

B = pd.DataFrame({'colname': [1, 2]})

C = pd.DataFrame({'colname': [1, 2]})

D = pd.merge(A, B, right_index=True, left_index=True, suffixes=('_A', '_B'))

print(pd.merge(D, B, right_index=True, left_index=True, suffixes=('', '_C'))

>   colname_A  colname_B  colname
> 0          1          1        1
> 1          2          2        2

Problem description

When using pandas.merge() suffixes are only added to the column-names if a column-name appears in both data frames. With more than two data frames that can lead to the above situation. The first merge adds suffixes. When the resulting table is merged with a third table the column-name does not appear twice and the suffix is not added. The suffix then needs to be added manually. This is overhead, especially when larger numbers of data-frames are merged. An option "force_suffixes" would be appreciated that ensures the suffix is added.

Expected Output

>   colname_A  colname_B  colname_C
> 0          1          1        1
> 1          2          2        2

The text was updated successfully, but these errors were encountered:

gfyoung · 2017-10-14T18:10:05Z

@soerendip : Thanks for reporting this! Could you explain why you would want the suffix to be forcefully added, even if duplicates don't exist? I'm having some difficulty understanding the use-case.

sorenwacker · 2017-10-15T19:14:31Z

@gfyoung: Say you have several tables with the identical column names. You want to combine them into one table while keeping track of the source. As in my example above, the third table did not get the label. You could use the f.add_suffix('_some_suffix'), I suppose, and merge the table afterwards.

ash13 · 2017-10-18T17:25:12Z

Hi,
I want to work on this, can anyone help me? I am a beginner.

jreback · 2017-10-18T20:26:43Z

@ash13 I would try a different issue; this request has not gotten much commentary; i am -0 on it nowhere

ThomasProctor · 2017-10-31T18:36:34Z

This isn't coming from nowhere - there's a stackoverflow question on it. As of October 31, 2017, it has 10 up-votes.

gfyoung · 2017-10-31T18:41:23Z

@ThomasProctor : Thanks for pointing that out. I'm still not convinced that we should do this. You can easily call df.rename to rename any confusing columns.

ThomasProctor · 2017-10-31T18:45:20Z

Here's the kludge I've been using to do this. Maybe it will help with a fix. I make no claims to it being remotely well-engineered

def merge_force_suffix(left, right, **kwargs):
    on_col = kwargs['on']
    suffix_tupple = kwargs['suffixes']
    
    def suffix_col(col, suffix):
        if col != on_col:
            return str(col) + suffix
        else:
            return col
            
    left_suffixed = left.rename(columns=lambda x: suffix_col(x, suffix_tupple[0]))
    right_suffixed = right.rename(columns=lambda x: suffix_col(x, suffix_tupple[1]))
    del kwargs['suffixes']
    return pd.merge(left_suffixed, right_suffixed, **kwargs)

gfyoung · 2017-10-31T18:48:56Z

Maybe it will help with a fix.

@ThomasProctor : Thanks for sharing this. I think this would be a reasonable workaround for people to use if they so need to.

ThomasProctor · 2017-10-31T18:49:11Z

@gfyoung My use case right now is merging together 10 or so DataFrames, all with possible redundant column names. Trying to keep track of where columns came from is just annoying.

joshdempster · 2018-04-11T14:33:03Z

@gfyoung : +1 for this feature. Like others, I find having to rename dataframes to get suffixes annoying and unnatural. Either you have to modify in place (never the behavior I want), or you have to duplicate the frames. I'd actually argue that the default behavior of join should be to always append suffixes if they are supplied, but at least having the option would make merging multiple frames less unpleasant.

asavinov · 2019-02-02T12:26:06Z

This feature (enforcing suffixes) is definitely needed for all non-trivial merges in order to avoid ambiguity: currently I never know what column names I will get in the result frame. Ideally, it should be a default option: if I provide a suffix then it must be used.

TheRealPJC · 2019-03-14T15:49:56Z

I agree - this very useful.

elpablete · 2019-05-28T02:29:09Z

Having a df 'links' with source and destination columns. I need to add details about the source and the destination from df 'cities'. Easy: juast merge links and cities on the source and city_id and then merge the result again with cities, only this time on destination and city_id. What do I get? A little mess because the details of the source city are NOT marked with the '_s' suffix after the first merge, but the second merge ends with the '_d' suffix on all the columns from df cities.

links = merge(links, cities, left_on='source', right_on='city_id', suffixes=('', '_s'))
links = merge(links, cities, left_on='destination', right_on='city_id', suffixes=('', '_d'))

I don't really want to rename cities prior to the first merge and then again prior to the second. And I definitely do not want cities to have its column names altered after I do the double merge.

TomAugspurger · 2019-05-28T20:39:41Z

The original request seems reasonable. I'm not sure about the name `force_suffixes`, but I haven't come up with anything better.

…

On Mon, May 27, 2019 at 9:29 PM Juan Pablo Tamayo ***@***.***> wrote: Having a df 'links' with source and destination columns. I need to add details about the source and the destination from df 'cities'. Easy: juast merge links and cities on the source and city_id and then merge the result again with cities, only this time on destination and city_id. What do I get? A little mess because the details of the source city are NOT marked with the '_s' suffix after the first merge, but the second merge ends with the '_d' suffix on all the columns from df cities. links = merge(links, cities, left_on='source', right_on='city_id', suffixes=('', '_s')) links = merge(links, cities, left_on='destination', right_on='city_id', suffixes=('', '_d')) — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#17834?email_source=notifications&email_token=AAKAOIWN3KO7GN5FU2K5C5DPXSKH3A5CNFSM4D6OMT6KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWKZLNI#issuecomment-496342453>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAKAOIS3UKTFJNBIJNKSG73PXSKH3ANCNFSM4D6OMT6A> .

lia-simeone · 2019-10-30T19:41:18Z

add_suffix is a palatable alternative to building this feature into merge I think. I agree it doesn't entirely address a use case with merging 10+ dataframes as 10+ calls to add_suffix isn't great. But it's workable and certainly better than alternatives I used in the past.

Binary-yev · 2020-04-28T03:02:10Z

yeah this feature would be great . say I have one total dataframe, I break it out by column value and then merge to another. every time i do that in a for loop would be great to add that column value I selected and add it to new columnd name. say product X mean, product Y mean, product Z mean

AntoineLamer · 2020-07-20T07:07:11Z

Hi !

Here is my use case :
I'm working on record linkage. I have a reference dataset with fixed fields.
(1) The user load its own dataset.
(2) I perform record linkage operation.
(3) At the end, I need to export the results and select the relevant fields.

As I don't know which fields the user will load, and will be duplicate with my reference dataset, I'd like to add suffixes for the two dataframes while merging. This would fix the problem of correctly selecting the field as I don't know in advance which one will be duplicate.

As reported above, I could suffixe all fields earlier, when loading the dataframes, but seems less convenient.

juandpinto · 2020-10-12T01:50:35Z

Any update on this feature? It's been a while since it was first suggested, but as others have pointed out, there are plenty of good use cases for it. Could have certainly used it on my current project with multiple merges!

jreback · 2020-10-12T01:53:28Z

pandas is all volunteer

you are welcome to submit a PR

joao-d-oliveira · 2021-11-09T12:24:03Z

Here's the kludge I've been using to do this. Maybe it will help with a fix. I make no claims to it being remotely well-engineered

def merge_force_suffix(left, right, **kwargs):
    on_col = kwargs['on']
    suffix_tupple = kwargs['suffixes']
    
    def suffix_col(col, suffix):
        if col != on_col:
            return str(col) + suffix
        else:
            return col
            
    left_suffixed = left.rename(columns=lambda x: suffix_col(x, suffix_tupple[0]))
    right_suffixed = right.rename(columns=lambda x: suffix_col(x, suffix_tupple[1]))
    del kwargs['suffixes']
    return pd.merge(left_suffixed, right_suffixed, **kwargs)

Just a small remark... it's a nice implementation, thanks. However it only works for single column "on".
If you have multiple keys, it won't work.

You could make a small adaption:
if col not in on_col:
instead of:
if col != on_col:

therealbct · 2022-09-10T19:37:27Z

Here's the kludge I've been using to do this. Maybe it will help with a fix. I make no claims to it being remotely well-engineered
def merge_force_suffix(left, right, **kwargs):
    on_col = kwargs['on']
    suffix_tupple = kwargs['suffixes']
    
    def suffix_col(col, suffix):
        if col != on_col:
            return str(col) + suffix
        else:
            return col
            
    left_suffixed = left.rename(columns=lambda x: suffix_col(x, suffix_tupple[0]))
    right_suffixed = right.rename(columns=lambda x: suffix_col(x, suffix_tupple[1]))
    del kwargs['suffixes']
    return pd.merge(left_suffixed, right_suffixed, **kwargs)
Just a small remark... it's a nice implementation, thanks. However it only works for single column "on". If you have multiple keys, it won't work.

You could make a small adaption: if col not in on_col: instead of: if col != on_col:

If you only replace != on_col with not in on_col, you'll encounter issues when merging on a single string name column which is inclusive of the name of other columns (e.g. using argument on='ab', while another column is named 'b'). If this happens, you'll end up with the default behavior of merge (e.g. do nothing, or add an _x or _y when columns are duplicates instead of enforcing the dedicated suffixes on those columns for which the name is included in the "on" column), therefore defeating the attempt of this workaround to circumvent the default behavior.

Updated version that works with both single and multi-column "on":

def merge_force_suffix(left, right, **kwargs):
    on_col = kwargs['on']
    on_col = [on_col] if not isinstance(on_col, (list, tuple)) else on_col
    suffix_tupple = kwargs['suffixes']
    
    def suffix_col(col, suffix):
        if col not in on_col:
            return str(col) + suffix
        else:
            return col
            
    left_suffixed = left.rename(columns=lambda x: suffix_col(x, suffix_tupple[0]))
    right_suffixed = right.rename(columns=lambda x: suffix_col(x, suffix_tupple[1]))
    del kwargs['suffixes']
    return pd.merge(left_suffixed, right_suffixed, **kwargs)

terryaa · 2024-01-29T00:45:06Z

I don't believe this is still open. need this feature so bad right now.. :(

ExarKun64 · 2024-10-07T14:00:36Z

Hi,
I've been struggled with this issue for hours, trying to figure out why my suffixes was not applied.
I think it would be nice to at least make this present behaviour of the suffixes argument appears more clearly in pandas API documentation.
Maybe by adding an example, or something like that ?
And I agree with the solution of a force_suffixes parameter, or something similar. Would be really nice to have, and would release and ambiguity or missunderstanding of the suffixes parameter.

gfyoung added Groupby Reshaping Concat, Merge/Join, Stack/Unstack, Explode and removed Groupby labels Oct 14, 2017

jreback mentioned this issue Nov 13, 2017

Add parameter always_suffix to DataFrame.merge #18260

Closed

jreback added API Design Needs Discussion Requires discussion from core team before further action labels Nov 13, 2017

jschendel mentioned this issue Nov 28, 2017

3 way joins column suffix looks wrong #18541

Closed

mroeschke added Enhancement and removed API Design Needs Discussion Requires discussion from core team before further action labels Jun 12, 2021

kopytjuk mentioned this issue May 26, 2025

ENH: Add force_suffixes boolean argument to pd.merge #61498

Open

5 tasks

Uh oh!

Please add force_suffixes to pandas.merge() #17834

Please add force_suffixes to pandas.merge() #17834

Comments

sorenwacker commented Oct 10, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

gfyoung commented Oct 14, 2017

Uh oh!

sorenwacker commented Oct 15, 2017

Uh oh!

ash13 commented Oct 18, 2017

Uh oh!

jreback commented Oct 18, 2017

Uh oh!

ThomasProctor commented Oct 31, 2017

Uh oh!

gfyoung commented Oct 31, 2017

Uh oh!

ThomasProctor commented Oct 31, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gfyoung commented Oct 31, 2017

Uh oh!

ThomasProctor commented Oct 31, 2017

Uh oh!

joshdempster commented Apr 11, 2018

Uh oh!

asavinov commented Feb 2, 2019

Uh oh!

TheRealPJC commented Mar 14, 2019

Uh oh!

elpablete commented May 28, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

TomAugspurger commented May 28, 2019 via email

Uh oh!

lia-simeone commented Oct 30, 2019

Uh oh!

Binary-yev commented Apr 28, 2020

Uh oh!

AntoineLamer commented Jul 20, 2020

Uh oh!

juandpinto commented Oct 12, 2020

Uh oh!

jreback commented Oct 12, 2020

Uh oh!

joao-d-oliveira commented Nov 9, 2021

Uh oh!

therealbct commented Sep 10, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

terryaa commented Jan 29, 2024

Uh oh!

ExarKun64 commented Oct 7, 2024

Uh oh!

sorenwacker commented Oct 10, 2017 •

edited

Loading

ThomasProctor commented Oct 31, 2017 •

edited

Loading

elpablete commented May 28, 2019 •

edited

Loading

therealbct commented Sep 10, 2022 •

edited

Loading