How do I merge pandas DataFrames with different shapes?


I'm trying to merge two DataFrames in pandas on a large dataset, but I'm running into problems. I'll illustrate with a smaller example.

df1 holds a list of equipment and several columns relating to each item:

<pre><code>Item ID  Equipment     Owner  Status    Location
1        Jackhammer    James  Active    London
2        Cement Mixer  Tim    Active    New York
3        Drill         Sarah  Active    Paris
4        Ladder        Luke   Inactive  Hong Kong
5        Winch         Kojo   Inactive  Sydney
6        Circular Saw  Alex   Active    Moscow
</code></pre>

df2 holds a list of instances in which equipment was used. It shares some columns with df1, but some fields are NaN, and it also records equipment that does not appear in df1:

<pre><code>Item ID  Equipment     Owner  Date        Location
1        Jackhammer    James  08/09/2020  London
1        Jackhammer    James  08/10/2020  London
2        Cement Mixer  NaN    29/02/2020  New York
3        Drill         Sarah  11/02/2020  NaN
3        Drill         Sarah  30/11/2020  NaN
3        Drill         Sarah  21/12/2020  NaN
6        Circular Saw  Alex   19/06/2020  Moscow
7        Hammer        Ken    21/12/2020  Toronto
8        Sander        Ezra   19/06/2020  Frankfurt
</code></pre>

The DataFrame I want to end up with is:

<pre><code>Item ID  Equipment     Owner  Status    Date        Location
1        Jackhammer    James  Active    08/09/2020  London
1        Jackhammer    James  Active    08/10/2020  London
2        Cement Mixer  Tim    Active    29/02/2020  New York
3        Drill         Sarah  Active    11/02/2020  Paris
3        Drill         Sarah  Active    30/11/2020  Paris
3        Drill         Sarah  Active    21/12/2020  Paris
4        Ladder        Luke   Inactive  NaN         Hong Kong
5        Winch         Kojo   Inactive  NaN         Sydney
6        Circular Saw  Alex   Active    19/06/2020  Moscow
7        Hammer        Ken    NaN       21/12/2020  Toronto
8        Sander        Ezra   NaN       19/06/2020  Frankfurt
</code></pre>

Instead, with the following code I get duplicate rows, which I think is because of the NaN values:

<pre><code>data = pd.merge(df1, df2, how='outer', on=['Item ID'])

Item ID  Equipment_x   Equipment_y  Owner_x  Owner_y  Status    Date        Location_x  Location_y
1        Jackhammer    NaN          James    James    Active    08/09/2020  London      London
1        Jackhammer    NaN          James    James    Active    08/10/2020  London      London
2        Cement Mixer  NaN          Tim      NaN      Active    29/02/2020  New York    New York
3        Drill         NaN          Sarah    Sarah    Active    11/02/2020  Paris       NaN
3        Drill         NaN          Sarah    Sarah    Active    30/11/2020  Paris       NaN
3        Drill         NaN          Sarah    Sarah    Active    21/12/2020  Paris       NaN
4        Ladder        NaN          Luke     NaN      Inactive  NaN         Hong Kong   Hong Kong
5        Winch         NaN          Kojo     NaN      Inactive  NaN         Sydney      Sydney
6        Circular Saw  NaN          Alex     NaN      Active    19/06/2020  Moscow      Moscow
7        NaN           Hammer       NaN      Ken      NaN       21/12/2020  NaN         Toronto
8        NaN           Sander       NaN      Ezra     NaN       19/06/2020  NaN         Frankfurt
</code></pre>

Ideally I could just drop the `_y` columns, but the data in the bottom rows means I would lose important information. The only alternative I can think of is to merge the columns, forcing pandas to compare the values in each pair of columns and always favor the non-NaN value. I'm not sure whether that is possible.
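The idea of "always favor the non-NaN value" can be tried on a single pair of columns before applying it to the whole merge. The sketch below is only an illustration: the two Series are invented stand-ins for an `Owner_x` / `Owner_y` pair, not the asker's real data. It uses `Series.combine_first`, which keeps the caller's value where one is present and falls back to the other Series otherwise.

```python
import numpy as np
import pandas as pd

# Invented stand-ins for an Owner_x / Owner_y pair left behind by a merge.
owner_x = pd.Series(["James", "Tim", np.nan], name="Owner_x")
owner_y = pd.Series(["James", np.nan, "Ken"], name="Owner_y")

# Keep owner_x where it has a value; otherwise take the value from owner_y.
owner = owner_x.combine_first(owner_y).rename("Owner")

print(owner.tolist())  # ['James', 'Tim', 'Ken']
```

If both columns hold different non-NaN values in the same row, the caller (`owner_x` here) wins, so the order of the two Series decides which DataFrame is trusted.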


1 answer

Anonymous, 1 day ago

In general, you can do it like this:

<pre><code># Merge the two DataFrames, using a suffix that ideally does
# not appear in your data
suffix_string = '_DF2'
data = pd.merge(df1, df2, how='outer', on=['Item_ID'], suffixes=('', suffix_string))

# Now remove the duplicate columns by merging their content:
# use the value of column + suffix_string where column is empty
columns_to_remove = list()
for col in df1.columns:
    second_col = f'{col}{suffix_string}'
    if second_col in data.columns:
        data[col] = data[second_col].where(data[col].isna(), data[col])
        columns_to_remove.append(second_col)
if columns_to_remove:
    data.drop(columns=columns_to_remove, inplace=True)
data
</code></pre>

The result is:

<pre><code>    Item_ID  Equipment     Owner  Status    Location   Date
0   1        Jackhammer    James  Active    London     08/09/2020
1   1        Jackhammer    James  Active    London     08/10/2020
2   2        Cement_Mixer  Tim    Active    New_York   29/02/2020
3   3        Drill         Sarah  Active    Paris      11/02/2020
4   3        Drill         Sarah  Active    Paris      30/11/2020
5   3        Drill         Sarah  Active    Paris      21/12/2020
6   4        Ladder        Luke   Inactive  Hong_Kong  NaN
7   5        Winch         Kojo   Inactive  Sydney     NaN
8   6        Circular_Saw  Alex   Active    Moscow     19/06/2020
9   7        Hammer        Ken    NaN       Toronto    21/12/2020
10  8        Sander        Ezra   NaN       Frankfurt  19/06/2020
</code></pre>

This is based on the following test data, in which spaces inside values are replaced with underscores so that the columns can be split on whitespace:

<pre><code>import io
import pandas as pd

# sep is a raw string so that \s is not treated as a string escape
df1 = pd.read_csv(io.StringIO("""Item_ID Equipment Owner Status Location
1 Jackhammer James Active London
2 Cement_Mixer Tim Active New_York
3 Drill Sarah Active Paris
4 Ladder Luke Inactive Hong_Kong
5 Winch Kojo Inactive Sydney
6 Circular_Saw Alex Active Moscow"""), sep=r'\s+')

df2 = pd.read_csv(io.StringIO("""Item_ID Equipment Owner Date Location
1 Jackhammer James 08/09/2020 London
1 Jackhammer James 08/10/2020 London
2 Cement_Mixer NaN 29/02/2020 New_York
3 Drill Sarah 11/02/2020 NaN
3 Drill Sarah 30/11/2020 NaN
3 Drill Sarah 21/12/2020 NaN
6 Circular_Saw Alex 19/06/2020 Moscow
7 Hammer Ken 21/12/2020 Toronto
8 Sander Ezra 19/06/2020 Frankfurt"""), sep=r'\s+')
</code></pre>
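The same approach can be wrapped in a small reusable function. This is a minimal sketch, not part of the original answer: the helper name `merge_prefer_non_null` and the cut-down `left` / `right` frames are hypothetical, and it swaps the `where(...)` expression for `Series.combine_first`, which should behave the same way here (keep the left value, fall back to the right one when the left is NaN).

```python
import io
import pandas as pd


def merge_prefer_non_null(left, right, on, suffix="_DF2"):
    """Outer-merge two frames and collapse overlapping columns,
    keeping the left value and falling back to the right one."""
    merged = left.merge(right, how="outer", on=on, suffixes=("", suffix))
    for col in left.columns:
        other = f"{col}{suffix}"
        if other in merged.columns:
            # Fill gaps in the left column from its suffixed twin, then drop the twin.
            merged[col] = merged[col].combine_first(merged[other])
            merged = merged.drop(columns=other)
    return merged


# Hypothetical cut-down frames in the same shape as the test data above.
left = pd.read_csv(io.StringIO("""Item_ID Owner Location
2 Tim New_York
3 Sarah Paris
4 Luke Hong_Kong"""), sep=r"\s+")

right = pd.read_csv(io.StringIO("""Item_ID Owner Date Location
2 NaN 29/02/2020 New_York
3 Sarah 11/02/2020 NaN
7 Ken 21/12/2020 Toronto"""), sep=r"\s+")

result = merge_prefer_non_null(left, right, on="Item_ID")
print(result)
```

The function returns a new DataFrame instead of modifying one in place, which makes it easier to reuse in a pipeline. As with the loop above, pick a suffix that does not already occur in your column names.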
