How to concatenate or merge a Series into a DataFrame without duplicating column names

Posted 2024-09-28 05:18:22


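I am trying to concatenate a Series returned from a function onto a DataFrame, but I don't want the columns to be duplicated. How can I do this? The full dataset has about 100k rows and about 100 subsets (defined in a loop with masks), so I would prefer a computationally fast solution. Using Python 3.7.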

Example

import pandas as pd

def myfcn(row, data, val):
    z1 = row['y'] + val
    z2 = row['x']*row['y']
    return pd.Series(
        {'fancy_column_name1': z1, 
         'fancy_column_name2': z2/val},
        name=row.name
    )
    

col1 = [1, 1.5, 3.1, 3.4, 2, -1]
col2 = [1, -3, 2, 8, 2.5, -1.3]
df = pd.DataFrame(list(zip(col1, col2)), columns=['x', 'y'])
display(df)

### In the real case, this is all in a loop with many subsets that 
### are created with masks & specific criteria; this is 
### simplified here
df_subset = df.iloc[[0,2,3]]
#display(df_subset)
out = df_subset.apply(myfcn, axis=1, args=(df_subset, 100))
df = pd.concat([df, out], axis=1)

df_subset2 = df.iloc[[5]]
out = df_subset2.apply(myfcn, axis=1, args=(df_subset2, 250))
df = pd.concat([df, out], axis=1)
display(df)

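This is the parent DataFrame "df":
[image]
This is the current output:
[image]
This is the desired output:
[image]
How can I remove the duplicate column names and collapse the data into the same columns? I want to keep the numbers, not the NaNs. There will never be a case where a row has more than one number to keep, but there may be a case where a row has no number at all (in which case the NaN should be kept).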


2 answers

pandas.DataFrame.combine_first: Combine two DataFrame objects by filling null values in one DataFrame with non-null values from other DataFrame. The row and column indexes of the resulting DataFrame will be the union of the two.

Simply replace df = pd.concat([df, out], axis=1) with:

df = df.combine_first(out)


More details here.
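To make the behaviour concrete, here is a minimal sketch of combine_first applied in a loop. The frames out1 and out2 are hand-built stand-ins for the per-subset results in the question (an assumption for illustration, not the asker's real data), and only one of the two new columns is shown to keep it short.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 1.5, 3.1, 3.4, 2, -1],
                   "y": [1, -3, 2, 8, 2.5, -1.3]})

# Hypothetical partial results: same column name, different rows,
# mimicking what each pass of the loop in the question returns.
out1 = pd.DataFrame({"fancy_column_name1": [101.0, 102.0, 108.0]}, index=[0, 2, 3])
out2 = pd.DataFrame({"fancy_column_name1": [248.7]}, index=[5])

result = df
for out in (out1, out2):
    # Non-null values already in result are kept; gaps are filled from out.
    # A column that does not exist yet is simply added once.
    result = result.combine_first(out)

print(result)
```

Each column name appears only once in the result, and rows that no subset touched (1 and 4 here) stay NaN, which matches what the question asks for.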


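The column order is not preserved because out has only 2 columns. These fill the NaN values first and therefore end up as the first columns. You can work around this by inserting placeholder x and y columns at the front of out: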

out.insert(0, 'x', 0)
out.insert(1, 'y', 0)
df = df.combine_first(out)

Add this to your loop and let me know whether the column order is now fixed.
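If the placeholder columns do not give the order you expect on your pandas version, another option is to fix the order explicitly after combining. The sketch below is a variation on this answer, not the answerer's own code; the frame contents and the names new_b and new_a are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})

# Hypothetical partial result covering only row 1.
out = pd.DataFrame({"new_b": [10.0], "new_a": [20.0]}, index=[1])

# Desired order: the original columns first, then any new columns
# in the order they appear in out.
ordered = list(df.columns) + [c for c in out.columns if c not in df.columns]

# Selecting with an explicit list pins the column order regardless of
# how combine_first arranges the union of the columns.
df = df.combine_first(out)[ordered]

print(df)
```

Because the order comes from an explicit list, this does not depend on how combine_first sorts the combined columns.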

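Compute all the subsets first, stack their results together, and then merge them into the main DataFrame in one step. I modified your code slightly: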

import pandas as pd

def myfcn(row, data, val):
    z1 = row['y'] + val
    z2 = row['x']*row['y']
    return pd.Series(
        {'fancy_column_name1': z1, 
         'fancy_column_name2': z2/val},
        name=row.name
    )
    

col1 = [1, 1.5, 3.1, 3.4, 2, -1]
col2 = [1, -3, 2, 8, 2.5, -1.3]
df = pd.DataFrame(list(zip(col1, col2)), columns=['x', 'y'])

df_subset = df.iloc[[0,2,3]]
#display(df_subset)
out1 = df_subset.apply(myfcn, axis=1, args=(df_subset, 100))
df_subset2 = df.iloc[[5]]
out2 = df_subset2.apply(myfcn, axis=1, args=(df_subset2, 250))
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows the same way
out = pd.concat([out1, out2])

df = pd.merge(df, out, left_index=True, right_index=True, how="left")
print(df)

Output:

     x    y  fancy_column_name1  fancy_column_name2
0  1.0  1.0               101.0              0.0100
1  1.5 -3.0                 NaN                 NaN
2  3.1  2.0               102.0              0.0620
3  3.4  8.0               108.0              0.2720
4  2.0  2.5                 NaN                 NaN
5 -1.0 -1.3               248.7              0.0052
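Since the question asks for something fast on about 100k rows and 100 subsets, it may also be worth avoiding the row-wise apply altogether. The following is a sketch under the assumption that the per-row calculation can be written as column arithmetic, as it can for the example myfcn; the subsets list of (mask, val) pairs is a hypothetical stand-in for the real loop criteria.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 1.5, 3.1, 3.4, 2, -1],
                   "y": [1, -3, 2, 8, 2.5, -1.3]})

# Create the result columns once, filled with NaN, so each name exists only once.
df["fancy_column_name1"] = np.nan
df["fancy_column_name2"] = np.nan

# Hypothetical (mask, val) pairs standing in for the real subset criteria.
subsets = [
    (df.index.isin([0, 2, 3]), 100),
    (df.index.isin([5]), 250),
]

for mask, val in subsets:
    # Write results straight into the masked rows; no concat or merge is needed.
    df.loc[mask, "fancy_column_name1"] = df.loc[mask, "y"] + val
    df.loc[mask, "fancy_column_name2"] = df.loc[mask, "x"] * df.loc[mask, "y"] / val

print(df)
```

Rows not covered by any mask keep their NaN, and the column order is the order in which the columns were created. If the real function cannot be vectorized, the merge approach above still applies.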
