Pandas DataFrame - renaming multiple identically named columns

3 answers

User

Reply #1 · edited 2024-05-18 14:50:10

You can use this:

def df_column_uniquify(df):
    df_columns = df.columns
    new_columns = []
    for item in df_columns:
        counter = 0
        newitem = item
        while newitem in new_columns:
            counter += 1
            newitem = "{}_{}".format(item, counter)
        new_columns.append(newitem)
    df.columns = new_columns
    return df

Then, given:

import numpy as np
import pandas as pd

df=pd.DataFrame(np.arange(2*5).reshape(2,5))
df.columns=['blah','blah2','blah3','blah','blah']

df looks like this:

   blah  blah2  blah3   blah   blah
0     0      1      2      3      4
1     5      6      7      8      9

Then run:

df = df_column_uniquify(df)

df is now:

   blah  blah2  blah3  blah_1  blah_2
0     0      1      2       3       4
1     5      6      7       8       9
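
The `while` loop in `df_column_uniquify` matters when a generated name already exists as a real column. Here is a minimal sketch of the same logic on a plain list of names, so the behaviour can be checked without a DataFrame. The helper name `uniquify_names` is hypothetical and not part of pandas.

```python
def uniquify_names(names):
    """Return a new list where repeated names get _1, _2, ... suffixes."""
    seen = set()   # names already emitted, for fast membership checks
    result = []
    for name in names:
        counter = 0
        candidate = name
        # Keep bumping the suffix until the candidate is unused.
        while candidate in seen:
            counter += 1
            candidate = "{}_{}".format(name, counter)
        seen.add(candidate)
        result.append(candidate)
    return result


plain = uniquify_names(["blah", "blah2", "blah3", "blah", "blah"])
# A real column is already called "blah_1", so the duplicate moves on to "blah_2".
clash = uniquify_names(["blah", "blah_1", "blah"])
print(plain)
print(clash)
```

Because every candidate is checked against the names already used, the result never contains a repeated name, even when a suffixed name was present in the input.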

User

Reply #2 · edited 2024-05-18 14:50:10

Starting with Pandas 0.19.0, pd.read_csv() has improved support for duplicate column names.

So we can try using the parser's internal method:

In [137]: pd.io.parsers.ParserBase({'names':df.columns})._maybe_dedup_names(df.columns)
Out[137]: ['blah', 'blah2', 'blah3', 'blah.1', 'blah.2']

This is the "magic" function:

def _maybe_dedup_names(self, names):
    # see gh-7160 and gh-9424: this helps to provide
    # immediate alleviation of the duplicate names
    # issue and appears to be satisfactory to users,
    # but ultimately, not needing to butcher the names
    # would be nice!
    if self.mangle_dupe_cols:
        names = list(names)  # so we can index
        counts = {}

        for i, col in enumerate(names):
            cur_count = counts.get(col, 0)

            if cur_count > 0:
                names[i] = '%s.%d' % (col, cur_count)

            counts[col] = cur_count + 1

    return names
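
The same counting logic can be applied without going through the parser internals. The following is a minimal sketch, assuming only that the `name.N` suffix format shown above is what you want. `dedup_names` is a hypothetical helper, not a pandas function.

```python
import numpy as np
import pandas as pd


def dedup_names(names):
    """Append .1, .2, ... to the second and later occurrences of each name."""
    names = list(names)  # copy, so the input is not modified
    counts = {}          # how many times each original name has been seen
    for i, col in enumerate(names):
        cur_count = counts.get(col, 0)
        if cur_count > 0:
            names[i] = "%s.%d" % (col, cur_count)
        counts[col] = cur_count + 1
    return names


df = pd.DataFrame(np.arange(2 * 5).reshape(2, 5))
df.columns = ["blah", "blah2", "blah3", "blah", "blah"]
df.columns = dedup_names(df.columns)
print(list(df.columns))
```

Note that this version only counts occurrences. It does not check whether a generated name such as `blah.1` already exists as a column.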

User

Reply #3 · edited 2024-05-18 14:50:10

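I wanted a solution within Pandas rather than a general Python one. When a column label is duplicated, the column index's get_loc() function returns a boolean mask array whose True values mark the positions where the label occurs. I then use that mask to assign new values at those positions. In my case I knew in advance how many duplicates there would be and what to assign to them, but it looks like df.columns.get_duplicates() (available in older pandas versions; it was removed in pandas 1.0) returns a list of all duplicated labels, and you can combine that list with get_loc() if you need a more general de-duplication: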

cols=pd.Series(df.columns)
for dup in df.columns.get_duplicates(): 
    cols[df.columns.get_loc(dup)] = ([dup + '.' + str(d_idx) 
                                     if d_idx != 0 
                                     else dup 
                                     for d_idx in range(df.columns.get_loc(dup).sum())]
                                    )
df.columns=cols

    blah    blah2   blah3   blah.1  blah.2
 0     0        1       2        3       4
 1     5        6       7        8       9
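
To make the mask described above concrete, here is a small sketch of `get_loc()` on an index with repeated labels. It relies on the duplicates not being adjacent in sorted order, as in this example. For a monotonic index with contiguous duplicates, `get_loc()` returns a slice rather than a boolean mask.

```python
import pandas as pd

columns = pd.Index(["blah", "blah2", "blah3", "blah", "blah"])

# Repeated label: a boolean mask marking every position where it occurs.
mask = columns.get_loc("blah")
# Unique label: a single integer position.
pos = columns.get_loc("blah2")

print(mask)        # boolean array, True at positions 0, 3 and 4
print(mask.sum())  # number of occurrences of "blah"
print(pos)
```

The `.sum()` of the mask is what the loop above uses to decide how many suffixed names to generate for each duplicated label.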

New, better method (updated 3 December 2019)

The code below is better than the code above. It is copied from another answer below (@SatishSK):

#sample df with duplicate blah column
df=pd.DataFrame(np.arange(2*5).reshape(2,5))
df.columns=['blah','blah2','blah3','blah','blah']
df

# you just need the following 4 lines to rename duplicates
# df is the dataframe that you want to rename duplicated columns

cols=pd.Series(df.columns)

for dup in cols[cols.duplicated()].unique(): 
    cols[cols[cols == dup].index.values.tolist()] = [dup + '.' + str(i) if i != 0 else dup for i in range(sum(cols == dup))]

# rename the columns with the cols list.
df.columns=cols

df

Output:

   blah  blah2  blah3  blah.1  blah.2
0     0      1      2       3       4
1     5      6      7       8       9
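
As an alternative sketch (not taken from the answers above), the occurrence number of each column name can be computed with `groupby(...).cumcount()`, which avoids the explicit loop over duplicated names. The variable names `occurrence` and `new_cols` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(2 * 5).reshape(2, 5))
df.columns = ["blah", "blah2", "blah3", "blah", "blah"]

cols = pd.Series(df.columns)
# For each name, cumcount() gives 0 for its first occurrence, 1 for the second, ...
occurrence = cols.groupby(cols).cumcount()

# Keep first occurrences unchanged; suffix later ones with ".N".
new_cols = [
    name if n == 0 else "{}.{}".format(name, n)
    for name, n in zip(cols, occurrence)
]
df.columns = new_cols
print(df)
```

This produces the same `blah.1`, `blah.2` naming as the loop-based version. Like that version, it does not check whether a suffixed name already exists among the columns.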
