Python: merging same-named columns in a DataFrame

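User

Answer 1 · edited 2024-09-27 02:15:26

Of course, DSM and CTZhu have given very concise answers that make good use of built-in features of Python, and of the DataFrame in particular. Here is something a bit... [cough] ...more verbose.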

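import pandas
from io import StringIO  # Python 3.x
# from StringIO import StringIO  # Python 2.x

def myJoiner(row):
    # Join the non-null values of one row into a ';'-separated string.
    newrow = []
    for r in row:
        if not pandas.isnull(r):
            newrow.append(str(r))
    return ';'.join(newrow)

def groupCols(df, key):
    # Pick every column whose label contains `key` (e.g. 'a', 'a.1', 'a.2').
    # DataFrame.select() was removed in pandas 1.0, so use a boolean mask.
    columns = df.loc[:, [key in col for col in df.columns]]
    joined = columns.apply(myJoiner, axis=1)
    joined.name = key
    return pandas.DataFrame(joined)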

data = StringIO("""\
ID   Name   a    a    a     b    b
1    test1  1    NaN  NaN   "a"  NaN
2    test2  NaN  2    NaN   "a"  NaN
3    test3  2    3    NaN   NaN  "b"
4    test4  NaN  NaN  4     NaN  "b"
""")

df = pandas.read_table(data, sep=r'\s+')  # raw string avoids an invalid-escape warning
df.set_index(['ID', 'Name'], inplace=True)


AB = groupCols(df, 'a').join(groupCols(df, 'b'))
print(AB)

This gives me:

                a  b
ID Name             
1  test1      1.0  a
2  test2      2.0  a
3  test3  2.0;3.0  b
4  test4      4.0  b
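
One detail worth spelling out: groupCols matches on `key in col` rather than on equality because pandas renames repeated header names when it parses a file, so the three `a` columns do not all keep the label `a`. A minimal sketch of that behavior, assuming a reasonably recent pandas; the two-row table here is made up purely for illustration:

```python
import pandas as pd
from io import StringIO

# Made-up sample with repeated header names, for illustration only.
raw = StringIO("a a a b b\n1 2 3 x y\n4 5 6 z w\n")

demo = pd.read_csv(raw, sep=r"\s+")

# Duplicate headers are de-duplicated with numeric suffixes on read.
print(list(demo.columns))

# A substring test therefore picks up every renamed 'a' column.
a_cols = [col for col in demo.columns if "a" in col]
print(a_cols)
```

Note that the substring test is loose: 'a' would also match a column called Name, so it helps that ID and Name have already been moved into the index by the time groupCols runs.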

User

Answer 2 · edited 2024-09-27 02:15:26

You can use groupby on axis=1, with something like:

>>> def sjoin(x): return ';'.join(x[x.notnull()].astype(str))
>>> df.groupby(level=0, axis=1).apply(lambda x: x.apply(sjoin, axis=1))
  ID   Name        a  b
0  1  test1      1.0  a
1  2  test2      2.0  a
2  3  test3  2.0;3.0  b
3  4  test4      4.0  b

Instead of .astype(str), you can use whatever formatting operator you like here.
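
As far as I can tell, groupby(..., axis=1) is deprecated in recent pandas releases (2.1 onward), so here is a sketch of the same idea without it. This is not the approach from the answer above: it swaps the column-wise groupby for a boolean mask per distinct label. The names `merge_same_named` and `fmt` are hypothetical, and the frame is rebuilt by hand on the assumption that the column labels really are duplicated.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand with genuinely duplicated column labels.
df = pd.DataFrame(
    [
        [1, "test1", 1.0, np.nan, np.nan, "a", np.nan],
        [2, "test2", np.nan, 2.0, np.nan, "a", np.nan],
        [3, "test3", 2.0, 3.0, np.nan, np.nan, "b"],
        [4, "test4", np.nan, np.nan, 4.0, np.nan, "b"],
    ],
    columns=["ID", "Name", "a", "a", "a", "b", "b"],
)

def sjoin(values, fmt=str):
    # Join the non-null values of one row; `fmt` is a hypothetical hook
    # for swapping in a formatter other than str.
    return ";".join(fmt(v) for v in values if pd.notna(v))

def merge_same_named(frame, fmt=str):
    # Hypothetical helper: build one output column per distinct label by
    # masking the columns that carry that label and joining row by row.
    return pd.DataFrame(
        {
            name: frame.loc[:, frame.columns == name].apply(
                lambda row: sjoin(row, fmt), axis=1
            )
            for name in frame.columns.unique()
        }
    )

merged = merge_same_named(df)
print(merged)

# Same merge, but printing floats without the trailing ".0".
compact = merge_same_named(
    df, fmt=lambda v: "%g" % v if isinstance(v, float) else str(v)
)
print(compact)
```

The second call illustrates the point about formatting: any callable that turns a value into a string can stand in for str.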

User

Answer 3 · edited 2024-09-27 02:15:26

Having duplicate column names is probably not a good idea, but it will work:

In [72]:

df2 = df[['ID', 'Name']].copy()  # copy so the assignments below do not write to a view
# the 'a' columns are float dtype, so format each value as an integer
df2['a'] = '"' + df.T[df.columns.values == 'a'].apply(lambda x: ';'.join(["%i" % item for item in x[x.notnull()]])) + '"'
# the 'b' columns are object dtype (strings), so join them as they are
df2['b'] = df.T[df.columns.values == 'b'].apply(lambda x: ';'.join([item for item in x[x.notnull()]]))
print(df2)
   ID   Name      a    b
0   1  test1    "1"  "a"
1   2  test2    "2"  "a"
2   3  test3  "2;3"  "b"
3   4  test4    "4"  "b"

[4 rows x 4 columns]
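
To make the caution about duplicate column names a little more concrete, here is a small sketch of one way they can surprise you: selecting a repeated label returns a DataFrame rather than a Series. The frame `dup` is made up for illustration.

```python
import pandas as pd

# Made-up frame with a repeated label 'a' and a unique label 'b'.
dup = pd.DataFrame([[1, 2, "x"], [3, 4, "y"]], columns=["a", "a", "b"])

unique_pick = dup["b"]  # unique label: a Series
dup_pick = dup["a"]     # repeated label: a DataFrame holding both 'a' columns

print(type(unique_pick).__name__, type(dup_pick).__name__)

# Index.is_unique is a quick way to check for repeated labels up front.
print(dup.columns.is_unique)
```

Code that expects `dup["a"]` to be a single column may therefore misbehave, which is one reason to merge or rename the duplicates early.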
