Join duplicate rows based on the value of a single column

Published 2024-10-01 00:19:48


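I'm trying to remove the duplicate values and blanks from a DataFrame and then reorder all the values so that, within each row, every column value ends with the same number: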

This is my current DataFrame:

    brand   code   des   price   year
0  brand1  code1  des1  price1  year1
1  brand2  code2        price2       
2  brand3  code3  des3  price3  year3
3  brand4  code4        price4       
4  brand5  code5  des5  price5  year5
5  brand6  code6        price6       
6          code2  des2          year2
7          code4  des4          year4
8          code6  des6          year6

This is the output I want:

    brand   code   des   price   year
0  brand1  code1  des1  price1  year1
1  brand2  code2  des2  price2  year2
2  brand3  code3  des3  price3  year3
3  brand4  code4  des4  price4  year4
4  brand5  code5  des5  price5  year5
5  brand6  code6  des6  price6  year6

Here is the code I have so far. I'd really appreciate it if someone could show me how to do this:

import pandas as pd

# code2, code4 and code6 each appear twice; in each copy some of the
# other columns are empty strings
data = {
    'code': ['code1', 'code2', 'code3', 'code4', 'code5', 'code6', 'code2', 'code4', 'code6'],
    'des': ['des1', '', 'des3', '', 'des5', '', 'des2', 'des4', 'des6'],
    'price': ['price1', 'price2', 'price3', 'price4', 'price5', 'price6', '', '', ''],
    'year': ['year1', '', 'year3', '', 'year5', '', 'year2', 'year4', 'year6'],
    'brand': ['brand1', 'brand2', 'brand3', 'brand4', 'brand5', 'brand6', '', '', '']
}

df = pd.DataFrame.from_dict(data)
print(df)

2 Answers

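You can use df.apply() to process each column, call np.unique on each column Series to get a sorted list of its unique items (skipping empty strings), and then rebuild the column with pd.Series: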

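import numpy as np

# Per column: drop empty strings, take the sorted unique values,
# and rebuild the column as a new Series
df.apply(lambda x: pd.Series(np.unique(x[x != ''])))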

Output:

    code   des   price   year   brand
0  code1  des1  price1  year1  brand1
1  code2  des2  price2  year2  brand2
2  code3  des3  price3  year3  brand3
3  code4  des4  price4  year4  brand4
4  code5  des5  price5  year5  brand5
5  code6  des6  price6  year6  brand6
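To check this answer end to end, here is a self-contained sketch that rebuilds the question's data and applies the same one-liner. Note that it does not match rows by code: it works here because, once blanks and duplicates are removed, every column holds six values whose sorted order happens to line up. np.unique sorts strings lexicographically, so with ten or more items (for example `code10`) the alignment would break, which the last line illustrates.

```python
import numpy as np
import pandas as pd

# Same data as in the question
data = {
    'code': ['code1', 'code2', 'code3', 'code4', 'code5', 'code6', 'code2', 'code4', 'code6'],
    'des': ['des1', '', 'des3', '', 'des5', '', 'des2', 'des4', 'des6'],
    'price': ['price1', 'price2', 'price3', 'price4', 'price5', 'price6', '', '', ''],
    'year': ['year1', '', 'year3', '', 'year5', '', 'year2', 'year4', 'year6'],
    'brand': ['brand1', 'brand2', 'brand3', 'brand4', 'brand5', 'brand6', '', '', ''],
}
df = pd.DataFrame(data)

# For every column: drop empty strings, keep the sorted unique values,
# and rebuild the column as a fresh Series indexed 0..n-1.
result = df.apply(lambda x: pd.Series(np.unique(x[x != ''])))
print(result)

# Caveat: np.unique sorts strings lexicographically, not numerically,
# so 'code10' lands between 'code1' and 'code2'.
print(np.unique(['code10', 'code2', 'code1']))
```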

Is this what you're looking for? First replace the blanks with np.nan, then use apply to drop the NaN values from each column:

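# Turn empty or whitespace-only strings into NaN
df = df.replace(r'^\s*$', np.nan, regex=True)
# Drop the NaNs in each column independently and rebuild it as a new Series
df.apply(lambda x: pd.Series(x.dropna().values))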

    code   des   price   year   brand
0  code1  des1  price1  year1  brand1
1  code2  des3  price2  year3  brand2
2  code3  des5  price3  year5  brand3
3  code4  des2  price4  year2  brand4
4  code5  des4  price5  year4  brand5
5  code6  des6  price6  year6  brand6
6  code2   NaN     NaN    NaN     NaN
7  code4   NaN     NaN    NaN     NaN
8  code6   NaN     NaN    NaN     NaN
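As the output shows, dropping NaNs column by column keeps the duplicate codes and misaligns des and year (row 1 pairs code2 with des3). A different technique, not taken from either answer, is to keep this answer's blank-to-NaN step and then merge the rows that share a code with groupby(...).first(), which takes the first non-null value of each column within a group. This sketch assumes `code` is the key and that each code has at most one non-blank value per column; if values conflicted, `first()` would silently keep the earliest one.

```python
import numpy as np
import pandas as pd

# Same data as in the question
data = {
    'code': ['code1', 'code2', 'code3', 'code4', 'code5', 'code6', 'code2', 'code4', 'code6'],
    'des': ['des1', '', 'des3', '', 'des5', '', 'des2', 'des4', 'des6'],
    'price': ['price1', 'price2', 'price3', 'price4', 'price5', 'price6', '', '', ''],
    'year': ['year1', '', 'year3', '', 'year5', '', 'year2', 'year4', 'year6'],
    'brand': ['brand1', 'brand2', 'brand3', 'brand4', 'brand5', 'brand6', '', '', ''],
}
df = pd.DataFrame(data)

# Treat empty or whitespace-only strings as missing, as in this answer.
df = df.replace(r'^\s*$', np.nan, regex=True)

# Merge rows that share a code: first() takes the first non-null value of
# each column within the group; sort=False keeps first-appearance order and
# as_index=False keeps 'code' as a regular column with a 0..n-1 index.
merged = df.groupby('code', sort=False, as_index=False).first()

# Reorder the columns to match the desired output in the question.
merged = merged[['brand', 'code', 'des', 'price', 'year']]
print(merged)
```

Because the rows are matched on the key itself, this does not depend on how the strings happen to sort.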
