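I have a pandas DataFrame made up of address-field columns. My problem is that, in two of the columns, some rows contain the same cell value. Does anyone know how I can conditionally change the value in one column when a duplicate is found between the two columns? Ideally I want to keep one value and set the other to np.nan.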
Here is a test case:
import io
import pandas as pd
# Recent pandas versions deprecate passing a literal JSON string to read_json, so wrap it in StringIO
test = pd.read_json(io.StringIO('{"housename":{"16":null,"17":null,"18":null},"name":{"16":"Shoecare","17":"33","18":"33A"},"house_number":{"16":"32","17":"33","18":"33A"},"street":{"16":"Carfax","17":"Carfax","18":"Carfax"},"city":{"16":"Horsham","17":"Horsham","18":"Horsham"},"postcode":{"16":"RH12 1EE","17":"RH12 1EE","18":"RH12 1EE"}}'))
city house_number housename name postcode street
16 Horsham 32 NaN Shoecare RH12 1EE Carfax
17 Horsham 33 NaN 33 RH12 1EE Carfax
18 Horsham 33A NaN 33A RH12 1EE Carfax
In the test case I used test.duplicated(subset=['house_number', 'name']), but it does not identify the duplicate values shared between the house_number and name columns.
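The reason is that `DataFrame.duplicated` compares rows against each other: it flags a row only when the same (`house_number`, `name`) pair already appeared in an earlier row. It never compares the two columns within a single row. A minimal sketch contrasting the two checks on the question's test data (the `io.StringIO` wrapper is an addition, assuming a recent pandas that deprecates literal JSON strings in `read_json`):

```python
import io

import pandas as pd

# The question's test data.
raw = '{"housename":{"16":null,"17":null,"18":null},"name":{"16":"Shoecare","17":"33","18":"33A"},"house_number":{"16":"32","17":"33","18":"33A"},"street":{"16":"Carfax","17":"Carfax","18":"Carfax"},"city":{"16":"Horsham","17":"Horsham","18":"Horsham"},"postcode":{"16":"RH12 1EE","17":"RH12 1EE","18":"RH12 1EE"}}'
test = pd.read_json(io.StringIO(raw))

# duplicated() looks for repeated (house_number, name) pairs across rows.
# Every pair here is unique, so nothing is flagged.
row_dupes = test.duplicated(subset=['house_number', 'name'])
print(row_dupes.tolist())  # [False, False, False]

# What the question needs is an element-wise comparison inside each row.
same_value = test['house_number'] == test['name']
print(same_value.tolist())  # [False, True, True]
```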
Does anyone have suggestions for how to first identify the duplicated cells across the two columns and then set one of the values to np.nan?
Desired output:
housename name house_number street city postcode
16 NaN Shoecare 32 Carfax Horsham RH12 1EE
17 NaN NaN 33 Carfax Horsham RH12 1EE
18 NaN NaN 33A Carfax Horsham RH12 1EE
If the two columns are house_number and name, you can do this:
Output:
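A minimal sketch of one way to do it, assuming the `test` DataFrame from the question and that `house_number` is the value to keep: build a row-wise boolean mask with `==`, then assign `np.nan` to `name` wherever the mask is True.

```python
import io

import numpy as np
import pandas as pd

# The question's test data; io.StringIO avoids the read_json deprecation
# warning for literal JSON strings in recent pandas.
raw = '{"housename":{"16":null,"17":null,"18":null},"name":{"16":"Shoecare","17":"33","18":"33A"},"house_number":{"16":"32","17":"33","18":"33A"},"street":{"16":"Carfax","17":"Carfax","18":"Carfax"},"city":{"16":"Horsham","17":"Horsham","18":"Horsham"},"postcode":{"16":"RH12 1EE","17":"RH12 1EE","18":"RH12 1EE"}}'
test = pd.read_json(io.StringIO(raw))

# Row-wise mask: True where name merely repeats house_number in the same row.
same_value = test['name'] == test['house_number']

# Keep house_number and blank out the duplicated name cell.
test.loc[same_value, 'name'] = np.nan
print(test)
```

Afterwards `name` should read Shoecare, NaN, NaN, as in the desired output. `test['name'] = test['name'].mask(same_value)` is an equivalent alternative, since `Series.mask` fills with NaN by default. Note that rows where both cells are already NaN are left alone, because `NaN == NaN` evaluates to False.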