How to change a cell's value in one column when two columns hold the same value in a row

Posted 2024-10-02 00:21:10

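I have a pandas DataFrame whose columns are address fields. My problem is that in some rows two of the columns contain the same cell value. Does anyone know how I can conditionally change the value in one column when a duplicate is found between the two columns? Ideally I would keep one value and set the other to np.nan.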

Here is a test case:

import numpy as np
import pandas as pd

test = pd.read_json('{"housename":{"16":null,"17":null,"18":null},"name":{"16":"Shoecare","17":"33","18":"33A"},"house_number":{"16":"32","17":"33","18":"33A"},"street":{"16":"Carfax","17":"Carfax","18":"Carfax"},"city":{"16":"Horsham","17":"Horsham","18":"Horsham"},"postcode":{"16":"RH12 1EE","17":"RH12 1EE","18":"RH12 1EE"}}')

    city        house_number    housename   name        postcode    street
16  Horsham     32              NaN         Shoecare    RH12 1EE    Carfax
17  Horsham     33              NaN         33          RH12 1EE    Carfax
18  Horsham     33A             NaN         33A         RH12 1EE    Carfax

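In the test case I tried test.duplicated(subset=['house_number', 'name']), but it does not flag the values that are repeated between the house_number and name columns.

Does anyone have a suggestion for how to first identify the cells that are duplicated across the two columns, and then set one of the values to np.nan?

Desired output: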

    housename   name      house_number  street  city     postcode
16  NaN         Shoecare  32            Carfax  Horsham  RH12 1EE
17  NaN         NaN       33            Carfax  Horsham  RH12 1EE
18  NaN         NaN       33A           Carfax  Horsham  RH12 1EE

1 answer
User
#1 · Posted 2024-10-02 00:21:10

If the two columns are house_number and name, you can do this:

test['name'] = np.where((test['house_number'] == test['name']), np.nan, test['name'])

Output:

       city house_number  housename      name  postcode  street
16  Horsham           32        NaN  Shoecare  RH12 1EE  Carfax
17  Horsham           33        NaN       NaN  RH12 1EE  Carfax
18  Horsham          33A        NaN       NaN  RH12 1EE  Carfax
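
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the three rows from the question by hand instead of calling read_json, and it assumes both columns hold strings (if one column were parsed as integers, casting both with astype(str) before comparing would probably be needed). It also shows why duplicated() found nothing: that method compares whole rows against earlier rows, not one column against another. Series.mask is used here as an alternative to np.where; the names row_dupes, same and result are just illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the question's three rows directly (values copied from the test case).
test = pd.DataFrame(
    {
        "housename": [np.nan, np.nan, np.nan],
        "name": ["Shoecare", "33", "33A"],
        "house_number": ["32", "33", "33A"],
        "street": ["Carfax"] * 3,
        "city": ["Horsham"] * 3,
        "postcode": ["RH12 1EE"] * 3,
    },
    index=[16, 17, 18],
)

# duplicated() compares each row with earlier rows, not one column with
# another, so none of these rows is flagged.
row_dupes = test.duplicated(subset=["house_number", "name"])

# An element-wise comparison of the two columns gives the mask that is needed.
same = test["house_number"] == test["name"]

# Series.mask replaces values where the condition is True (with NaN by default).
# assign() returns a new DataFrame and leaves `test` unchanged.
result = test.assign(name=test["name"].mask(same))
print(result[["name", "house_number"]])
```

Compared with np.where, mask keeps the result as a pandas Series with the original index, which may be slightly more convenient when chaining further operations.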
