Pandas: How can I determine whether an address in one DataFrame belongs to a city and state listed in another DataFrame?

Posted 2024-09-28 01:24:01


I have a DataFrame of addresses, as shown below:

main_df =
                                          address
0               3, my_street, Mumbai, Maharashtra
1                 Bangalore Karnataka 45th Avenue
2  TelanganaHyderabad some_street, some apartment

I also have a DataFrame of cities and states, as shown below (note that the same city name can occur in more than one state):

city_state_df =
         city        state
0      Mumbai  Maharashtra
1  Ahmednagar  Maharashtra
2  Ahmednagar        Bihar
3   Bangalore    Karnataka
4   Hyderabad    Telangana

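I want to put the matching city and state next to each address. I can do this with nested for loops or with iterrows(), but both approaches take more than an hour for 15k records. Given that the addresses are written in free form and that several states have cities with the same name, what is the best way to do this?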
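Row 2 of the sample (`TelanganaHyderabad`) shows why the matching has to be plain substring search rather than whole-word search: the state and city are glued together, so there is no word boundary before the city name. A small check with the standard `re` module, using only that sample address:

```python
import re

address = 'TelanganaHyderabad some_street, some apartment'

# Plain substring test: finds the city even though it is glued to the state.
substring_hit = 'Hyderabad' in address

# Word-boundary regex: there is no \b between "a" and "H", so this finds nothing.
word_hit = re.search(r'\bHyderabad\b', address)

print(substring_hit)  # True
print(word_hit)       # None
```

The trade-off is that substring matching can also fire on a city name that happens to appear inside a street or building name, so some false positives are possible with any of the approaches in this thread.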

My code is as follows:

import pandas as pd

main_df = pd.DataFrame({'address': ['3, my_street, Mumbai, Maharashtra', 'Bangalore Karnataka 45th Avenue', 'TelanganaHyderabad some_street, some apartment']})
city_state_df = pd.DataFrame({'city': ['Mumbai', 'Ahmednagar', 'Ahmednagar', 'Bangalore', 'Hyderabad'],
                              'state': ['Maharashtra', 'Maharashtra', 'Bihar', 'Karnataka', 'Telangana']})

# Object columns (None) so that strings can be written into them later
main_df['city'] = None
main_df['state'] = None

for i, df_row in main_df.iterrows():
    for j, city_row in city_state_df.iterrows():
        if city_row['city'] in df_row['address']:
            # All rows sharing this city name (e.g. Ahmednagar appears in two states)
            city_filtered = city_state_df[city_state_df['city'] == city_row['city']]
            for k, fil_row in city_filtered.iterrows():
                if fil_row['state'] in df_row['address']:
                    # Write through main_df.at: df_row is a copy, so assigning to it changes nothing
                    main_df.at[i, 'city'] = fil_row['city']
                    main_df.at[i, 'state'] = fil_row['state']
                    break
            break
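Much of the cost above comes from iterrows(), which builds a new Series for every row on every inner iteration. A minimal sketch of the same nested-loop idea in plain Python, assuming the sample data above: build a dictionary from city name to the list of states containing a city of that name once, then scan each address string. The names `states_by_city` and `resolve` are hypothetical helpers introduced here. One deliberate difference from the code above: when a city name matches but none of its states do, this version keeps checking the remaining cities instead of stopping.

```python
import pandas as pd

main_df = pd.DataFrame({'address': ['3, my_street, Mumbai, Maharashtra',
                                    'Bangalore Karnataka 45th Avenue',
                                    'TelanganaHyderabad some_street, some apartment']})
city_state_df = pd.DataFrame({'city': ['Mumbai', 'Ahmednagar', 'Ahmednagar', 'Bangalore', 'Hyderabad'],
                              'state': ['Maharashtra', 'Maharashtra', 'Bihar', 'Karnataka', 'Telangana']})

# Hypothetical helper dict, built once: city name -> every state with a city of that name.
# Ahmednagar maps to ['Maharashtra', 'Bihar'].
states_by_city = city_state_df.groupby('city')['state'].agg(list).to_dict()


def resolve(address):
    """Return the first (city, state) pair whose city and state both occur in address."""
    for city, states in states_by_city.items():
        if city in address:
            for state in states:
                if state in address:
                    return city, state
    # No city/state pair found in this address
    return None, None


# Iterate over plain Python strings instead of calling iterrows()
pairs = [resolve(a) for a in main_df['address']]
main_df['city'] = [c for c, _ in pairs]
main_df['state'] = [s for _, s in pairs]
print(main_df[['city', 'state']])
```

The work is still proportional to (number of addresses) × (number of distinct cities), but each step is a cheap string comparison rather than a pandas row construction.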


1 answer
Anonymous user
#1 · Posted 2024-09-28 01:24:01

Hi, maybe something like this:

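# Add empty 'state' and 'city' columns filled with None
main_df = main_df.reindex(columns=[*main_df.columns.tolist(), 'state', 'city'], fill_value=None)

for i, row in city_state_df.iterrows():
    # Vectorized substring tests over all addresses at once; regex=False matches the names literally
    mask = (main_df.address.str.contains(row.city, regex=False)
            & main_df.address.str.contains(row.state, regex=False))
    main_df.loc[mask, ['city', 'state']] = [row.city, row.state]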
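The loop above makes one vectorized pass over all addresses per row of city_state_df. If the city/state table is large, another option (not from this thread) is to find every city name in a single regex pass with `Series.str.extractall`, merge the hits with city_state_df to get the candidate states, and keep only the candidates whose state also appears in the same address. A sketch, assuming plain substring matching is acceptable; the column name `row` is introduced here for illustration. The alternation lists longer names first so that a longer city name is preferred when a shorter one is contained in it at the same position.

```python
import re

import pandas as pd

main_df = pd.DataFrame({'address': ['3, my_street, Mumbai, Maharashtra',
                                    'Bangalore Karnataka 45th Avenue',
                                    'TelanganaHyderabad some_street, some apartment']})
city_state_df = pd.DataFrame({'city': ['Mumbai', 'Ahmednagar', 'Ahmednagar', 'Bangalore', 'Hyderabad'],
                              'state': ['Maharashtra', 'Maharashtra', 'Bihar', 'Karnataka', 'Telangana']})

# One alternation of all distinct city names, escaped, longest first.
cities = sorted(city_state_df['city'].unique(), key=len, reverse=True)
pattern = '(' + '|'.join(re.escape(c) for c in cities) + ')'

# extractall yields one row per match; dropping the 'match' level leaves the main_df index.
found = (main_df['address'].str.extractall(pattern)[0]
         .droplevel('match')
         .rename('city')
         .rename_axis('row')
         .reset_index())

# Every state that has a city of that name becomes a candidate (columns: row, city, state).
cand = found.merge(city_state_df, on='city')

# Keep candidates whose state text also occurs in the same address.
addr = main_df.loc[cand['row'], 'address'].to_numpy()
keep = [state in a for state, a in zip(cand['state'], addr)]

# First surviving pair per address; addresses with no match get NaN.
matches = cand.loc[keep].drop_duplicates('row').set_index('row')[['city', 'state']]
result = main_df.join(matches)
print(result)
```

Because the regex engine does the city search in C, this tends to scale better than looping over city_state_df, though the final state check is still a Python-level loop over the (usually small) candidate list.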
