I have a DataFrame of addresses, as shown below:
main_df =
address
0 3, my_street, Mumbai, Maharashtra
1 Bangalore Karnataka 45th Avenue
2 TelanganaHyderabad some_street, some apartment
I also have a DataFrame of cities and states, shown below (note that a few states have cities with the same name):
city_state_df =
city state
0 Mumbai Maharashtra
1 Ahmednagar Maharashtra
2 Ahmednagar Bihar
3 Bangalore Karnataka
4 Hyderabad Telangana
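Because some city names (Ahmednagar here) belong to more than one state, it may help to group the lookup table once, up front, instead of re-filtering it for every address. A minimal sketch, assuming the table fits in memory; `states_by_city` is just an illustrative name:

```python
import pandas as pd

city_state_df = pd.DataFrame({'city': ['Mumbai', 'Ahmednagar', 'Ahmednagar', 'Bangalore', 'Hyderabad'],
                              'state': ['Maharashtra', 'Maharashtra', 'Bihar', 'Karnataka', 'Telangana']})

# Map each city name to every state it occurs in, keeping the table's row order
states_by_city = city_state_df.groupby('city')['state'].apply(list).to_dict()

print(states_by_city['Ahmednagar'])  # ['Maharashtra', 'Bihar']
```

With such a dict, an address only has to be checked against the states of the cities it actually contains.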
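I want the matching city and state next to each address. I can do this with nested for loops over iterrows(), but for 15k records it takes more than an hour. Given that the addresses are free-form (no fixed order or format) and several states have cities with the same name, what is the best way to do this?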
My code is as follows:
import pandas as pd

main_df = pd.DataFrame({'address': ['3, my_street, Mumbai, Maharashtra', 'Bangalore Karnataka 45th Avenue', 'TelanganaHyderabad some_street, some apartment']})
city_state_df = pd.DataFrame({'city': ['Mumbai', 'Ahmednagar', 'Ahmednagar', 'Bangalore', 'Hyderabad'],
                              'state': ['Maharashtra', 'Maharashtra', 'Bihar', 'Karnataka', 'Telangana']})

# None gives object-dtype columns, so city/state strings can be stored in them
main_df['city'] = None
main_df['state'] = None
for i, df_row in main_df.iterrows():
    for j, city_row in city_state_df.iterrows():
        # A known city name appears somewhere in the address
        if city_row['city'] in df_row['address']:
            # The same city name may exist in several states; keep the one whose state also appears
            city_filtered = city_state_df[city_state_df['city'] == city_row['city']]
            for k, fil_row in city_filtered.iterrows():
                if fil_row['state'] in df_row['address']:
                    # iterrows() yields copies, so write the result back through main_df.at
                    main_df.at[i, 'city'] = fil_row['city']
                    main_df.at[i, 'state'] = fil_row['state']
                    break
            break
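One possible way to avoid the nested iterrows() loops is sketched below. It assumes plain substring matching is acceptable, as in the loop above, so 'TelanganaHyderabad' still matches. Each address is scanned once with a single regex alternation of all city names, the matches are expanded into (address, city, state) candidates with merge, and the first candidate whose state also appears in the address is kept. `map_city_state` is a hypothetical helper name, and the tie-breaking (earliest city in the address wins) is an assumption.

```python
import re

import pandas as pd


def map_city_state(main_df, city_state_df):
    """Hypothetical helper: return main_df with 'city' and 'state' columns added."""
    # All city names as one regex alternation. Longest first, so when two names
    # start at the same position the longer one is matched; re.escape guards
    # against regex special characters in the names.
    cities = sorted(city_state_df['city'].unique(), key=len, reverse=True)
    city_pattern = '|'.join(map(re.escape, cities))

    cands = (
        main_df.rename_axis('row_id').reset_index()  # keep each row's original label
        # Every city name found in the address (plain substring match, so
        # 'TelanganaHyderabad' still yields 'Hyderabad')
        .assign(city=lambda d: d['address'].str.findall(city_pattern))
        .explode('city')                  # one row per found city; [] becomes NaN
        .dropna(subset=['city'])
        .merge(city_state_df, on='city')  # one row per (row, city, state) candidate
    )

    # Keep only candidates whose state name also occurs in the address
    keep = [state in addr for state, addr in zip(cands['state'], cands['address'])]
    best = (
        cands.loc[keep]
        .drop_duplicates('row_id')        # first surviving candidate per address
        .set_index('row_id')[['city', 'state']]
    )
    # Left join back: addresses without a match get NaN in both columns
    return main_df.join(best)


main_df = pd.DataFrame({'address': ['3, my_street, Mumbai, Maharashtra',
                                    'Bangalore Karnataka 45th Avenue',
                                    'TelanganaHyderabad some_street, some apartment']})
city_state_df = pd.DataFrame({'city': ['Mumbai', 'Ahmednagar', 'Ahmednagar', 'Bangalore', 'Hyderabad'],
                              'state': ['Maharashtra', 'Maharashtra', 'Bihar', 'Karnataka', 'Telangana']})
print(map_city_state(main_df, city_state_df))
```

This replaces roughly len(main_df) × len(city_state_df) Python-level iterrows() steps with one regex scan per address plus a merge; no timings on the real 15k rows are claimed here. It assumes main_df does not already have 'city'/'state' columns, since join refuses overlapping column names.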
Hi, maybe something like this: