Speeding up dataframe.loc()

import geoip2.database
import pandas as pd

reader = geoip2.database.Reader('path/to/GeoLite2-City.mmdb')
results = pd.DataFrame(columns=('IP', 'city', 'latitude', 'longitude', 'dept_code'))

for i, IP in enumerate(df_IP["IP"]):
    try:
        response = reader.city(IP)
        # each .loc assignment grows the DataFrame by one row, which is the slow part
        results.loc[i] = [IP, response.city.name, response.location.latitude, response.location.longitude, response.subdivisions.most_specific.iso_code]
    except Exception as e:
        print("error with line {}, IP {}: {}".format(i, df_IP["IP"][i], e))

2 answers

User

Answer #1 · edited 2024-06-01 08:18:02

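I ran into the same problem. As @oliversm suggested, I collected the results in a list and then added that list to the original dataset. The code looks like this: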


results_list = []

for i, IP in enumerate(df_IP["IP"]):
    try:
        response = reader.city(IP)
        # list.append takes a single argument, so each row is appended as one tuple
        results_list.append((IP, response.city.name, response.location.latitude, response.location.longitude, response.subdivisions.most_specific.iso_code))
    except Exception as e:
        print("error with line {}, IP {}: {}".format(i, IP, e))

# build the DataFrame once from the list instead of growing it row by row with .loc
results = pd.DataFrame(results_list, columns=['IP', 'city', 'latitude', 'longitude', 'dept_code'])
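The same pattern (collect rows in a plain Python list, then construct the DataFrame once) can be tried without a GeoLite2 file. The sketch below is only an illustration: `FAKE_DB`, `fake_lookup`, and the sample addresses are made-up stand-ins for `reader.city(IP)` and are not part of geoip2.

```python
import pandas as pd

# Hypothetical stand-in for the GeoIP database; real code would call reader.city(ip).
FAKE_DB = {
    "192.0.2.1": ("Paris", 48.85, 2.35, "75"),
    "192.0.2.2": ("Lyon", 45.76, 4.84, "69"),
}

def fake_lookup(ip):
    # Raises KeyError for unknown addresses, mimicking a failed lookup.
    return FAKE_DB[ip]

df_IP = pd.DataFrame({"IP": ["192.0.2.1", "203.0.113.9", "192.0.2.2"]})

rows = []
for i, ip in enumerate(df_IP["IP"]):
    try:
        city, lat, lon, dept = fake_lookup(ip)
        rows.append((ip, city, lat, lon, dept))  # cheap list append, no DataFrame resize
    except KeyError as e:
        print("error with line {}, IP {}: {}".format(i, ip, e))

# One DataFrame construction at the end instead of one .loc assignment per row.
results = pd.DataFrame(rows, columns=["IP", "city", "latitude", "longitude", "dept_code"])
```

Appending to a list avoids enlarging the DataFrame on every iteration, which is the likely source of the slowdown with `results.loc[i] = ...`.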

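User

Answer #2 · edited 2024-06-01 08:18:02

I faced a similar situation, where using loc made the run time blow up. After a lot of fiddling, I found a simple solution that is very fast: use set_value instead of loc.

Here is what the sample code looks like; you can adapt it to your use case. Suppose your dataframe looks like this: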

Index  'A'  'B' 'Label'
23      0    1    Y
45      3    2    N

self.data.set_value(45, 'Label', 'NA')

This sets the value of the 'Label' column in the second row (index 45) to NA.

For more information on set_value, see the following link:

http://pandas.pydata.org/pandas-docs/version/0.17/generated/pandas.DataFrame.set_value.html
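Note that the link above is for pandas 0.17. `DataFrame.set_value` was deprecated in pandas 0.21 and removed in 1.0, and the `.at` accessor is the scalar setter that replaces it. A minimal sketch of the same update with `.at`, assuming the small example frame shown above (the variable name `data` is chosen here for illustration):

```python
import pandas as pd

# Rebuild the example frame from the answer: row labels 23 and 45.
data = pd.DataFrame(
    {"A": [0, 3], "B": [1, 2], "Label": ["Y", "N"]},
    index=[23, 45],
)

# .at addresses a single cell by row label and column label.
data.at[45, "Label"] = "NA"
```

Like `set_value`, `.at` only reads or writes one scalar at a time, so it skips most of the indexing overhead that `.loc` carries.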
