Python/pandas: create a DataFrame column and set its values by looking up a column value within the ranges of another DataFrame

countryList = []
for index, row in inputDF.iterrows():
    integerIP = row['integerIP']
    countryISO = ip2CountryDF.loc[
        (integerIP >= ip2CountryDF['startIP']) & (integerIP <= ip2CountryDF['endIP']),
        'countryISO'
    ].iloc[0]
    countryList.append(countryISO)
inputDF['countryISO'] = countryList

1 answer

User

#1 · posted 2024-05-19 20:27:30

You were very close. You were only missing a call to the `map` function.

Load IpToCountry.csv (for documentation purposes):

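import pandas as pd

IP2COUNTRY = "https://github.com/urbanadventurer/WhatWeb/raw/master/plugins/IpToCountry.csv"
# Keep only the range start, range end and ISO country code columns; skip "#" comment lines.
db = pd.read_csv(IP2COUNTRY, header=None, usecols=[0, 1, 4],
                 names=["startIP", "endIP", "countryISO"], comment="#")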

>>> db
           startIP       endIP countryISO
0                0    16777215         ZZ
1         16777216    16777471         AU
2         16777472    16777727         CN
3         16777728    16778239         CN
4         16778240    16779263         AU
...            ...         ...        ...
211757  4211081216  4227858431         ZZ
211758  4227858432  4244635647         ZZ
211759  4244635648  4261412863         ZZ
211760  4261412864  4278190079         ZZ
211761  4278190080  4294967295         ZZ

[211762 rows x 3 columns]

Create a function ip2country that, given an IP address in decimal (integer) form, returns the corresponding ISO country code:

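def ip2country(ip: int):
    # Select the row whose [startIP, endIP] range contains ip;
    # squeeze() turns the resulting one-element Series into a scalar.
    return db.loc[(db["startIP"] <= ip) & (ip <= db["endIP"]), "countryISO"].squeeze()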


df["countryISO"] = df["integerIP"].map(ip2country)

>>> df
         sourceIP   eventTime   integerIP countryISO
0  114.119.157.43  2021-03-01  1920441643         SG
1   193.205.128.7  2021-03-01  3251470343         IT
2   193.205.128.7  2021-03-01  3251470343         IT
3   193.205.128.7  2021-03-01  3251470343         IT
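
The answer does not show how the integerIP column was derived from sourceIP. One possible way, sketched here as an assumption rather than the author's actual method, is the standard-library `ipaddress` module. The helper name `ip_to_int` and the `example` frame are hypothetical, introduced only for illustration:

```python
import ipaddress

import pandas as pd


def ip_to_int(ip: str) -> int:
    """Convert a dotted-quad IPv4 string to its decimal integer form (hypothetical helper)."""
    return int(ipaddress.IPv4Address(ip))


# Illustrative frame reusing two addresses from the output above.
example = pd.DataFrame({"sourceIP": ["114.119.157.43", "193.205.128.7"]})
example["integerIP"] = example["sourceIP"].map(ip_to_int)
```

The resulting integers agree with the integerIP values printed above for the same two addresses.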

Performance

For 10k IP addresses, the result is returned in 11.7 s on average on a 2.5 GHz quad-core Intel Core i7:

import numpy as np

df1 = pd.DataFrame({"integerIP": np.random.randint(db["startIP"].min(),
                                                   db["endIP"].max() + 1,
                                                   size=10000,
                                                   dtype=np.int64)})  # int64: the upper bound exceeds int32

%timeit df1["integerIP"].map(ip2country)
11.7 s ± 489 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
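
Most of that time is likely spent scanning the whole range table once per IP. If the ranges do not overlap (an assumption, though it appears to hold for the table shown above), a single binary search over the sorted startIP values with `numpy.searchsorted` can resolve all addresses at once. The following is only a sketch of that alternative, not part of the original answer; `ip2country_vectorized`, `toy_db` and `toy_ips` are hypothetical names, and the toy table reuses the first three rows of db so the example runs without downloading the CSV:

```python
import numpy as np
import pandas as pd


def ip2country_vectorized(ips: pd.Series, ranges: pd.DataFrame) -> pd.Series:
    """Look up countryISO for every IP at once (hypothetical helper).

    Assumes `ranges` is non-empty and holds non-overlapping
    [startIP, endIP] intervals.
    """
    ranges = ranges.sort_values("startIP", ignore_index=True)
    starts = ranges["startIP"].to_numpy()
    ends = ranges["endIP"].to_numpy()
    codes = ranges["countryISO"].to_numpy(dtype=object)

    ip_values = ips.to_numpy()
    # Index of the last range whose startIP is <= ip (-1 if there is none).
    pos = np.searchsorted(starts, ip_values, side="right") - 1
    safe_pos = np.clip(pos, 0, len(starts) - 1)
    # A hit also requires the ip to be no larger than that range's endIP.
    hit = (pos >= 0) & (ip_values <= ends[safe_pos])
    result = np.where(hit, codes[safe_pos], None)
    return pd.Series(result, index=ips.index, name="countryISO")


# Toy data: the first three rows of db, plus one IP that falls outside every range.
toy_db = pd.DataFrame({
    "startIP": [0, 16777216, 16777472],
    "endIP": [16777215, 16777471, 16777727],
    "countryISO": ["ZZ", "AU", "CN"],
})
toy_ips = pd.Series([16777300, 5, 16777472, 99999999], name="integerIP")
toy_result = ip2country_vectorized(toy_ips, toy_db)
```

With the real table the call would be `df["countryISO"] = ip2country_vectorized(df["integerIP"], db)`. Unlike the `map` version, an address that matches no range yields None instead of an empty Series. No timing was measured for this sketch.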
