According to an article, vectorization is much faster than applying a function to a DataFrame column.
But I have a special case:
import pandas as pd

df = pd.DataFrame({'IP': ['1.0.64.2', '100.23.154.63', '54.62.1.3']})

def compare3rd(ip):
    """Check whether the 3rd part of an IP is greater than 100."""
    ip_3rd = ip.split('.')[2]
    if int(ip_3rd) > 100:
        return True
    else:
        return False

# This works, but it is very slow
df['check_results'] = df.IP.apply(lambda x: compare3rd(x))
print(df)

# This is supposed to be much faster,
# but it doesn't work...
df['check_results_2'] = compare3rd(df['IP'].values)
print(df)
The full traceback is as follows:
Traceback (most recent call last):
  File "test.py", line 16, in <module>
    df['check_results_2'] = compare3rd(df['IP'].values)
  File "test.py", line 6, in compare3rd
    ip_3rd = ip.split('.')[2]
AttributeError: 'numpy.ndarray' object has no attribute 'split'
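The AttributeError comes from a type mismatch: passing the column to compare3rd hands it the whole column as one NumPy array, and split is a method of each individual str element, not of the array. A small check of this, assuming the same df as in the question (`.to_numpy()` is used here as the explicit way to get the NumPy array that `.values` returned):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'IP': ['1.0.64.2', '100.23.154.63', '54.62.1.3']})

# The whole column as a single NumPy array (object dtype holding Python strs)
arr = df['IP'].to_numpy()

# compare3rd(arr) receives this entire array as `ip` ...
print(isinstance(arr, np.ndarray))  # True
print(hasattr(arr, 'split'))        # False: ndarray has no str methods
# ... while each element is an ordinary Python str that does have .split()
print(isinstance(arr[0], str))      # True
print(arr[0].split('.')[2])         # 64
```

So a plain Python function written for one string cannot simply be called on the array; it has to be applied per element, or the logic has to be rewritten with whole-column operations.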
My question is: how do I correctly use vectorization in this case?
Use the str accessor in pandas to do the check.
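A minimal sketch of the str-accessor approach, assuming the same df as in the question and reusing its column name check_results_2: `Series.str.split('.')` splits every address at once, `.str[2]` picks the third field from each resulting list, and the comparison with 100 then runs on a whole integer column.

```python
import pandas as pd

df = pd.DataFrame({'IP': ['1.0.64.2', '100.23.154.63', '54.62.1.3']})

# Split every address on '.', take the 3rd field of each list,
# convert that column to int, then compare the whole column with 100.
third_octet = df['IP'].str.split('.').str[2].astype(int)
df['check_results_2'] = third_octet > 100
print(df)
```

Here the third parts are 64, 154 and 1, so check_results_2 is False, True, False. Note that the .str methods on string data still loop over the elements internally, so the speed-up over apply may be modest; the clearer gain is that the check is written as column operations instead of a per-row function.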
Since you mentioned vectorize, numpy.vectorize is another option.
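If you prefer to keep compare3rd as a per-string function, `numpy.vectorize` can wrap it so that it accepts a whole array. This is a sketch assuming the question's df, with compare3rd condensed to return the comparison directly (same behavior). The NumPy documentation describes vectorize as provided mainly for convenience, being essentially a for loop, so it should not be expected to be much faster than apply.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'IP': ['1.0.64.2', '100.23.154.63', '54.62.1.3']})

def compare3rd(ip):
    """Return True if the 3rd part of an IPv4 string is greater than 100."""
    return int(ip.split('.')[2]) > 100

# np.vectorize calls compare3rd once per element and collects the results
# into a new array; otypes=[bool] fixes the output dtype up front.
vec_compare3rd = np.vectorize(compare3rd, otypes=[bool])
df['check_results_2'] = vec_compare3rd(df['IP'].to_numpy())
print(df)
```

The main benefit is that the original single-string function can be called on the array, which is what the failing line in the question attempted.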