Python 3 function with a pandas DataFrame input runs slowly the first time and faster the second time

Posted 2024-10-03 21:36:18


I built two functions, major_check_with_dataframe and major_check_with_list, and I want to see which one runs faster. Their running times confuse me.
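If the goal is to compare the two implementations, a single wall-clock measurement per function mixes any one-time start-up cost into the result. Below is a minimal sketch using the standard-library `timeit` module that makes a few untimed warm-up calls and then reports the best per-call time. `benchmark`, `with_list` and `with_sorted` are hypothetical names used only for illustration; they are not part of `anapyfunc.major`.

```python
import functools
import timeit


def benchmark(func, *args, repeat=5, number=10, warmup=1, **kwargs):
    """Return the best per-call time in milliseconds for func(*args, **kwargs).

    `warmup` untimed calls run first so one-time costs (imports, caches)
    are excluded; then the minimum over `repeat` batches of `number` calls is used.
    """
    call = functools.partial(func, *args, **kwargs)
    for _ in range(warmup):
        call()
    timings = timeit.repeat(call, repeat=repeat, number=number)
    return min(timings) / number * 1000.0


if __name__ == "__main__":
    # Hypothetical stand-ins for two implementations being compared.
    data = list(range(2500))

    def with_list(xs):
        return sum(x for x in xs if x > 70)

    def with_sorted(xs):
        return sum(x for x in sorted(xs) if x > 70)

    for f in (with_list, with_sorted):
        print("{} best: {:.3f} ms".format(f.__name__, benchmark(f, data)))
```

To time the real functions, you would pass, for example, `anapyfunc.major.major_check_with_dataframe` followed by its keyword arguments (`df=df`, `s_major_single_hi_percent=...`, and so on). Using the minimum of several batches is a common choice because it is the measurement least disturbed by other activity on the machine.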

import numpy as np
DTYPE_FLOAT = np.float64  # np.float (an alias of the builtin float) was removed in NumPy 1.24
import anapyfunc.major
import pandas as pd
from testpython.timer import timer

# wrapper function
def major_check(**kwargs):
    #return major_check_with_list(**kwargs)
    return major_check_with_dataframe(**kwargs)

# functions to be timed
def major_check_with_dataframe(df, s_major_single_hi_percent, s_major_single_lo_percent):
    ...

def major_check_with_list(source_list, s_major_single_hi_percent, s_major_single_lo_percent):
    ...

# main function starts here
t = timer.Timer(verbose = True, run = False)
t.set_name(name = 'major check timer')

a = np.random.choice(101, 2500)
b = np.random.choice(101, 2500)
c = np.random.choice(101, 2500)

s_major_single_hi_percent = 70 
s_major_single_lo_percent = 10

dd = {'a' : a, 'b' : b , 'c' : c}
df = pd.DataFrame(dd)

# axis 0 = tick
# axis 1 = input arrays

t.set_name(name = 'major_check_with_dataframe')
t.start()
ret1 = anapyfunc.major.major_check_with_dataframe(
                                  df = df, 
                                  s_major_single_hi_percent = s_major_single_hi_percent,
                                  s_major_single_lo_percent = s_major_single_lo_percent,
                                 )
t.stop_reset()

t.set_name(name = 'major_check_with_list')
t.start()
ret2 = anapyfunc.major.major_check_with_list(
                                  source_list = [a, b, c,],
                                  s_major_single_hi_percent = s_major_single_hi_percent,
                                  s_major_single_lo_percent = s_major_single_lo_percent,
                                 )
t.stop_reset()


t.set_name(name = 'major_check')
t.start()
ret3 = anapyfunc.major.major_check(
                                  #source_list = [a, b, c,],
                                  df = df, 
                                  s_major_single_hi_percent = s_major_single_hi_percent,
                                  s_major_single_lo_percent = s_major_single_lo_percent,
                                 )
t.stop_reset()
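`testpython.timer` is the asker's own module, so its exact behaviour is unknown here. For anyone reproducing these measurements without it, a rough stand-in is a context manager around `time.perf_counter()` that prints in the same `name elapsed time: X ms` format. `timed` is a hypothetical helper, not part of that module.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(name, results=None):
    """Time the enclosed block with time.perf_counter() and print it in ms.

    If `results` (a dict) is given, the elapsed time is also stored under `name`.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        print("{} elapsed time: {:.6f} ms".format(name, elapsed_ms))
        if results is not None:
            results[name] = elapsed_ms


if __name__ == "__main__":
    # Placeholder workload; in the script above this would wrap each call.
    with timed("example"):
        sum(range(100000))
```

Each `t.set_name(...)` / `t.start()` / `t.stop_reset()` triple in the script would become a `with timed('major_check_with_dataframe'):` block around the corresponding call.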

Output when major_check calls major_check_with_dataframe:

major_check_with_dataframe elapsed time: 94.261000 ms
major_check_with_list elapsed time: 2.316000 ms
major_check elapsed time: 3.055000 ms

Output when major_check calls major_check_with_list:

major_check_with_dataframe elapsed time: 95.042000 ms
major_check_with_list elapsed time: 2.240000 ms
major_check elapsed time: 2.240000 ms

I found that if I run major_check_with_dataframe a second time, its running time drops to almost the same as that of the major_check wrapper:
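The pattern here (slow first call, fast repeat calls) is what one-time initialisation on first use would look like, for example imports or caches that are filled during the first call. That cause is an assumption; the measurements alone do not prove it. The sketch below records every call separately, so a warm-up cost shows up as an outlier in the first entry instead of being hidden in an average. `time_each_call` and `lazily_initialised` are hypothetical; the latter only simulates such setup.

```python
import time


def time_each_call(func, n_calls=5, args=(), kwargs=None):
    """Call func n_calls times and return each call's elapsed time in ms."""
    kwargs = kwargs or {}
    timings = []
    for _ in range(n_calls):
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


# Hypothetical function whose first call performs one-time setup, standing in
# for whatever work a real function might trigger on first use.
_table = None


def lazily_initialised(x):
    global _table
    if _table is None:
        _table = [i * i for i in range(200000)]  # one-time cost, paid only once
    return _table[x]


if __name__ == "__main__":
    first, *rest = time_each_call(lazily_initialised, 5, args=(10,))
    print("first call: {:.3f} ms, later calls: {}".format(
        first, ["{:.3f}".format(t) for t in rest]))
```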

t.set_name(name = 'major_check_with_dataframe')
t.start()
ret1 = anapyfunc.major.major_check_with_dataframe(
                                  df = df, 
                                  s_major_single_hi_percent = s_major_single_hi_percent,
                                  s_major_single_lo_percent = s_major_single_lo_percent,
                                 )
t.stop_reset()

t.set_name(name = 'major_check_with_list')
t.start()
ret2 = anapyfunc.major.major_check_with_list(
                                  source_list = [a, b, c,],
                                  s_major_single_hi_percent = s_major_single_hi_percent,
                                  s_major_single_lo_percent = s_major_single_lo_percent,
                                 )
t.stop_reset()


t.set_name(name = 'major_check')
t.start()
ret3 = anapyfunc.major.major_check(
                                  #source_list = [a, b, c,],
                                  df = df, 
                                  s_major_single_hi_percent = s_major_single_hi_percent,
                                  s_major_single_lo_percent = s_major_single_lo_percent,
                                 )
t.stop_reset()

t.set_name(name = 'major_check_with_dataframe')
t.start()
ret1 = anapyfunc.major.major_check_with_dataframe(
                                  df = df, 
                                  s_major_single_hi_percent = s_major_single_hi_percent,
                                  s_major_single_lo_percent = s_major_single_lo_percent,
                                 )
t.stop_reset()

t.set_name(name = 'major_check')
t.start()
ret3 = anapyfunc.major.major_check(
                                  #source_list = [a, b, c,],
                                  df = df, 
                                  s_major_single_hi_percent = s_major_single_hi_percent,
                                  s_major_single_lo_percent = s_major_single_lo_percent,
                                 )
t.stop_reset()

Output:

major_check_with_dataframe elapsed time: 95.608000 ms
major_check_with_list elapsed time: 2.350000 ms
major_check elapsed time: 3.048000 ms
major_check_with_dataframe elapsed time: 2.569000 ms
major_check elapsed time: 2.520000 ms

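Could this be some kind of memory caching effect? The behavior is the same even if I put the functions in a class, call the function once through a class instance, and delete/garbage-collect the instance after each run.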

All functions return the expected values. What am I missing? The versions I'm using are:

Python 3.4.3 (default, Nov 17 2016, 01:08:31)

pandas 0.21.0
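When reporting timings like these, it can help to print the exact versions from the same interpreter that produced them. A small sketch using only the standard library; `report_versions` is a hypothetical helper, and packages that are not installed are reported as `None`.

```python
import importlib
import platform
import sys


def report_versions(packages=("numpy", "pandas")):
    """Return a dict of interpreter and package versions (None if missing)."""
    versions = {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print("{}: {}".format(name, version))
```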

