Python hashing performance on large data sets

Posted 2024-06-16 11:52:43

Tech stack:

Python 3.8

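When I run this function (which restructures the data into an acceptable format):

With only a few hundred timestamps it runs efficiently (0.022 seconds)
With a batch of 100,000+ timestamps it takes a long time (about 40 seconds)

where the length of grouped_values is 250+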

from typing import Dict, List


def re_struct_data(all_timestamps: List, grouped_values: Dict[str, Dict[int, int]]):
    tm_count = len(all_timestamps)

    start_tm = 1607494871

    def get_tms():
        # Fresh placeholder column with one slot per timestamp
        return [None] * tm_count

    data_matrix = {'runTime': get_tms()}

    for i_idx, tm in enumerate(all_timestamps):
        # Elapsed time relative to the fixed start timestamp
        data_matrix['runTime'][i_idx] = float(tm) - start_tm
        for cnl_nm in grouped_values:
            # Create the channel's column the first time it is seen
            if cnl_nm not in data_matrix:
                data_matrix[cnl_nm] = get_tms()

            value_dict = grouped_values[cnl_nm]
            if tm in value_dict:
                data_matrix[cnl_nm][i_idx] = value_dict[tm]
    return data_matrix

When I profiled this code, I found that most of the time is spent hashing to check whether cnl_nm is present in data_matrix.
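A measurement like this can be reproduced with the standard-library cProfile module. The sketch below is only an illustration and not the profiling setup used in the question: it repeats the function so the block is self-contained, and the helper names `make_inputs` and `profile_report` as well as the synthetic input sizes are hypothetical. cProfile reports time per function call, so the cost of the `in` checks appears inside the function's own time rather than as a separate row; a line-level profiler would be needed to split it out.

```python
import cProfile
import io
import pstats
from typing import Dict, List


def re_struct_data(all_timestamps: List[int],
                   grouped_values: Dict[str, Dict[int, int]]) -> Dict[str, list]:
    # Same logic as the function in the question.
    tm_count = len(all_timestamps)
    start_tm = 1607494871
    data_matrix = {'runTime': [None] * tm_count}
    for i_idx, tm in enumerate(all_timestamps):
        data_matrix['runTime'][i_idx] = float(tm) - start_tm
        for cnl_nm in grouped_values:
            if cnl_nm not in data_matrix:
                data_matrix[cnl_nm] = [None] * tm_count
            value_dict = grouped_values[cnl_nm]
            if tm in value_dict:
                data_matrix[cnl_nm][i_idx] = value_dict[tm]
    return data_matrix


def make_inputs(n_timestamps: int, n_channels: int):
    # Hypothetical synthetic data: each channel has a value at every other timestamp.
    timestamps = [1607494871 + i for i in range(n_timestamps)]
    grouped = {
        f'channel_{c}': {tm: c for tm in timestamps[::2]}
        for c in range(n_channels)
    }
    return timestamps, grouped


def profile_report(n_timestamps: int = 2000, n_channels: int = 50) -> str:
    # Profile one call and return the top rows of the report as text.
    timestamps, grouped = make_inputs(n_timestamps, n_channels)
    profiler = cProfile.Profile()
    profiler.runcall(re_struct_data, timestamps, grouped)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats('cumulative').print_stats(5)
    return buffer.getvalue()


if __name__ == '__main__':
    print(profile_report())
```

Raising `n_timestamps` and `n_channels` toward the sizes in the question (100,000+ and 250+) should show how the run time scales with their product.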

I tried switching to:

setdefault() -> (same operation under the hood)
using .items() -> (tuple conversion + unpacking)

But both of these take more time.
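The question does not show the modified code, so the following is a reconstruction under assumptions of what the two variants might look like; the function names `with_setdefault` and `with_items` are hypothetical. One detail that may explain part of the slowdown: the arguments of `setdefault` are evaluated before the call, so the placeholder list is built on every inner iteration even when the key already exists.

```python
from typing import Dict, List

START_TM = 1607494871  # same offset as in the question


def with_setdefault(all_timestamps: List[int],
                    grouped_values: Dict[str, Dict[int, int]]) -> Dict[str, list]:
    # Hypothetical variant: setdefault replaces the explicit membership test.
    tm_count = len(all_timestamps)
    data_matrix = {'runTime': [None] * tm_count}
    for i_idx, tm in enumerate(all_timestamps):
        data_matrix['runTime'][i_idx] = float(tm) - START_TM
        for cnl_nm in grouped_values:
            # [None] * tm_count is allocated on every call, used or not.
            column = data_matrix.setdefault(cnl_nm, [None] * tm_count)
            value_dict = grouped_values[cnl_nm]
            if tm in value_dict:
                column[i_idx] = value_dict[tm]
    return data_matrix


def with_items(all_timestamps: List[int],
               grouped_values: Dict[str, Dict[int, int]]) -> Dict[str, list]:
    # Hypothetical variant: .items() yields (name, value_dict) pairs,
    # which saves the grouped_values[cnl_nm] lookup but keeps the check.
    tm_count = len(all_timestamps)
    data_matrix = {'runTime': [None] * tm_count}
    for i_idx, tm in enumerate(all_timestamps):
        data_matrix['runTime'][i_idx] = float(tm) - START_TM
        for cnl_nm, value_dict in grouped_values.items():
            if cnl_nm not in data_matrix:
                data_matrix[cnl_nm] = [None] * tm_count
            if tm in value_dict:
                data_matrix[cnl_nm][i_idx] = value_dict[tm]
    return data_matrix
```

Both variants still run the inner body once per (timestamp, channel) pair, so neither changes the overall amount of work.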

Any suggestions for improvement?
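One possible direction, offered as a sketch rather than a measured fix for the asker's data: with 100,000 timestamps and 250 channels, the inner loop body runs about 25 million times, and the existence check is repeated on every one of them even though each channel needs its column created only once. Swapping the loop order makes the per-channel setup happen once, and each column can then be built with a list comprehension and `dict.get`, which returns `None` for missing keys. The function name `re_struct_data_by_channel` is hypothetical.

```python
from typing import Dict, List

START_TM = 1607494871  # same offset as in the question


def re_struct_data_by_channel(
    all_timestamps: List[int],
    grouped_values: Dict[str, Dict[int, int]],
) -> Dict[str, list]:
    # Build the elapsed-time column in a single pass.
    data_matrix = {'runTime': [float(tm) - START_TM for tm in all_timestamps]}
    # Outer loop over channels: each column is created exactly once,
    # so no membership test on data_matrix is needed.
    for cnl_nm, value_dict in grouped_values.items():
        lookup = value_dict.get  # bound method, avoids repeated attribute access
        data_matrix[cnl_nm] = [lookup(tm) for tm in all_timestamps]
    return data_matrix
```

This still performs one dictionary lookup per (timestamp, channel) pair, so whether it is fast enough would need to be measured on the real data. It also differs from the original in two edge cases: with an empty timestamp list it still creates an empty column for every channel, and a channel literally named 'runTime' would replace the time column.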


0 answers

No answers yet
