Significant difference in merge performance?

Posted 2024-05-20 10:09:57

I want to do a left join with a condition (a left row matches a right row if left.values >= right.lo & left.values < right.hi).
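The condition is a half-open interval test, right.lo <= left value < right.hi. Below is a minimal sketch of how the helper further down evaluates it for every (left, right) pair at once with NumPy broadcasting; the arrays are made-up illustration data, not the author's tables:

```python
import numpy as np

# Made-up example data: three left values and two right-hand intervals [lo, hi)
values = np.array([5, 15, 30])
lo = np.array([0, 10])
hi = np.array([10, 20])

# values[:, None] has shape (3, 1); comparing it with shape (2,) broadcasts to (3, 2)
mask = (values[:, None] >= lo) & (values[:, None] < hi)

# i = positions in the left array, j = positions of the matching interval
i, j = np.where(mask)
print(i, j)  # [0 1] [0 1]; the value 30 falls in no interval
```

The mask has len(left) x len(right) entries, which stays small when only the unique left keys are compared against a 20-row right table.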

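So I wrote code that does the following:

1. Drop duplicate keys in the left db (keep only the unique keys)
2. Merge those keys with the right db using the condition above
3. Merge the resulting db back into the initial left db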
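The same three steps can also be sketched with a pandas IntervalIndex doing step 2 instead of a broadcast mask. This is a swapped-in technique, not the author's code: `interval_left_merge` and the toy frames are hypothetical, and the sketch assumes the right-hand intervals do not overlap (IntervalIndex.get_indexer refuses overlapping intervals).

```python
import numpy as np
import pandas as pd


def interval_left_merge(left, right, left_on, lo, hi):
    """Hypothetical helper: left-join `right` where right[lo] <= left[left_on] < right[hi].

    Assumes the [lo, hi) intervals in `right` do not overlap.
    """
    # Step 1: one row per distinct key of the left frame
    keys = pd.DataFrame({left_on: left[left_on].unique()})

    # Step 2: closed="left" makes each interval [lo, hi); get_indexer returns
    # the position of the containing interval, or -1 when there is none
    intervals = pd.IntervalIndex.from_arrays(right[lo], right[hi], closed="left")
    pos = intervals.get_indexer(keys[left_on])
    # Label -1 does not exist in the fresh 0..n-1 index, so reindex yields an all-NaN row
    matched = right.reset_index(drop=True).reindex(pos).reset_index(drop=True)
    temp = pd.concat([keys, matched], axis=1)

    # Step 3: merge the per-key result back onto the full left frame
    return left.merge(temp, on=left_on, how="left")


# Toy data: ages 1, 5 and 12 fall into an interval, 30 does not
left = pd.DataFrame({"VEH_Age": [1, 5, 1, 12, 30], "x": range(5)})
right = pd.DataFrame({"lo": [0, 3, 10], "hi": [3, 10, 20], "coef": [1.0, 1.1, 1.2]})
print(interval_left_merge(left, right, "VEH_Age", "lo", "hi"))
```

With this toy data the last row gets NaN in the right-hand columns, which is the left-join behaviour the question asks for.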

So my helper functions are:

import pandas as pd
import numpy as np

data = pd.read_pickle("C:/Quang/base_datalake_net.pkl")

t_BEH_VitMaxi = pd.read_csv("table/VEH_VitMaxi.csv", delimiter=';', decimal=',')
t_VEH_Age = pd.read_csv("table/VEH_Age.csv", delimiter=';', decimal=',')


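def left_cond_merge_simple_help(left, right, left_on, right_on_lo, right_on_hi):
    # Work on copies with a fresh 0..n-1 index so the caller's frames are not mutated
    left = left.reset_index(drop=True)
    right = right.reset_index(drop=True)
    a = left[left_on].values
    bh = right[right_on_hi].values
    bl = right[right_on_lo].values
    # Broadcast (len(left), 1) against (len(right),): all pairs with lo <= a < hi
    i, j = np.where((a[:, None] >= bl) & (a[:, None] < bh))
    matched = pd.concat([left.loc[i].reset_index(drop=True),
                         right.loc[j].reset_index(drop=True)],
                        axis=1)
    # Add back the left rows that have no matching interval (left-join semantics)
    unmatched = left[~np.isin(np.arange(len(left)), np.unique(i))]
    result = pd.concat([matched, unmatched], ignore_index=True)
    return result


def left_cond_merge_simple(left, right, left_on, right_on_lo, right_on_hi):
    # Step 1: keep one row per distinct key of the left frame
    temp = pd.DataFrame({left_on: left[left_on].unique()})
    # Step 2: conditional merge of the distinct keys with the right frame
    temp = left_cond_merge_simple_help(left=temp, right=right, left_on=left_on,
                                       right_on_lo=right_on_lo, right_on_hi=right_on_hi)
    # Step 3: merge the result back onto the full left frame
    return left.merge(temp, on=left_on, how='left')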

The strange thing is:

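If I run only this line, it takes 4 seconds, which is very long (the time is spent in step 3), given that my left db is only 36000 x 300, my right db is 20 x 5, and the keys on the right are unique: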

%time data = left_cond_merge_simple(left=data, right=t_VEH_Age, left_on='VEH_Age', right_on_lo='lo', right_on_hi='hi')

But if I run this line right after that one (on its own it also takes 4 seconds), it takes only 0.1 seconds:

%time data = left_cond_merge_simple(left=data, right=t_BEH_VitMaxi, left_on='VEH_VitMaxi', right_on_lo='lo', right_on_hi='hi')
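Since the 4 seconds are attributed to step 3, one way to check is to time each step separately. A minimal diagnostic sketch, assuming the same broadcast approach as the helpers above; `time_steps` and the synthetic frames are hypothetical, and `time.perf_counter` wall-clock timings will vary between runs and machines:

```python
import time

import numpy as np
import pandas as pd


def time_steps(left, right, left_on, lo, hi):
    """Hypothetical helper: run the three steps and time each one separately.

    Returns the merged frame and a dict of elapsed seconds per step.
    """
    timings = {}

    t0 = time.perf_counter()
    # Step 1: one row per distinct key
    keys = pd.DataFrame({left_on: left[left_on].unique()})
    timings["step1_unique"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    # Step 2: conditional match lo <= key < hi via broadcasting; unmatched keys kept
    r = right.reset_index(drop=True)
    a = keys[left_on].to_numpy()
    i, j = np.where((a[:, None] >= r[lo].to_numpy()) & (a[:, None] < r[hi].to_numpy()))
    matched = pd.concat([keys.iloc[i].reset_index(drop=True),
                         r.iloc[j].reset_index(drop=True)], axis=1)
    unmatched = keys[~np.isin(np.arange(len(keys)), i)]
    temp = pd.concat([matched, unmatched], ignore_index=True)
    timings["step2_cond_merge"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    # Step 3: merge the per-key result back onto the full left frame
    result = left.merge(temp, on=left_on, how="left")
    timings["step3_merge_back"] = time.perf_counter() - t0

    return result, timings


# Synthetic stand-ins for the real tables (keys 20..24 match no interval)
rng = np.random.default_rng(0)
left = pd.DataFrame(rng.random((1000, 10)), columns=[f"c{k}" for k in range(10)])
left["VEH_Age"] = rng.integers(0, 25, len(left))
right = pd.DataFrame({"lo": np.arange(0, 20), "hi": np.arange(1, 21), "coef": rng.random(20)})
result, timings = time_steps(left, right, "VEH_Age", "lo", "hi")
print(timings)
```

Because `data` is reassigned by the first call, the second `%time` line runs on the frame returned by the first merge rather than on the frame loaded from the pickle; timing step 3 on both frames with a harness like this is one way to compare them.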

What is the reason?

My right db looks like: