Intersection of two columns of pandas DataFrames

mydataframe1
Out[15]:
Start   End
100     200
300     450
500     700

mydataframe2
Out[16]:
Start   End     Value
0       400     0
401     499     -1
500     1000    1
1001    1698    1

2 answers

User

Answer 1 · edited 2024-10-01 11:27:39

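I doubt pandas has a method that solves this directly. You have to compute the intersections manually to get the result you want. The intervaltree library at least makes the interval overlap computation simpler and more efficient.

IntervalTree.search() (replaced by overlap() in intervaltree 3.x) returns the (full) intervals that overlap the provided interval, but it does not compute their intersection. That is why I also apply the intersect() function defined below.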

import pandas as pd
from intervaltree import Interval, IntervalTree

def intersect(a, b):
    """Intersection of two intervals."""
    intersection = max(a[0], b[0]), min(a[1], b[1])
    if intersection[0] > intersection[1]:
        return None
    return intersection

def interval_df_intersection(df1, df2):
    """Calculate the intersection of two sets of intervals stored in DataFrames.
    The intervals are defined by the "Start" and "End" columns.
    The data in the rest of the columns of df1 is included with the resulting
    intervals."""
    tree = IntervalTree.from_tuples(zip(
            df1.Start.values,
            df1.End.values,
            df1.drop(["Start", "End"], axis=1).values.tolist()
        ))

    intersections = []
    for row in df2.itertuples():
        i1 = Interval(row.Start, row.End)
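        # Slice lookup returns every stored interval overlapping [Start, End);
        # indexing with an Interval object fails in intervaltree 3.x.
        intersections += [list(intersect(i1, i2)) + i2.data
                          for i2 in tree[i1.begin:i1.end]]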

    # Make sure the column names are in the correct order
    data_cols = list(df1.columns)
    data_cols.remove("Start")
    data_cols.remove("End")
    return pd.DataFrame(intersections, columns=["Start", "End"] + data_cols)

interval_df_intersection(mydataframe2, mydataframe1)

The result is exactly what you were after.
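For a quick sanity check on small inputs, the same intersections can be computed without intervaltree by comparing every pair of rows. The sketch below is only an illustration, not part of the answer: `brute_force_intersections` is a hypothetical helper, the rows are plain tuples copied from the question's data, and both endpoints are assumed inclusive, as in intersect() above.

```python
def brute_force_intersections(intervals, valued_intervals):
    """Intersect every (start, end) row with every (start, end, value) row.

    Hypothetical helper for illustration; O(n * m), so only for small inputs.
    """
    result = []
    for a_start, a_end in intervals:
        for b_start, b_end, value in valued_intervals:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start <= end:  # endpoints treated as inclusive
                result.append((start, end, value))
    return result


rows1 = [(100, 200), (300, 450), (500, 700)]
rows2 = [(0, 400, 0), (401, 499, -1), (500, 1000, 1), (1001, 1698, 1)]

clipped = brute_force_intersections(rows1, rows2)
for row in clipped:
    print(row)
# (100, 200, 0)
# (300, 400, 0)
# (401, 450, -1)
# (500, 700, 1)
```

With DataFrames, the rows could presumably be fed in via `mydataframe1.itertuples(index=False)` and `mydataframe2.itertuples(index=False)`, and the result wrapped with `pd.DataFrame(clipped, columns=["Start", "End", "Value"])`.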

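User

Answer 2 · edited 2024-10-01 11:27:39

Here is an answer that uses the NCLS library. It does not do the splitting, but it answers the question in the title, and it is very fast.

Setup: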

from ncls import NCLS

contents = """Start   End
100     200
300     450
500     700"""

import pandas as pd
from io import StringIO
df = pd.read_table(StringIO(contents), sep=r"\s+")

contents2 = """Start   End       Value
0       400       0
401     499       -1
500     1000      1
1001    1698      1"""
df2 = pd.read_table(StringIO(contents2), sep=r"\s+")

Execution:

[The code block for this step is missing from the source.]
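As a dependency-free stand-in, and explicitly not the NCLS code of this answer, the title question (which rows of one frame overlap which rows of the other) can be sketched with the standard library. `overlapping_pairs` is a hypothetical helper; the intervals are plain tuples copied from the setup data, and endpoints are assumed inclusive.

```python
from bisect import bisect_right


def overlapping_pairs(queries, subjects):
    """Return (query_index, subject_index) for every overlapping pair.

    Hypothetical helper for illustration; intervals are closed (start, end).
    """
    # Sort subject indices by start so bisect can skip subjects that
    # begin after the query ends.
    order = sorted(range(len(subjects)), key=lambda j: subjects[j][0])
    starts = [subjects[j][0] for j in order]

    pairs = []
    for i, (q_start, q_end) in enumerate(queries):
        # Only subjects whose start is <= q_end can overlap this query.
        cutoff = bisect_right(starts, q_end)
        for j in order[:cutoff]:
            if subjects[j][1] >= q_start:
                pairs.append((i, j))
    return pairs


queries = [(100, 200), (300, 450), (500, 700)]               # rows of df
subjects = [(0, 400), (401, 499), (500, 1000), (1001, 1698)]  # rows of df2

pairs = overlapping_pairs(queries, subjects)
print(pairs)
# [(0, 0), (1, 0), (1, 1), (2, 2)]
```

Each pair holds positional row indices, so the matching rows could presumably be pulled out with `df.iloc[...]` and `df2.iloc[...]`. A nested containment list such as NCLS avoids the linear scan over candidates that this sketch still performs, which is where its speed advantage would come from.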
