numpy: How do I join arrays? (to get the union of multiple ranges)

Published 2024-04-28 12:06:31


I am using Python with numpy.

I have a numpy array of indexes, a:

>>> a
array([[5, 7],
       [12, 18],
       [20, 29]])
>>> type(a)
<type 'numpy.ndarray'>

I have a numpy array of indexes, b:

>>> b
array([[2, 4],
       [8, 11],
       [33, 35]])
>>> type(b)
<type 'numpy.ndarray'>

I need to join array a with array b:

a + b => [2, 4] [5, 7] [8, 11] [12, 18] [20, 29] [33, 35]

=> ab, as an array of indexes => [2, 18] [20, 29] [33, 35]

(The indexes [2, 4] [5, 7] [8, 11] [12, 18] run consecutively:

=> 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 => [2, 18])

For this example:

>>> out_c
array([[2, 18],
       [20, 29],
       [33, 35]])

Can anyone suggest how I can get out_c?

Update: @Geoff suggested the solution from "python union of multiple ranges". Is that solution the fastest and best one for large data arrays?
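
For reference, a common way to take the union of ranges is to sort them by start and sweep through them once, extending the current range whenever the next one overlaps it or is directly adjacent. The sketch below is an assumption about what such a solution looks like, not a copy of the linked answer; `union_of_ranges` is a hypothetical name, and the ranges are treated as inclusive integer ranges.

```python
def union_of_ranges(ranges):
    """Merge inclusive integer (start, end) pairs that overlap or touch."""
    merged = []
    for start, end in sorted((int(s), int(e)) for s, e in ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or is adjacent to the current range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # A gap of at least one integer: start a new range.
            merged.append([start, end])
    return merged


a = [[5, 7], [12, 18], [20, 29]]
b = [[2, 4], [8, 11], [33, 35]]
out_c = union_of_ranges(a + b)
print(out_c)
```

This does one sort plus one pass, so the cost is dominated by the sort; whether it is fast enough for large arrays would need to be measured.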


3 answers

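Maybe you could try using numpy.concatenate() to join the arrays together, then find the minimum and maximum of each row, and then build c as a matrix of each row's minimum and maximum.

Alternatively, np.minimum and np.maximum compare two arrays elementwise and return the smaller and larger values, so you can find each row's minimum and maximum and assign them to the matrix c.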
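
To make this suggestion concrete, here is a small sketch of what those calls return for the a and b from the question. The variable names are illustrative only. The elementwise minimum and maximum combine row i of a with row i of b, so on their own they do not merge adjacent ranges such as [2, 4] and [5, 7]; a further merging step is assumed to be needed to reach out_c.

```python
import numpy as np

a = np.array([[5, 7], [12, 18], [20, 29]])
b = np.array([[2, 4], [8, 11], [33, 35]])

# Stack both sets of ranges into a single (6, 2) array.
stacked = np.concatenate((a, b))

# Elementwise comparison of row i of a with row i of b.
lows = np.minimum(a[:, 0], b[:, 0])    # smaller start of each row pair
highs = np.maximum(a[:, 1], b[:, 1])   # larger end of each row pair
pairwise = np.column_stack((lows, highs))

print(stacked.shape)
print(pairwise)
```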

(New answer) Using NumPy

import numpy as np

# a and b are the arrays from the question
ranges = np.vstack((a,b))
ranges.sort(0)

# List of non-overlapping ranges
nonoverlapping = (ranges[1:,0] - ranges[:-1,1] > 1).nonzero()[0]

# Starts are 0, and all the starts not overlapped by their predecessor
starts = np.hstack(([0], nonoverlapping + 1))

# Ends are -1 and all the ends who aren't overlapped by their successor
ends = np.hstack(( nonoverlapping, [-1]))

# Result
result = np.vstack((ranges[starts, 0], ranges[ends, 1])).T
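
The steps above can be wrapped into a reusable function and checked against the arrays from the question. This is a minimal sketch: `merge_ranges` is a hypothetical name, and the ranges are assumed to be inclusive integer ranges in (n, 2) arrays. Sorting the start and end columns independently is safe here because the union depends only on the collection of starts and the collection of ends, not on how they were originally paired.

```python
import numpy as np


def merge_ranges(*arrays):
    """Union of inclusive integer ranges given as (n, 2) arrays."""
    ranges = np.vstack(arrays)
    ranges.sort(axis=0)  # sorts the start and end columns independently

    # Row i and row i+1 stay separate when at least one integer lies between them.
    gaps = np.flatnonzero(ranges[1:, 0] - ranges[:-1, 1] > 1)

    # A merged range starts at row 0 and after every gap,
    # and ends at every gap and at the last row.
    starts = np.concatenate(([0], gaps + 1))
    ends = np.concatenate((gaps, [len(ranges) - 1]))
    return np.column_stack((ranges[starts, 0], ranges[ends, 1]))


a = np.array([[5, 7], [12, 18], [20, 29]])
b = np.array([[2, 4], [8, 11], [33, 35]])
out_c = merge_ranges(a, b)
print(out_c)
```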

(Old answer) Using lists and sets

import numpy as np
import itertools

def ranges(s):
    """ Converts a list of integers into start, end pairs """
    # Python 3 no longer allows tuple parameters in a lambda, so index the pair instead
    for a, b in itertools.groupby(enumerate(s), lambda p: p[1] - p[0]):
        b = list(b)
        yield b[0][1], b[-1][1]

def intersect(*args):
    """ Converts any number of numpy arrays containing start, end pairs 
        into a set of indexes """
    s = set()
    for start, end in np.vstack(args):
        s = s | set(range(start,end+1))
    return s

a = np.array([[5,7],[12, 18],[20,29]])
b = np.array([[2,4],[8,11],[33,35]])

# Sets are unordered, so sort the indexes before grouping consecutive runs
result = np.array(list(ranges(sorted(intersect(a,b)))))
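
The grouping step in ranges() relies on a small trick: inside a run of consecutive integers, the difference between each value and its position in the sorted list stays constant, so itertools.groupby can use that difference as its key. The sketch below illustrates this on a made-up list; `values`, `keys` and `runs` are names chosen for the example only.

```python
from itertools import groupby

values = [2, 3, 4, 5, 8, 9, 12]  # already sorted

# value - position is constant within each run of consecutive integers
keys = [value - index for index, value in enumerate(values)]

runs = []
for _, group in groupby(enumerate(values), key=lambda pair: pair[1] - pair[0]):
    group = list(group)
    # First and last value of the run form the (start, end) pair
    runs.append((group[0][1], group[-1][1]))

print(keys)
print(runs)
```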


Not pretty, but it works. I don't like the final loop, but I can't think of a way to do it without one:

ab = np.vstack((a,b))
ab.sort(axis=0)

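join_with_next = ab[1:, 0] - ab[:-1, 1] <= 1
# Positions where join_with_next flips. A True->False flip closes a merged run
# one row later than a False->True flip opens one, hence the offset of 2 or 1.
flips = np.where(np.diff(join_with_next))[0]
endpoints = np.concatenate(([0],
                            flips + np.where(join_with_next[flips], 2, 1),
                            [len(ab)]))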
lengths = np.diff(endpoints)
new_lengths = lengths.copy()
if join_with_next[0] == True:
    new_lengths[::2] = 1
else:
    new_lengths[1::2] = 1
new_endpoints = np.concatenate(([0], np.cumsum(new_lengths)))
print(endpoints, lengths)
print(new_endpoints, new_lengths)

starts = endpoints[:-1]
ends = endpoints[1:]
new_starts = new_endpoints[:-1]
new_ends = new_endpoints[1:]
c = np.empty((new_endpoints[-1], 2), dtype=ab.dtype)

for j, (s,e,ns,ne) in enumerate(zip(starts, ends, new_starts, new_ends)):
    if e-s != ne-ns:
        c[ns:ne] = np.array([np.min(ab[s:e, 0]), np.max(ab[s:e, 1])])
    else:
        c[ns:ne] = ab[s:e]

>>> c
array([[ 2, 18],
       [20, 29],
       [33, 35]])
