以更快/更好的方式从具有最接近值的列表中查找dict

for itemA in ListA: closestItemB = min(ListB, key=lambda x :abs(x["timestamp"])-int(itemA["timestamp")) if(abs(itemA['timestamp'] - closestItemB['timestamp']) < 15: #write info from both dicts to a csv file

for itemA in listA: closestItemB = min(listB[lastFoundIndex:lastFoundIndex+3, key=lambda x :abs(x["timestamp"])-int(itemA["timestamp")) if(abs(itemA['timestamp'] - closestItemB['timestamp']) < 15: lastFoundIndex = listB.index(closestItemB) #write info from both dicts to a csv file else: closestItemB = min(listB[lastFoundIndex:len(listB)-1, key=lambda x :abs(x["timestamp"])-int(itemA["timestamp")) if(abs(itemA['timestamp'] - closestItemB['timestamp']) < 15: lastFoundIndex = listB.index(closestItemB) #write info from both dicts to a csv file

1条回答

网友

1楼 · 发布于 2024-09-25 08:36:14

下面的代码呢？它使用了两个带有“timestamp”数字的列表，而不是dicts，但是使用dicts只会稍微有点麻烦复杂的问题-算法将保持不变。你知道吗

这里的想法是将两个指针指向a和b（ia和ib），然后查看ia和ib处的值是否足够接近，以便匹配。如果不是，那么如果差值为正，就意味着a中的值比b中的值要远得多，ib必须追赶。如果差距是负的，那么情况就相反了，ia必须要跟上。你知道吗

a = [1, 4, 35, 40, 56, 70, 90, 110 ]
b = [3, 20, 39, 57, 62, 84, 100, 150]

ia = 0
ib = 0
while ia < len(a) and ib < len(b):
    delta = a[ia] - b[ib]
    if abs(delta) <= 15:
        print("Found match at ia={} ({}) and ib={} ({})".format(ia, a[ia], ib, b[ib]))
        # Both items are matched, continue with the next ones 
        ia += 1
        ib += 1
    elif delta > 15:
        # we're too far behind in the b list, try to catch up
        ib += 1
    elif delta < -15:
        # too far behind in the a list, try to catch up 
        ia += 1

请注意，我不确定如何处理一个列表中的两个值可能与第二个列表中的一个值匹配的情况—例如，a列表中的1和4都可以与b列表中的3匹配，但是所给出的算法会在值与另一个列表中的伙伴匹配后将其从竞争中剔除。你可以通过改变ia和ib的情况来改变这一点。你知道吗

下面的代码查找所有可能的匹配项（我认为），仍然只进行一次迭代（但不会将匹配项添加到候选列表以查找最佳匹配项）：

a = [1, 4, 35, 40, 56, 70, 90, 110 ]
b = [3, 20, 39, 57, 62, 84, 100, 150]

ia = 0
ib = 0
while ia < len(a) and ib < len(b):
    delta = a[ia] - b[ib]
    if abs(delta) <= 15:
        print("Found match at ia={} ({}) and ib={} ({})".format(ia, a[ia], ib, b[ib]))
        if delta < 0:
           # there might be a better match yet for the timestamp at ib
           ia += 1
        elif delta > 0:
           # there might be a better match yet for the timestamp in ia
           ib += 1
        else:
           # perfect match, it won't get any better. Move along in both lists
           ia += 1
           ib += 1
    elif delta > 15:
        # we're too far behind in the b list, try to catch up
        ib += 1
    elif delta < -15:
        # too far behind in the a list, try to catch up 
        ia += 1

现在，如果您真的需要找到最佳（最接近的）匹配，您的代码可能如下所示：

a = [1, 4, 35, 40, 56, 70, 90, 110 ]
b = [3, 20, 39, 57, 62, 84, 100, 150]

ia = 0
ib = 0
best_at = -1
best_diff = 10000
while ia < len(a) and ib < len(b):
    delta = a[ia] - b[ib]
    if abs(delta) <= 15:        
        print("Found match at ia={} ({}) and ib={} ({})".format(ia,  a[ia], ib, b[ib]))
        if abs(delta) < best_diff:
            best_at = ib
            best_diff = abs(delta)                   
        if delta < 0:
            if best_diff < 10000:
                print("Best match for {} is {} at ib={}".format(a[ia], b[best_at], best_at))
                best_diff = 10000
            ia += 1            
        elif delta > 0:
            ib += 1
        else:
            # perfect match
            print("Best match for {} is {} at ib={}".format(a[ia], b[best_at], best_at))
            best_diff = 10000
            ia += 1
            ib += 1

    elif delta > 15:
        ib += 1
    elif delta < -15:
        if best_diff < 10000:
            print("Best match for {} is {} at ib={}".format(a[ia], b[best_at], best_at))
            best_diff = 10000
        ia += 1

它仍然以线性时间运行。时间复杂度大致为O（n+m），其中n是list a的长度，m是list b的长度，您很容易看到这种情况，因为在while循环的每次迭代中，ia或ib都会提前1。你知道吗

我认为，如果您想为列表a中的每个时间戳找到最接近的匹配项，那么您不可能比O（n+m）做得更好

相关问题更多 >

编程相关推荐

热门问题

热门文章