Improving data comparison performance

self.adverts = set()      # Around 11k rows
self.old_adverts = set()  # Around 11k rows
self.advs = []
...
# Find modified items
for item in self.new_items:
    for old_item in self.old_items:
        if item.id == old_item.id and item.price != old_item.price:
            self.advs.append(
                {
                    'delete': old_item,
                    'new': item,
                    'archive': old_item
                }
            )

class Item(Base):
    ...
    id = Column(String(25), nullable=False, primary_key=True)
    price = Column(Numeric(precision=8), nullable=False, primary_key=True)
    # Another multiple additional fields
    ...

    def __eq__(self, other):
        return self.id == other.id

    def __hash__(self):
        return hash(self.id)

# for item in self.items:
#     if item not in self.old_items:
#         self.insert_items_db.add({'new': item})

# Find absolutely new items
for new_item in self.items - self.old_items:
    self.advs.append({'new': new_item})

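2 Answers

User

Answer #1 · edited 2024-09-28 01:23:26

I didn't follow your code exactly, but you can speed up comparing two lists by using dictionaries. That makes the whole comparison O(n) instead of O(n^2), because each existence check drops from O(n) to O(1).

For example, suppose you have a bunch of objects with the attributes id, value and color.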

for x in list1:            # N iterations
    for y in list2:        # N iterations for each x
        if x.id == y.id:   # O(1)
            pass  # do stuff

Instead, you can do this:

# create two dictionaries where each key is the ID and each value is the
# object, data, other things etc.
dict1 = {x.id: x for x in list1}
dict2 = {y.id: y for y in list2}

Your code now becomes:

for x in dict1.keys():     # O(N)
    if x in dict2:         # O(1)
        pass  # do some stuff

This now runs in O(n) time.

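Now, if you want to compare prices, it gets trickier. If several elements share the same ID (that is, there are collisions within the same set), we can turn each dictionary entry into a list of objects. In theory this is still an O(N^2) operation, but it is a big improvement over iterating over all 11k elements.

Suppose there are no duplicate IDs. Then the code becomes: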

for x in dict1.keys():     # O(N)
    if x in dict2:         # O(1)
        if dict1[x].price != dict2[x].price:  # or any other comparison
            pass  # do stuff
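
Applied to the question's data, the no-duplicate case might look roughly like the sketch below. The `Item` class here is a hypothetical stand-in for the SQLAlchemy model (only `id` and `price`), and `find_modified` is an illustrative name that does not appear in the original code. It assumes IDs are unique within each collection.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for the question's model: only id and price."""
    id: str
    price: Decimal


def find_modified(old_items, new_items):
    """Return change records for items present in both collections
    whose price differs. Assumes ids are unique within each collection."""
    old_by_id = {item.id: item for item in old_items}  # O(N) to build
    changes = []
    for item in new_items:                 # O(N) iterations
        old_item = old_by_id.get(item.id)  # O(1) average lookup
        if old_item is not None and old_item.price != item.price:
            changes.append({'delete': old_item, 'new': item, 'archive': old_item})
    return changes


old = [Item('a1', Decimal('10.00')), Item('b2', Decimal('5.50'))]
new = [Item('a1', Decimal('12.00')), Item('b2', Decimal('5.50')),
       Item('c3', Decimal('1.00'))]
modified = find_modified(old, new)
print(modified)  # only the record for 'a1', whose price changed
```

Only one dictionary is built here, because the new items can simply be iterated once and looked up in the old ones.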

If there are duplicate IDs, the dictionary structure should look like this:

my_dict = {
    1001: [obj1, obj2, obj3],  # where obj1.id == obj2.id == obj3.id
    1002: [obj4, obj5, obj6],  # where obj4.id == obj5.id == obj6.id
}
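
One way to build an id-to-list dictionary like this is with `collections.defaultdict`. This is only a sketch: the `Record` type, its `other_identifier` field and the `group_by_id` helper are made up for illustration.

```python
from collections import defaultdict, namedtuple

# Hypothetical record type; `other_identifier` is an assumed second field.
Record = namedtuple('Record', ['id', 'other_identifier'])


def group_by_id(records):
    """Map each id to the list of records that share that id."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.id].append(record)
    return dict(grouped)  # plain dict, so a missing id is not silently created


records = [Record(1001, 'red'), Record(1001, 'blue'), Record(1002, 'red')]
my_dict = group_by_id(records)
print(my_dict)
```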

Adjust the code to reflect this:

for x in dict1.keys():
    if x in dict2:
        # assumes dict1 maps id -> one object and dict2 maps id -> a list of objects
        for my_object_type in dict2[x]:     # something about this seems familiar.....
            if dict1[x].other_identifier == my_object_type.other_identifier:
                pass  # finally do some stuff!

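Here's the crazy part:

In the code above I added another for loop. That loop is O(N) again, which is why the code falls back to O(N^2). However, if we have another identifier, say "Id2" or "color_of_left_toe", we can build another dictionary keyed on it.

At that point the structure turns into a dictionary of dictionaries of objects. Fairly complicated, but!! Access time stays O(1).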
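
A dictionary of dictionaries along these lines could be sketched as follows. The `Record` fields and both helper functions are hypothetical, and the sketch assumes each (id, color) pair is unique within a collection.

```python
from collections import namedtuple

# Hypothetical record type: `color` plays the role of the second identifier.
Record = namedtuple('Record', ['id', 'color', 'price'])


def index_nested(records):
    """Build {id: {color: record}} so both identifiers are hash lookups."""
    index = {}
    for record in records:
        index.setdefault(record.id, {})[record.color] = record
    return index


def find_price_changes(old_records, new_records):
    """Pair up records matching on id and color whose price differs."""
    old_index = index_nested(old_records)
    changed = []
    for record in new_records:
        # Two O(1) average lookups replace the inner for loop.
        match = old_index.get(record.id, {}).get(record.color)
        if match is not None and match.price != record.price:
            changed.append((match, record))
    return changed


old_records = [Record(1001, 'red', 10), Record(1001, 'blue', 20)]
new_records = [Record(1001, 'red', 10), Record(1001, 'blue', 25),
               Record(1002, 'red', 5)]
changed = find_price_changes(old_records, new_records)
print(changed)
```

A flat dictionary keyed on the tuple `(record.id, record.color)` would give the same O(1) access with one level less nesting.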

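Why "in dict" is faster

In the first code example, you iterate over the first list and, inside that loop, iterate over the other list again.

So for the first element of list1, you walk through len(list2) elements, or N.

Because you run that inner loop for every element of list1, you do this N times:

N + N + N + N ... N

\~~~~~~~ N times ~~~~~~/

or O(N^2)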
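
To make the count concrete, here is a small sketch that tallies the checks each approach performs. The two counting functions are illustrative only and use plain integers in place of real objects.

```python
def count_nested(list1, list2):
    """Count the comparisons made by the nested-loop approach."""
    comparisons = 0
    for x in list1:
        for y in list2:
            comparisons += 1  # one `x == y` check per pair
    return comparisons


def count_dict(list1, list2):
    """Count the membership checks made by the dictionary approach."""
    lookup = {y: True for y in list2}
    checks = 0
    for x in list1:
        checks += 1  # one `x in lookup` check per element
        _ = x in lookup
    return checks


n = 1000
ids = list(range(n))
nested = count_nested(ids, ids)
hashed = count_dict(ids, ids)
print(nested, hashed)  # 1000000 1000
```

With around 11k rows on each side, the same arithmetic gives roughly 121 million comparisons for the nested loops against about 11 thousand lookups.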

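Why dict is faster

A dictionary hashes each key and stores the entry according to that hash. This means you don't have to walk through a complicated binary tree or an array to find what you're looking for. Instead, you do a little O(1) math on the key you gave it and know right away where to look.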

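User
Answer #2 · edited 2024-09-28 01:23:26

This depends heavily on what your "do stuff" means. If it's a simple record update, forget the set implementation and go with a dictionary. Build a dictionary from the old data, keyed by product ID, then update it with the new data.
catalog = {self.id: [ <remainder of the row> ] for self in old_data}
catalog.update( {self.id: [ <remainder of the row> ] for self in new_data} )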
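
A runnable sketch of that update, assuming each row is a tuple whose first element is the product ID (the sample rows are made up):

```python
# Hypothetical rows: (product id, price, name)
old_data = [('a1', 10.0, 'chair'), ('b2', 5.5, 'lamp')]
new_data = [('a1', 12.0, 'chair'), ('c3', 1.0, 'mug')]

# Key each row by its id; the rest of the row becomes the value.
catalog = {row_id: list(rest) for row_id, *rest in old_data}
# Rows from new_data overwrite matching ids and add unseen ones.
catalog.update({row_id: list(rest) for row_id, *rest in new_data})

print(catalog)
# {'a1': [12.0, 'chair'], 'b2': [5.5, 'lamp'], 'c3': [1.0, 'mug']}
```

Note that this behaves like an upsert: IDs that exist only in the old data stay in the catalog, and the update by itself does not tell you which rows changed.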

