Python: transitivity between dictionaries

original1 = [['email', 'tel', 'fecha', 'descripcion', 'categ'],
             ['a@gmail.com', '1', '2014-08-06 00:00:06', 'MySpace a', 'animales'],
             ['b@gmail.com', '1', '2014-08-01 00:00:06', 'My Space a', 'ropa'],
             ['a@gmail.com', '2', '2014-08-06 00:00:06', 'My Space b', 'electronica'],
             ['b@gmail.com', '3', '2014-08-10 00:00:06', 'Myace c', 'animales'],
             ['c@gmail.com', '4', '2014-08-10 00:00:06', 'Myace c', 'animales']]

from collections import defaultdict

# Assumption: datos holds the data rows of original1 without the header row.
datos = original1[1:]

email_to_indices = defaultdict(list)
phone_to_indices = defaultdict(list)
for idx, row in enumerate(datos):
    email = row[0].lower()
    phone = row[1]
    email_to_indices[email].append(idx)
    phone_to_indices[phone].append(idx)

2 answers

User

Answer 1 · edited 2024-10-02 22:34:42

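Here is another approach:

While building the email_to_indices dictionary, store the row's phone number as the value instead of its index, and let phone_to_indices hold the row indices. That gives a chained lookup: email_to_indices maps an email to its phone numbers, and phone_to_indices maps each phone number to the indices of the rows that use it.

With this change and a few basic set operations, I can get what you want: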

from collections import defaultdict

email_to_indices = defaultdict(list)
phone_to_indices = defaultdict(list)
combined = defaultdict(set)

original=[['email', 'tel', 'fecha', 'descripcion', 'categ'],
          ['a@gmail.com', '1', '2014-08-06 00:00:06', 'MySpace a', 'animales'],
          ['b@gmail.com', '1', '2014-08-01 00:00:06', 'My Space a', 'ropa'],
          ['a@gmail.com', '2', '2014-08-06 00:00:06', 'My Space b', 'electronica'],
          ['b@gmail.com', '3', '2014-08-10 00:00:06', 'Myace c', 'animales'],
          ['c@gmail.com', '4', '2014-08-10 00:00:06', 'Myace c', 'animales']]


for idx, row in enumerate(original[1:], start=1):
    email = row[0].lower()
    phone = row[1]
    email_to_indices[email].append(phone) # Here is what I changed
    phone_to_indices[phone].append(idx)

random_key = 0  # id of the group currently being filled
for idx, row in enumerate(original[1:], start=1):
    grouped_rows = []
    if row[0].lower() in email_to_indices:
        for phone_no in email_to_indices[row[0].lower()]:
            grouped_rows.extend(phone_to_indices[phone_no])

    if len(combined[random_key]) > 0 and len(set(grouped_rows).intersection(combined[random_key])) > 0:
        combined[random_key].update(set(grouped_rows))
    elif len(combined[random_key]) > 0:
        random_key += 1
        combined[random_key].update(set(grouped_rows))
    else:
        combined[random_key].update(set(grouped_rows))

print(combined)

This puts rows 1, 2, 3 and 4 into group 0 and row 5 into group 1.

User

Answer 2 · edited 2024-10-02 22:34:42

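What you have here is a graph, or more precisely a bipartite graph. There are two kinds of nodes: emails and phones. Two nodes are connected if there is a record involving that email and that phone. We could even say that the records themselves are the edges connecting the nodes.

The task is to find the connected components of this graph. Following the Connected components link, you can find algorithms that do this in linear time.

Of course, you could also come up with some quick-and-dirty solution, which might even be acceptable if the dataset is small enough.

You can find some Python implementations here: Python connected components

Update: here is an example of how to build the graph: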
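A minimal sketch of the linear-time idea: a breadth-first search over a graph stored as a dict mapping each node to a set of neighbours (the same shape as the `graph` built in the update below). The function name `connected_components` and the tiny sample graph are illustrative assumptions, not taken from the linked page. Each node and edge is handled a constant number of times, so the run time is O(V + E).

```python
from collections import deque


def connected_components(graph):
    """Return a list of sets, one per connected component of `graph`.

    `graph` maps each node to the set of nodes it is linked to; edges are
    assumed to be stored in both directions.
    """
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects its component.
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in graph.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical example: two emails sharing phone "1", plus an isolated pair.
example = {
    ("email", "a@gmail.com"): {("phone", "1")},
    ("email", "b@gmail.com"): {("phone", "1")},
    ("phone", "1"): {("email", "a@gmail.com"), ("email", "b@gmail.com")},
    ("email", "c@gmail.com"): {("phone", "4")},
    ("phone", "4"): {("email", "c@gmail.com")},
}
print(len(connected_components(example)))  # 2
```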
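One possible quick-and-dirty approach for small inputs (an illustration of my own, not the linear-time algorithm above) is to start with one group per row and keep merging any two groups that share an email or a phone until nothing changes. The repeated pairwise scans make it roughly cubic in the number of rows. The function name `group_rows` and the `(email, phone)` row layout are assumptions.

```python
def group_rows(rows):
    """Group row indices so rows sharing an email or phone end up together.

    `rows` is a list of (email, phone) pairs; returns a list of sets of indices.
    """
    # Each group keeps its row indices and the emails/phones it touches.
    groups = [({i}, {("email", e), ("phone", p)}) for i, (e, p) in enumerate(rows)]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if groups[i][1] & groups[j][1]:
                    # Shared email or phone: fold group j into group i, then rescan.
                    groups[i][0].update(groups[j][0])
                    groups[i][1].update(groups[j][1])
                    del groups[j]
                    merged = True
                    break
            if merged:
                break
    return [indices for indices, _ in groups]


rows = [("a@gmail.com", "1"), ("b@gmail.com", "1"), ("a@gmail.com", "2"),
        ("b@gmail.com", "3"), ("c@gmail.com", "4")]
print(group_rows(rows))  # [{0, 1, 2, 3}, {4}]
```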

graph = {}
EMAIL = "email"
PHONE = "phone"

# Each record links its email node to its phone node, in both directions.
for rec in datos:
    graph.setdefault((EMAIL, rec[0]), set()).add((PHONE, rec[1]))
    graph.setdefault((PHONE, rec[1]), set()).add((EMAIL, rec[0]))

print("\n".join("%s: %s" % (node, linked_nodes) for node, linked_nodes in graph.items()))

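So each node has a type (EMAIL or PHONE; these could just as well be integers such as 0 and 1, I made them strings only for nicer printing) and a value. The graph is a dictionary with nodes as keys and sets of connected nodes as values.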
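Because each record is an edge, the row groups can be read off the components: a row belongs to the component that contains its email node. A sketch, assuming the data rows are `[email, phone, ...]` lists as in the question with the header already removed; the function name `rows_by_component` is hypothetical, and an iterative depth-first search stands in for whichever algorithm the linked page describes.

```python
EMAIL = "email"
PHONE = "phone"


def rows_by_component(datos):
    """Return one list of row indices (in row order) per connected component."""
    # Build the bipartite graph as in the answer: node -> set of linked nodes.
    graph = {}
    for rec in datos:
        graph.setdefault((EMAIL, rec[0]), set()).add((PHONE, rec[1]))
        graph.setdefault((PHONE, rec[1]), set()).add((EMAIL, rec[0]))

    # Label every node with a component id using an iterative depth-first search.
    label = {}
    next_id = 0
    for start in graph:
        if start in label:
            continue
        label[start] = next_id
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in graph[node]:
                if neighbour not in label:
                    label[neighbour] = next_id
                    stack.append(neighbour)
        next_id += 1

    # A row is an edge, so the component of its email node is the row's group.
    groups = [[] for _ in range(next_id)]
    for idx, rec in enumerate(datos):
        groups[label[(EMAIL, rec[0])]].append(idx)
    return groups


datos = [['a@gmail.com', '1', '2014-08-06 00:00:06', 'MySpace a', 'animales'],
         ['b@gmail.com', '1', '2014-08-01 00:00:06', 'My Space a', 'ropa'],
         ['a@gmail.com', '2', '2014-08-06 00:00:06', 'My Space b', 'electronica'],
         ['b@gmail.com', '3', '2014-08-10 00:00:06', 'Myace c', 'animales'],
         ['c@gmail.com', '4', '2014-08-10 00:00:06', 'Myace c', 'animales']]
print(rows_by_component(datos))  # [[0, 1, 2, 3], [4]]
```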
