Comparing values in a Python dictionary

Posted 2024-06-28 10:49:11

I have a dictionary with numbers as keys and lists of strings as values. For example:

my_dict = {
    1: ['bush', 'barck obama', 'general motors corporation'],
    2: ['george bush', 'obama'],
    3: ['general motors', 'george w. bush']
}

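What I want is to compare every item in every list (for each key) and, if an item is a substring of another item, replace it with the longer item. In other words, a very crude form of coreference resolution.

I don't know how to go about this. Here is the pseudocode I came up with: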

for key, value in my_dict.items():
    for item in value:
        if item is a substring of some other item under any other key:
            item = that other item

So my dictionary would end up looking like this:

my_dict = {
    1: ['george w. bush', 'barck obama', 'general motors corporation'],
    2: ['george w. bush', 'barck obama'],
    3: ['general motors corporation', 'george w. bush']
}

Sorry if I haven't explained the problem clearly.


2 Answers

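Build a set of all the names in your dictionary.
You can then create a lookup table that lets you construct the new dict.
This uses max() with key=len to pick the longest name that contains the given name as a substring: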

>>> s = {n for v in my_dict.values() for n in v}
>>> lookup = {n: max((a for a in s if n in a), key=len) for n in s}
>>> {k: [lookup[n] for n in v] for k, v in my_dict.items()}
{1: ['george w. bush', 'barck obama', 'general motors corporation'],
 2: ['george bush', 'barck obama'],
 3: ['general motors corporation', 'george w. bush']}
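Note that key 2 still holds 'george bush' in this output. The `in` operator on strings only matches contiguous runs of characters, so 'george bush' is not found inside 'george w. bush'. A quick check using only strings from the question (the variable names are just for illustration):

```python
# Plain substring containment requires the characters to be contiguous.
short_name = 'george bush'
long_name = 'george w. bush'

contiguous = short_name in long_name      # the "w. " in the middle breaks the match
single_word = 'bush' in long_name         # a single word is still found
surname = 'obama' in 'barck obama'

print(contiguous)   # False
print(single_word)  # True
print(surname)      # True
```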

Alternatively, you can call max() inline:

>>> s = {n for v in my_dict.values() for n in v}
>>> {k: [max((a for a in s if n in a), key=len) for n in v] for k, v in my_dict.items()}
{1: ['george w. bush', 'barck obama', 'general motors corporation'],
 2: ['george bush', 'barck obama'],
 3: ['general motors corporation', 'george w. bush']}

To get the desired output, you need a matching condition that is slightly different from a plain substring test:

>>> s = {n for v in my_dict.values() for n in v}
>>> {k: [max((a for a in s if all(w in a for w in n.split())), key=len) for n in v] for k, v in my_dict.items()}
{1: ['george w. bush', 'barck obama', 'general motors corporation'],
 2: ['george w. bush', 'barck obama'],
 3: ['general motors corporation', 'george w. bush']}
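One thing to be aware of: in the condition above, `w in a` is still a substring test against the whole candidate string, so a word fragment such as 'bus' would also match 'george w. bush'. If that is too loose for your data, a stricter variant compares whole words. The sketch below contrasts the two; the helper names `loose_match` and `word_match` are made up for this illustration and are not part of the original code.

```python
def loose_match(name, candidate):
    """Every word of `name` occurs somewhere in `candidate` (substring test)."""
    return all(word in candidate for word in name.split())


def word_match(name, candidate):
    """Every word of `name` is a whole word of `candidate`."""
    candidate_words = candidate.split()
    return all(word in candidate_words for word in name.split())


print(loose_match('george bush', 'george w. bush'))  # True
print(word_match('george bush', 'george w. bush'))   # True
print(loose_match('bus', 'george w. bush'))          # True: 'bus' is inside 'bush'
print(word_match('bus', 'george w. bush'))           # False: 'bus' is not a whole word
```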

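The fact that this is a dictionary of lists is actually irrelevant. What matters is that some strings have to be modified based on other strings.

Here are the strings: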

all_strings = [s for string_list in my_dict.values() for s in string_list]

To replace a single string:

def expand_string(s, all_strings):
    # compare words
    matches = [s2 for s2 in all_strings
               if all(word in s2.split() for word in s.split())]
    if matches:
        # find longest result
        return sorted(matches, key=len, reverse=True)[0]
    else:
        # not reached when s comes from all_strings (s always matches itself),
        # but kept as a fallback
        return s

To replace everything:

result = {k: [expand_string(s, all_strings) for s in v]
          for k, v in my_dict.items()}
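For reference, here are the pieces above assembled into one self-contained script, run against the data from the question. The `expected` dictionary is copied from the question's desired output and is only there to check the result.

```python
my_dict = {
    1: ['bush', 'barck obama', 'general motors corporation'],
    2: ['george bush', 'obama'],
    3: ['general motors', 'george w. bush'],
}

# Flatten all the lists into one list of candidate strings.
all_strings = [s for string_list in my_dict.values() for s in string_list]


def expand_string(s, all_strings):
    # Keep candidates that contain every word of s as a whole word.
    matches = [s2 for s2 in all_strings
               if all(word in s2.split() for word in s.split())]
    if matches:
        # Longest candidate first.
        return sorted(matches, key=len, reverse=True)[0]
    return s


result = {k: [expand_string(s, all_strings) for s in v]
          for k, v in my_dict.items()}

expected = {
    1: ['george w. bush', 'barck obama', 'general motors corporation'],
    2: ['george w. bush', 'barck obama'],
    3: ['general motors corporation', 'george w. bush'],
}
print(result == expected)  # True
```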
