Removing words that are part of another word

Posted 2024-10-04 11:23:33

Suppose I have a Python list of strings, like this:

lst = ['makeup brush tool', 'mak', 'flawless', 'tool', 'makeu', 'bru', 'brus', 'brush', 'makeup brush', 'cosmetic brush holder', 'elf makeup', 'key holder', 'holder', 'flaw', 'flawl', 'marinade brush', 'cosmetic', 'makeup brush cleaner', 'makeup brush holder', 'brush holder']

Here a single word such as 'mak' is part of another word, 'makeu'. How can I remove words like 'mak'? Another example: of 'bru', 'brus', and 'brush', both 'bru' and 'brus' must be removed.

I got this far and could not work out how to continue:

def remove_repeated_parts(un_corrected):
    """ Returns a corrected list """
    corrected = []
    for word in un_corrected:
        string_split = word.split()
        if len(string_split) == 1:
             # what to do from here


remove_repeated_parts(lst)

Expected output:

lst = ['makeup brush tool', 'flawless', 'tool', 'makeu', 'brush', 'makeup brush', 'cosmetic brush holder', 'elf makeup', 'key holder', 'holder', 'marinade brush', 'cosmetic', 'makeup brush cleaner', 'makeup brush holder', 'brush holder']

Note that only strings consisting of a single word are considered.

Does this have anything to do with regular expressions?


3 answers

You can try the following approach (no regular expressions):

lst = ['makeup brush tool', 'mak', 'flawless', 'tool', 'makeu', 'bru', 'brus', 'brush', 'makeup brush', 'cosmetic brush holder', 'elf makeup', 'key holder', 'holder', 'flaw', 'flawl', 'marinade brush', 'cosmetic', 'makeup brush cleaner', 'makeup brush holder', 'brush holder']
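def check_list(words):
    # Only single-word strings are compared against each other.
    singles = [w for w in words if len(w.split()) == 1]
    result = []
    for m in words:
        for n in singles:
            # m is a proper part of another single word: drop it.
            if m != n and m in n:
                break
        else:
            # No containing word was found, so keep m.
            result.append(m)
    return result
print(check_list(lst))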

A solution without regular expressions:

lst = ['makeup brush tool', 'mak', 'flawless', 'tool', 'makeu', 'bru', 'brus', 'brush', 'makeup brush', 'cosmetic brush holder', 'elf makeup', 'key holder', 'holder', 'flaw', 'flawl', 'marinade brush', 'cosmetic', 'makeup brush cleaner', 'makeup brush holder', 'brush holder']

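for testword in sorted(lst):
    if len(testword.split()) != 1:
        continue  # only single-word strings are candidates for removal
    for word in lst:
        # Compare only against other single-word strings.
        if testword != word and len(word.split()) == 1 and testword in word:
            lst.remove(testword)
            break  # stop scanning lst as soon as it has been modified
print(lst)

Logic:

  1. Iterate over a sorted copy of the list; sorted() returns a new list, so removing items from lst does not disturb the outer loop
  2. For each single-word string ("testword"), loop over every word in the list
  3. If testword is part of another single-word string (and is not that string itself), remove it from the list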
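
The same idea can also be written so that the original list is left untouched. This is only a minimal sketch: the function name remove_contained is hypothetical, and it assumes, as the question states, that only single-word strings are removed and only single-word strings count as containers.

```python
def remove_contained(words):
    """Return a new list without single-word entries that are part of
    another single-word entry (hypothetical helper, not from the answers)."""
    # Single-word strings are both the removal candidates and the containers.
    singles = [w for w in words if len(w.split()) == 1]
    kept = []
    for w in words:
        is_single = len(w.split()) == 1
        # A candidate is dropped when a different single word contains it.
        contained = is_single and any(w != other and w in other for other in singles)
        if not contained:
            kept.append(w)
    return kept


lst = ['makeup brush tool', 'mak', 'flawless', 'tool', 'makeu', 'bru', 'brus', 'brush', 'makeup brush', 'cosmetic brush holder', 'elf makeup', 'key holder', 'holder', 'flaw', 'flawl', 'marinade brush', 'cosmetic', 'makeup brush cleaner', 'makeup brush holder', 'brush holder']
result = remove_contained(lst)
print(result)
```

Because the function builds a new list, lst keeps all 20 entries and the result can be compared directly with the expected output from the question.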

You can use a list comprehension, like this:

new_lst = [x for x in lst if not any(x in y for y in lst if (y != x and len(y.split())==1))]
print(new_lst)

Output:

['makeup brush tool', 'flawless', 'tool', 'makeu', 'brush', 'makeup brush', 'cosmetic brush holder', 'elf makeup', 'key holder', 'holder', 'marinade brush', 'cosmetic', 'makeup brush cleaner', 'makeup brush holder', 'brush holder']
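
To see why the len(y.split())==1 condition matters, here is a small comparison sketch. The variable names filtered, unfiltered, and extra_dropped are only illustrative; the list is the one from the question.

```python
lst = ['makeup brush tool', 'mak', 'flawless', 'tool', 'makeu', 'bru', 'brus', 'brush', 'makeup brush', 'cosmetic brush holder', 'elf makeup', 'key holder', 'holder', 'flaw', 'flawl', 'marinade brush', 'cosmetic', 'makeup brush cleaner', 'makeup brush holder', 'brush holder']

# With the single-word condition on y, as in the comprehension above.
filtered = [x for x in lst
            if not any(x in y for y in lst if y != x and len(y.split()) == 1)]

# Without the condition, any string contained in any other string is dropped.
unfiltered = [x for x in lst
              if not any(x in y for y in lst if y != x)]

# Entries that survive only because multi-word strings are ignored as containers.
extra_dropped = [x for x in filtered if x not in unfiltered]
print(extra_dropped)
```

Without the condition, entries such as 'tool' and 'holder' would also disappear, because they occur inside multi-word strings like 'makeup brush tool' and 'key holder'.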