Grouping the words of a string that share a common substring of a given length

Posted 2024-06-28 20:44:45


I want to display the words in a string that share a common substring. For example, if the given string is

str = "the games are lame"

and the words must be grouped by a common substring of length 3, the output should be

the 
games, lame 
are

because the common substring of length 3 is "ame".

I went ahead and used split() to convert the string into a list, say "lista", and built another list, say "listb", containing all possible substrings of length 3, such as

the, gam, gme, ges, ame, aes, mes, are, lam, lme, ame

Then I checked "listb" for duplicates ('ame') and, based on those, compared them against the items in "lista":

for items in duplicate:
    for item in lista:
        if items in item and item not in listc:
            listc.append(item)
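The steps described above (collect every length-3 substring of each word, then look for the ones that occur in more than one word) could be sketched roughly as follows. This assumes that "substring" means a run of contiguous characters, so entries such as `gme` or `ges` from the list above would not be counted. The helper name `substrings` is made up for this illustration.

```python
from collections import Counter

def substrings(word, n=3):
    """Return the set of contiguous substrings of length n in word."""
    # range() is empty when the word is shorter than n, so short words yield set()
    return {word[i:i + n] for i in range(len(word) - n + 1)}

words = "the games are lame".split()

# Count in how many different words each substring appears.
# Using a set per word means a substring repeated inside one word counts once.
counts = Counter(sub for word in words for sub in substrings(word))

# Substrings shared by at least two words.
shared = sorted(sub for sub, count in counts.items() if count > 1)
print(shared)
```

For the example string, the only shared substring this finds is `ame`.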

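Now I have a "listc" whose items share a common substring of length 3, but I don't know how to group them as the output requires. Also, if "str" contains more words with common substrings, "listc" will contain those words as well. I'm not sure whether this is the right way to go about it.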


2 answers

Here is one solution:

str_ = "the games are lame"

# first I get a list of all the words
words = str_.split()
# words >>> ['the', 'games', 'are', 'lame']

groups = []
# This list will hold the groups of words

# For each word
for word in words:
    found = False

    # Get the first word of each group
    other_words = [group[0] for group in groups]

    # Loop through the word and take every substring of 3 characters
    for i in range(len(word)):
        substring = word[i:i+3]

        # Skip the slices at the end of the word that are shorter than 3 characters
        if len(substring) != 3:
            continue

        try:
            # Find the index of the first group whose first word contains the substring
            index = [substring in other_word for other_word in other_words].index(True)
        except ValueError:
            continue

        # Add the word to that group and stop, so it is not appended twice
        found = True
        groups[index].append(word)
        break

    # If we don't find a group for the word, we create a new group with that word in it
    if not found:
        groups.append([word])


# groups >>> [['the'], ['games', 'lame'], ['are']]

# Now print the groups
for group in groups:
    print(", ".join(group))

Output:

the
games, lame
are
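The answer above compares a new word only against the first word of each group. A possible variant, given here only as a sketch, keeps the set of length-n substrings seen in each group and puts a word in the first group it overlaps with. The names `substring_set` and `group_words` are hypothetical, and the function assumes words are separated by whitespace.

```python
def substring_set(word, n=3):
    """Return the set of contiguous substrings of length n in word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def group_words(text, n=3):
    """Group words greedily: a word joins the first group it shares a substring with."""
    groups = []  # each entry is (list of member words, set of their substrings)
    for word in text.split():
        subs = substring_set(word, n)
        for members, group_subs in groups:
            if subs & group_subs:       # at least one substring in common
                members.append(word)
                group_subs |= subs      # in-place update of the group's set
                break
        else:
            # No existing group matched, so start a new one.
            groups.append(([word], set(subs)))
    return [members for members, _ in groups]

for group in group_words("the games are lame"):
    print(", ".join(group))
```

Because each group remembers the substrings of all its members, "games flame lamp" ends up in a single group here ("lamp" matches "flame" through "lam"), whereas comparing against the first word only would leave "lamp" on its own. The grouping is greedy: it never merges two groups that already exist, so a different word order can give a different result.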

I think you are creating a lot of lists there, which can get quite confusing.

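If you want a pure-logic approach that does not rely on a library designed for sequence matching, such as difflib, you can first define a function that compares two strings. Then split your sentence into a list of words and run a nested (double) iteration over that list, comparing every possible pair of words.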

If two strings match, they are printed on the same line, separated by a comma; otherwise they are printed on separate lines.

In the function below I also added a parameter for the length of the substring to match; it defaults to 3 to stay consistent with your question:

# This function compares two strings and returns them in a tuple if they contain the
# same substring of len_substring characters.

def string_matcher(string_a, string_b, len_substring=3):
    # The +1 makes sure the last substring of string_a is checked as well
    for i in range(len(string_a) - len_substring + 1):
        if string_a[i:i+len_substring] in string_b:
            return string_a, string_b
    return None

string = "the games are lame"
words = string.split()

output = ""
grouped = set()  # indices of words that were already added to a line

# Iterate over every pair of words and call string_matcher on each pair.
for i in range(len(words)):
    # Skip words that already appear on an earlier line
    if i in grouped:
        continue
    output = output + words[i]
    for j in range(i+1, len(words)):
        if j not in grouped and string_matcher(words[i], words[j]) is not None:
            output = output + ", " + words[j]
            grouped.add(j)
    output = output + "\n"

print(output)

The program prints:

the
games, lame
are
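For comparison, the difflib route mentioned in this answer could look roughly like the sketch below. It uses `SequenceMatcher.find_longest_match` from the standard library to find the longest block of characters two words have in common; the wrapper `share_substring` and its `min_len` parameter are names invented for this example.

```python
from difflib import SequenceMatcher

def share_substring(a, b, min_len=3):
    """Return True if a and b share a contiguous run of at least min_len characters."""
    # autojunk=False turns off the heuristic that ignores very frequent characters
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # Search the full range of both strings for the longest common block
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size >= min_len

print(share_substring("games", "lame"))  # True: both contain "ame"
print(share_substring("the", "games"))   # False: the longest common block is "e"
```

A function like this could stand in for string_matcher in the pairwise loop, at the cost of doing more work per pair than a plain substring test.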
