将多个小字符串连接成一个长字符串

2024-09-29 19:18:20 发布

您现在位置:Python中文网/ 问答频道 /正文

例如,我有三个字符串:

manta
alcats
random

我需要检查,是否可以使用第一个和最后一个字符将这些字符串连接到一个长字符串而且我不能使用任何库。

在我的示例中,它将返回True,因为它的连接方式如下:

randoMantAlcats

我尝试了回溯,它适用于小数据集,但对于最大的数据集,它将花费数小时

所以我在寻找好的快速算法


Tags: 数据字符串算法true示例方式random字符
1条回答
网友
1楼 · 发布于 2024-09-29 19:18:20

这是一个有趣的问题,花费的时间比我预期的要长,但我想我有办法解决。我的代码接收一个单词列表,并且:(1)如果没有解决方案,则返回False;或者(2)如果有解决方案,则返回一个单词序列,其中前一个单词的最后一个字符与最后一个单词的第一个字符相同

算法:

  • 首先,我列出所有的前几个字符和最后几个字母。如果这两个列表中有多个字母不匹配,则没有解决方案
  • 因为所有重要的是第一个和最后一个字符,而这两者之间的一切都不重要,所以我制作了一个字典,其中键是第一个字符,值是对应于相应的第一个字符的最后一个字符的列表。我认为字典可以提高效率,因为所有的中间字符都无关紧要,例如,就算法而言,“sit”与“salt”相同,与“touction”相同。所以当你有一本字典时,如果“sit”不起作用,你可以跳过测试所有其他的“t”字
  • 在列表上运行回溯算法。如果在第一个字符列表和最后一个字符列表之间有一个不匹配的字符,我从不匹配的第一个字符开始;如果两个列表之间没有不匹配的字符,我将从任何第一个字符开始

以下是我认为有效的代码:

import string
import random
import timeit

def list_to_nodes(string_list):
    # convert list of strings into a dictionary where the keys corresponds to the first letters and the values corresponds to the last letters
    my_dict = {}
    for word in string_list:
        if word[0] in my_dict:
            my_dict[word[0]].append(word[-1])
        else:
            my_dict[word[0]] = [word[-1]]

    for value in my_dict.values():
        value.sort()

    return my_dict

def get_mismatches(string_list):
    # make two equal sized lists containing the first letters and last letters
    # if there are more than two letters which are different, this would not work
    start = []
    end = []

    for i in range(0, len(string_list)):
        start.append(string_list[i][0])
        end.append(string_list[i][-1])

    for char in start.copy():
        if char in end:
            end.remove(char)
            start.remove(char)

    return [start, end]

def next_iter(current_path, nodes, next_update_position):
    # current_path cannot be empty
    if next_update_position == len(current_path):
        # what to do if we want to update the next word
        next_char = current_path[-1][-1]
        # print(f'next char is {next_char}')
        if next_char in nodes and len(nodes[next_char]) != 0:
            current_path.append(next_char + nodes[next_char][0])
            nodes[next_char].pop(0) # remove this char so that it cannot be used again
            return [current_path, nodes, next_update_position+1]
        return [current_path, nodes, next_update_position-1] #dead end: now try to replace the last word
    else:
        if next_update_position != len(current_path) - 1:
            print('Something wrong here') #update should be either for a new element or the last element
            return False
        # what to do if we want to replace last word
        starting_char = current_path[-1][0]
        prev_try = current_path[-1][-1]
        nodes[starting_char].append(prev_try) # put the previous try back into nodes
        nodes[starting_char].sort()
        current_path.pop() # remove last item
        for item in nodes[starting_char]:
            if item > prev_try: # assumes everything is alphabetically sorted
                current_path.append(starting_char+item)
                nodes[starting_char].remove(item)
                return [current_path, nodes, next_update_position+1]
        # No more replacement options for 
        if next_update_position == 0:
            return False
        else:
            return [current_path, nodes,next_update_position-1]

def find_path(string_list):
    nodes = list_to_nodes(string_list)
    start, end = get_mismatches(string_list)

    if len(start) > 1:
        print('More than one mismatched letters. Does not work!')
        return False
    if len(start) == 1:
        start_char = start[0]
    else:
        start_char = string_list[0][0] #start with first (ie random) starting character

    path = [start_char + nodes[start_char][0]] #initialize path
    nodes[start_char].pop(0) #remove char
    position = 1 # 0th element already filled. now start with first

    while True:
        result = next_iter(path, nodes, position)
        if result == False:
            print("There's no solution!!!")
            return False
        else:
            path, nodes, position = result
            # print(f'path is {path} and position is {position}')
            if position == len(string_list):
                restored = []
                temp_list = string_list.copy()
                for item in path:
                    current = next(word for word in temp_list if word.startswith(item[0]) and word.endswith(item[1]))
                    restored.append(current)
                    temp_list.remove(current)

                return restored

我有以下帮助器方法来帮助生成一些列表和测试代码:

def gen_random_words(num, len):
    # generate a list of random strings where num is number of strings and len is length of each string
    # you may or may not be able to find a workable path in the list (as it's totally random)
    result = []
    for _ in range(num):
        temp = ''
        for _ in range(len):
            temp += random.choice(string.ascii_lowercase)
        result.append(temp)
    return result

def gen_sequential_words(num, len):
    # generate a list of random strings where num is number of strings and len is length of each string
    # you will be able to find a workable path
    result = []
    last = random.choice(string.ascii_lowercase)
    for _ in range(num):
        temp = last
        for _ in range(len-1):
            temp += random.choice(string.ascii_lowercase)
        last = temp[-1]
        result.append(temp)
    random.shuffle(result)
    return result

def test_sequence(string_list):
    # test if a list of string are in sequential order (ie last char of one word matches first char of next word)
    # purpose of this method is to double check the output from find_path() is indeed correct
    for i in range(0, len(string_list)-1):
        if string_list[i][-1] != string_list[i+1][0]:
            return False
    return True

以下是测试:

# test 1: generate 10000 random words. should not work
test1 = gen_random_words(10000,4)
find_path(words)

还有一个:

# test 2: generate 10000 random words. should work
test2 = gen_sequential_words(10000,4)
answer = find_path(test2)
print(test_sequence(answer))

要测试速度,请执行以下操作:

%%timeit
find_path(test2)
2.93 s ± 64.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

欢迎思想

相关问题 更多 >

    热门问题