Create sublists from a list of strings, with a given maximum number of words

Posted 2024-09-27 21:28:48


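I have a list of strings. I want to create sublists from it that contain the strings of the original list, such that the number of words in each sublist is less than 16, and every sublist except the first starts with the string that precedes it (the last string of the previous sublist).

For example, suppose my list is the following, containing 5 strings, each with a different number of words: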

qq = ['blended e learning forumin planning', 'difficulties of learning as forigen language', 'difficulties of grammar', 'students difficulties in grammar', 'difficulties of english grammar']

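I want to create sublists that satisfy the conditions above: at most 16 words per sublist, and each sublist has the preceding string as its first element (except the first sublist). Here there are only two sublists, and my output would be: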

q1 = ['blended e learning forumin planning', 'difficulties of learning as forigen language', 'difficulties of grammar']
q2 = ['difficulties of grammar', 'students difficulties in grammar', 'difficulties of english grammar']

This is what I tried. Is it correct? Is there a better way? I have millions of lists to run this on.

qq = ['blended e learning forumin planning', 'difficulties of learning as forigen language', 'difficulties of grammar', 'students difficulties in grammar', 'difficulties of english grammar']

psz = 0
pi = 0
msz = 16
subqq = list()
qq_i = list()

for i in range(len(qq)):
    csz=psz+len(qq[i].split())
    if (csz>msz):
        subqq.append(qq_i.copy())
        qq_i.clear()
        qq_i.append(qq[i-1])
        qq_i.append(qq[i])
        psz = 0
    else:
        qq_i.append(qq[i])
        psz += len(qq[i].split())

subqq.append(qq_i)

3 Answers

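Since others have already provided working solutions, here is another, perhaps interesting, approach based on a mapping between each element of qq and its cumulative word count.

First, create the mapping dict: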

qq_map = {q: len(" ".join(qq[:n+1]).split()) for n, q in enumerate(qq)}

# {'blended e learning forumin planning': 5, 'difficulties of learning as forigen language': 11,
# 'difficulties of grammar': 14, 'students difficulties in grammar': 18,
# 'difficulties of english grammar': 22}
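
The mapping above re-joins and re-splits the list prefix for every element, which can get slow for long lists. As a minimal sketch (not part of the original answer), the same cumulative counts can be built in one pass with itertools.accumulate. The names word_counts, cumulative and qq_map_alt are illustrative only. Note that either version uses the phrases as dict keys, so it assumes no phrase appears twice in the list.

```python
from itertools import accumulate

qq = ['blended e learning forumin planning',
      'difficulties of learning as forigen language',
      'difficulties of grammar',
      'students difficulties in grammar',
      'difficulties of english grammar']

# Word count of each phrase on its own.
word_counts = [len(q.split()) for q in qq]

# Running total of words up to and including each phrase.
cumulative = list(accumulate(word_counts))

# Same shape as qq_map: phrase -> cumulative word count.
# Assumption: phrases are unique, otherwise later keys overwrite earlier ones.
qq_map_alt = dict(zip(qq, cumulative))

print(cumulative)  # [5, 11, 14, 18, 22]
```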

Then use the mapping to build the grouped lists:

qq = [[q for q in qq if qq_map[q] in range(i*16, (i+1)*16)]
      for i in range(-(-qq_map[qq[-1]] // 16))]

# [['blended e learning forumin planning', 'difficulties of learning as forigen language', 'difficulties of grammar'], 
# ['students difficulties in grammar', 'difficulties of english grammar']]

Note: -(-qq_map[qq[-1]] // 16) is equivalent to math.ceil(qq_map[qq[-1]] / 16). You can use the latter if you prefer a more readable, less 'arithmetic' expression.
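
A quick check of that equivalence, as a small sketch: ceil_div is a hypothetical helper name introduced here for illustration, and 22 is the cumulative word count of the last phrase in the example list.

```python
import math

def ceil_div(a, b):
    """Ceiling of a / b for positive integers, using only floor division."""
    return -(-a // b)

total_words = 22   # cumulative word count of the last phrase in the example
group_size = 16

print(ceil_div(total_words, group_size))     # 2
print(math.ceil(total_words / group_size))   # 2
```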

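Finally, process the list once more so that the last string of each group is pushed to the front of the next group (the first group, of course, gets nothing prepended):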

qq = [[qq[i-1][-1]] + qq[i] if i != 0 else qq[i] for i in range(len(qq))]

# [['blended e learning forumin planning', 'difficulties of learning as forigen language', 'difficulties of grammar'], 
# ['difficulties of grammar', 'students difficulties in grammar', 'difficulties of english grammar']]

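Here is what I came up with. It is a similar algorithm to yours and to SorousH Bakhtiary's answer, but it should not have the word-count error, and I think it is easier to read.

It also raises an error if we start a new sublist with the last phrase of the previous sublist and still cannot add the next phrase without exceeding the word limit. This can happen when two consecutive phrases each have more than 8 words. If you can be sure that never happens, you can omit that part.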

def count_words(phrase):
    return len(phrase.split())


def sublists_with_max_words(main_list, max_words=16):
    output_sublists = []

    current_sublist = []
    current_sublist_words = 0

    for phrase in main_list:
        words_in_phrase = count_words(phrase)

        if (current_sublist_words + words_in_phrase) > max_words:
            # If we cannot add the phrase to the sublist without breaking
            # the word limit, then add the sublist to the output
            output_sublists.append(current_sublist)

            # Start a new sublist with the last phrase we added
            last_phrase = current_sublist[-1]
            current_sublist = [last_phrase]
            current_sublist_words = count_words(last_phrase)

            # If we cannot add the phrase to the new sublist either, then raise
            # an exception as we cannot continue without breaking the word limit
            if (current_sublist_words + words_in_phrase) > max_words:
                raise ValueError(
                    f"Cannot add '{phrase}' ({words_in_phrase} words) to a new"
                    f" sublist with {current_sublist_words} words"
                )

        # Add the current phrase to the sublist
        current_sublist.append(phrase)
        current_sublist_words += words_in_phrase

    # At the end of the loop, add the working sublist to the output
    output_sublists.append(current_sublist)

    return output_sublists


print(sublists_with_max_words(qq))
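
Because the question mentions millions of lists, a lazy variant of the same greedy idea may be convenient. The following is only a sketch under stated assumptions, not the answer's own code: iter_sublists is a hypothetical name, and the extra check for a single phrase that is longer than the limit is an addition that the function above does not have.

```python
def iter_sublists(phrases, max_words=16):
    """Yield sublists one at a time instead of building the whole output list."""
    current = []
    current_words = 0

    for phrase in phrases:
        n = len(phrase.split())

        # Added assumption: a phrase longer than the limit can never fit.
        if n > max_words:
            raise ValueError(f"'{phrase}' has {n} words, limit is {max_words}")

        if current_words + n > max_words:
            # Emit the finished sublist and restart from its last phrase.
            yield current
            last = current[-1]
            current = [last]
            current_words = len(last.split())

            # The carried-over phrase plus the new one must still fit.
            if current_words + n > max_words:
                raise ValueError(
                    f"Cannot add '{phrase}' ({n} words) to a new"
                    f" sublist with {current_words} words"
                )

        current.append(phrase)
        current_words += n

    if current:
        yield current


qq = ['blended e learning forumin planning',
      'difficulties of learning as forigen language',
      'difficulties of grammar',
      'students difficulties in grammar',
      'difficulties of english grammar']

for sublist in iter_sublists(qq):
    print(sublist)
```

With the example list this yields the two sublists q1 and q2 from the question. Two consecutive 9-word phrases would trigger the ValueError, because the carried-over phrase and the new one together exceed 16 words.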

Mine is similar to yours; the algorithm is basically the same, but I think this should run slightly faster:

def fn(lst, n):
    word_count = 0
    res = []
    temp_lst = []

    for item in lst:
        len_current_item = len(item.split())
        word_count += len_current_item

        if word_count < n:
            temp_lst.append(item)

        else:
            res.append(temp_lst)
            last_item = res[-1][-1]
            temp_lst = [last_item, item]
            word_count = len_current_item + len(last_item.split())

    res.append(temp_lst)

    # Check the last item's length, as Phydeaux pointed out in the comments.
    if word_count > n:
        res.append([temp_lst.pop()])

    return res

Output:

['blended e learning forumin planning', 'difficulties of learning as forigen language', 'difficulties of grammar']
['difficulties of grammar', 'students difficulties in grammar', 'difficulties of english grammar']

I tried to avoid the copy and clear calls, and made a few other small changes.
