How is the str.join(iterable) method implemented in Python / linear-time string concatenation

1 answer

User

#1 · posted 2024-09-30 10:33:11

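Joining str as actual str is a red herring, and not what Python itself does: Python operates on mutable bytes rather than on str, which also removes the need to know string internals. Specifically, str.join converts its arguments to bytes, then pre-allocates its result and mutates it in place.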

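This translates directly to:

A wrapper that encodes the str arguments to bytes and decodes the joined result back to str
Summing the len of the elements and separators
Allocating a mutable bytearray to construct the result
Copying each element/separator directly into the result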

# helper to convert to/from joinable bytes
def str_join(sep: "str", elements: "list[str]") -> "str":
    joined_bytes = bytes_join(
        sep.encode(),
        [elem.encode() for elem in elements],
    )
    return joined_bytes.decode()

# actual joining at bytes level
def bytes_join(sep: "bytes", elements: "list[bytes]") -> "bytes":
    # create a mutable buffer that is long enough to hold the result
    total_length = sum(len(elem) for elem in elements)
    # clamp at 0 so an empty list yields an empty result instead of a negative size
    total_length += max(len(elements) - 1, 0) * len(sep)
    result = bytearray(total_length)
    # copy all characters from the inputs to the result
    insert_idx = 0
    for elem in elements:
        result[insert_idx:insert_idx+len(elem)] = elem
        insert_idx += len(elem)
        if insert_idx < total_length:
            result[insert_idx:insert_idx+len(sep)] = sep
            insert_idx += len(sep)
    return bytes(result)

print(str_join(" ", ["Hello", "World!"]))

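Notably, although the element iteration and the element copying are essentially two nested loops, they iterate over different things: the outer loop walks the elements, while the inner copy walks the bytes of a single element. The algorithm therefore still touches each character only three times (encode, copy, decode) and each byte only once in the copy loop, so the total work stays linear in the length of the result.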
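
To make the "each byte once" claim concrete, here is a rough sketch that counts byte writes for the preallocating strategy used above and for naive repeated concatenation. The function names copies_with_join and copies_with_repeated_concat are hypothetical helpers invented for this illustration, and the counts are a cost model under the assumption that every concatenation builds a fresh object, not a measurement of what any particular interpreter does.

```python
def copies_with_join(elements: list[bytes], sep: bytes) -> int:
    """Count byte writes when the result buffer is preallocated once."""
    total = sum(len(e) for e in elements) + max(len(elements) - 1, 0) * len(sep)
    result = bytearray(total)
    writes = 0
    idx = 0
    for i, elem in enumerate(elements):
        if i:  # a separator goes before every element except the first
            result[idx:idx + len(sep)] = sep
            idx += len(sep)
            writes += len(sep)
        result[idx:idx + len(elem)] = elem
        idx += len(elem)
        writes += len(elem)
    return writes


def copies_with_repeated_concat(elements: list[bytes], sep: bytes) -> int:
    """Count byte writes when every step builds a new immutable bytes object.

    Assumption: each `+` copies the whole previous result plus the new piece.
    """
    result = b""
    writes = 0
    for i, elem in enumerate(elements):
        piece = sep + elem if i else elem
        result = result + piece
        writes += len(result)  # the new object holds all bytes seen so far
    return writes


if __name__ == "__main__":
    for n in (4, 1000):
        elems = [b"ab"] * n
        print(n, copies_with_join(elems, b"-"), copies_with_repeated_concat(elems, b"-"))
```

Under this model the preallocating version writes exactly as many bytes as the result contains, while the repeated-concatenation version grows roughly quadratically with the number of elements.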
