如何用从最短到最长的匹配从头到尾替换文本

In [1807]: text='110011111000011010110011'

s={'11': '++--', '01': '-+-+-', '000': '+-++++++', '001': '--', '100': '--+-', '101': '----++-+--'} peekstart=0 peekend=1 sptr='110011111000011010110011' bytestr=b'' while peekstart+peekend <= sptrlen: match=False while match == False: peek = sptr[peekstart:peekstart+peekend] findptr = s.get(peek) while findptr == None: peek = sptr[peekstart:peekstart+peekend+stream] stream=stream+1 findptr = s.get(peek) if stream == 0: peekstart=peekstart+peekend+stream else: peekstart=peekstart+peekend+stream-1 stream=0 bytestr+=s.get(peek).encode() match = True print(bytestr) b'++----++--++--+-++++++-+-+-----++-+---+-+---+-++--'

from functools import reduce s={'11': '++--', '01': '-+-+-', '000': '+-++++++', '001': '--', '100': '--+-', '101': '----++-+--'} def decompressHuffmanCode(a, bit): return ('', a[1] + a[2][a[0]+bit[0]], a[2]) if (a[0]+bit[0] in a[2]) else (a[0]+bit[0], a[1], a[2]) # Added one to the front to deal with cases of the binary starting with zeroes compressed_binary='1110011111000011010110011' read_compressed_binary = compressed_binary[1:] remainder,bytestr,s = reduce(decompressHuffmanCode, read_compressed_binary, ('', '', s)) print(bytestr)

3条回答

网友

1楼 · 编辑于 2024-07-02 13:24:58

我可以想出两种方法来做到这一点，但我不确定备选方案1在从左到右的意义上是否正确。最好写几个测试来检查它的行为是否正确。你知道吗

备选方案1：使用regex

（现在你有两个问题？：P）

import re
def transform(input_string, substitutions):

    pattern = "|".join(sorted(substitutions.keys(), key = lambda x: len(x)))
    def replacement(match):
        return substitutions[match.group(0)]

    subs = 1
    while subs > 0:
        input_string, subs = re.subn(pattern, replacement, input_string)
    return input_string


s = {
    '11': '++--',
    '01': '-+-+-', 
    '000': '+-++++++', 
    '001': '--', 
    '100': '--+-', 
    '101': '----++-+--'
}

print(transform('110011111000011010110011', s))

它将所有按长度排序的键连接到一个or（|），因此它应该首先替换最短的键。然后它就会替换，直到找到更多为止。它适用于你的案件，但我不确定它如何处理更复杂的案件。你知道吗

备选方案2：使用函数工具.reduce你知道吗

from functools import reduce

s = {
    '11': '++--',
    '01': '-+-+-', 
    '000': '+-++++++', 
    '001': '--', 
    '100': '--+-', 
    '101': '----++-+--'
}

def partial_transform(accumulator,bit):

    blob, result = accumulator
    next_blob = blob + bit

    if next_blob in s.keys():
        return ('', result + s[next_blob])

    return (next_blob, result)


print(reduce(partial_transform, '110011111000011010110011', ('','')))

这种方法类似于其他答案，因为它从左边“吃”，直到找到匹配项，然后将其添加到结果中，直到输入结束。你知道吗

网友

2楼 · 编辑于 2024-07-02 13:24:58

这是@Artogs's "alternative 2"reduce方法，但有一点代码

def worker(aggregate, bit):
    prefix, out = aggregate
    key = prefix + bit
    return ('', out + s[key]) if (key in s) else (key, out)

remainder,string = reduce(worker, '110011111000011010110011', ('', ''))
print string

网友

3楼 · 编辑于 2024-07-02 13:24:58

这是一个常见的吃功能。你从左边吃东西，直到找到字典里匹配的东西。你知道吗

text='110011111000011010110011'
s={'11': '++--', '01': '-+-+-', '000': '+-++++++', '001': '--', '100': '--+-', '101': '----++-+--'}

string = ''
i = 0
for j in range(len(text) + 1):
    symbol = s.get(text[i:j])
    if symbol is not None:
        string += symbol
        i = j

print(string)

Out: ++----++--++--+-++++++-+-+-----++-+---+-+---+-++--

备选方案1：使用regex

备选方案2：使用函数工具.reduce你知道吗

相关问题更多 >

编程相关推荐

热门问题

热门文章