Expanding a compressed string of A's and B's

Posted 2024-10-03 00:17:19


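I have a string of A's and B's like this: "(BA)4B5A", and I want the output to be BABABABABBBBBA. But my code only works when the final A is followed by the number 1, as in "(BA)4B5A1". A letter with no number after it should appear just once. I want this to work for any string of A's and B's.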

def extensao(seq):

    new_seq = ""
    i = 0;
    while i < len(seq):
        if seq[i] == '(':
            it = i + 1
            exp = ""
            while seq[it]!= ')':
                exp += seq[it]
                it+=1
            it+=1
            num=""
            while it < len(seq) and seq[it].isdigit() == True:
                num += seq[it]
                it+=1
            x = 0
            while x < int(num):
                new_seq += exp
                x+=1
            i = it
        else:
            char = seq[i]
            it=i+1
            if(seq[it].isdigit()==True):
                num=""
                while it < len(seq) and seq[it].isdigit() == True:
                    num += seq[it]
                    it+=1
                x = 0
                while x < int(num):
                    new_seq += char
                    x+=1
                i = it
            else:
                new_seq+=char
                i+=1
    return new_seq



def main():

    seq = input("Escreva uma sequencia:")
    final_seq = extensao(seq)
    print(final_seq)

main()

3 answers

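While Rob's answer is probably the better approach, the algorithm you are using is basically correct. Regular expressions can also be a little overwhelming at first if you are not familiar with them. That said, they are well worth learning, especially if you will be doing many tasks like this.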

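Still, since you clearly put some effort into writing the algorithm above, I think it deserves to be finished; to be honest, it only needed a small tweak. Below is a "fixed" version of your code.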

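Your code runs fine if you enter something like (BA)4B5A3, but it fails on something like (BA)4B5A. The reason is that when you reach the final A, your original algorithm tries to check whether the next character is a digit. There is no next character, so seq[it] raises an IndexError. I added an extra if statement to check for that case (see the comments in the code).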

Also, when you are testing whether something is true or false, write if condition: rather than if condition == True:. So I removed all the == True comparisons.
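As a quick illustration of that last point, str.isdigit() already returns a bool, so it can be tested directly. The sample string comes from the question; the variable names are only for illustration.

```python
seq = "(BA)4B5A"

# str.isdigit() already returns True or False, so it can be tested directly.
digits_verbose = [ch for ch in seq if ch.isdigit() == True]  # works, but redundant
digits_plain = [ch for ch in seq if ch.isdigit()]            # preferred form

print(digits_verbose == digits_plain)
print(digits_plain)
```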

def extensao(seq):
    new_seq = ""
    i = 0
    while i < len(seq):
        if seq[i] == '(':
            it = i + 1
            exp = ""
            while seq[it] != ')':
                exp += seq[it]
                it += 1
            it += 1
            num = ""
            while it < len(seq) and seq[it].isdigit():
                num += seq[it]
                it += 1
            x = 0
            while x < int(num):
                new_seq += exp
                x += 1
            i = it
        else:
            char = seq[i]
            it = i+1
            if it<len(seq):  #To check seq[i] isn't the final character
                if seq[it].isdigit(): #This is the line that was causing the error!!
                    num = ""
                    while it < len(seq) and seq[it].isdigit():
                        num += seq[it]
                        it += 1
                    x = 0
                    while x < int(num):
                        new_seq += char
                        x += 1
                    i = it
                else:
                    new_seq += char
                    i += 1
            else:  # in case seq[i] was the final character
                new_seq += char
                i += 1
    return new_seq


print(extensao("(BA)4B5A"))
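
The failure described in this answer can be reproduced in isolation, and the same bounds check can be folded into a small helper so the digit-reading loop is written only once. This is a sketch, not part of the original answer: read_count is a hypothetical helper name, and it assumes that a missing count means "repeat once".

```python
def read_count(seq, start):
    """Read the digits beginning at index start.

    Returns (count, next_index); count defaults to 1 when no digits follow.
    read_count is a hypothetical helper, not part of the original answer.
    """
    end = start
    # Checking end < len(seq) first keeps seq[end] from raising IndexError.
    while end < len(seq) and seq[end].isdigit():
        end += 1
    count = int(seq[start:end]) if end > start else 1
    return count, end


seq = "(BA)4B5A"

# The original failure: looking one past the final character.
try:
    seq[len(seq)].isdigit()
except IndexError as err:
    print("IndexError:", err)

print(read_count(seq, 4))         # the digit "4" starts at index 4
print(read_count(seq, len(seq)))  # nothing follows the final "A"
```

With a helper like this, both branches of the loop could call read_count(seq, it) instead of repeating the inner while loop.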

The more_itertools third-party library has a tool for run-length encoding/compression problems. See this example from the docs:

    >>> compressed = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
    >>> list(run_length.decode(compressed))
    ['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd']

The tool is simple to use. Ideally, you pass it a list of tuples, each containing a string and an integer multiplier.
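
To make the expected input shape concrete, here is a sketch of the same decoding step using only the standard library's itertools, so it runs without installing anything. decode_pairs is an illustrative name, not the library's API.

```python
from itertools import chain, repeat


def decode_pairs(pairs):
    """Expand (item, count) pairs into a flat iterator of repeated items.

    decode_pairs is a hypothetical stand-in used for illustration only.
    """
    return chain.from_iterable(repeat(item, count) for item, count in pairs)


pairs = [("BA", 4), ("B", 5), ("A", 1)]
print("".join(decode_pairs(pairs)))
```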

Code

Here we implement a parse helper function that converts your input into the appropriate format.

import itertools as it

import more_itertools as mit


def parse(iterable):
    """Return a list of string, multiplier pairs."""
    iterable = iterable.replace("(", "").replace(")", "")
    pred = lambda x: x.isalpha()
    non_numbers = ("".join(g) for k, g in it.groupby(iterable, pred) if k)
    # Join every digit in the group so multi-digit counts such as "12" are read whole.
    numbers = (int("".join(g)) for k, g in it.groupby(iterable, pred) if not k)
    zipped = list(it.zip_longest(non_numbers, numbers,  fillvalue=1))
    return zipped

Demo

>>> iterable = "(BA)4B5A"

>>> # Application
>>> "".join(mit.run_length.decode(parse(iterable)))
'BABABABABBBBBA'

>>> # Tests
>>> assert parse(iterable) == [("BA", 4), ("B", 5), ("A", 1)]
>>> assert list(mit.run_length.decode(parse(iterable))) == ["BA", "BA", "BA", "BA", "B", "B", "B", "B", "B", "A"]

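Details

The parse function strips the parentheses from the input iterable. It then builds two generators with itertools.groupby: one for the string groups and one for the multiplier groups. The two are zipped together. itertools.zip_longest accepts a fillvalue argument, so if the input ends with a string (as in the sample input), the multiplier defaults to 1.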
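
The grouping step can be inspected on its own. The sketch below shows the alternating letter and digit runs that itertools.groupby produces for the sample input once the parentheses are removed; the variable names are illustrative. One consequence worth noting: adjacent letters fall into a single run, so an input such as AB5 is read as ("AB", 5) rather than an A followed by five B's. A letter without a count is therefore only handled as intended at the end of the string.

```python
from itertools import groupby

text = "(BA)4B5A".replace("(", "").replace(")", "")

# groupby yields (key, group) pairs; the key is the result of str.isalpha
# and changes each time the input switches between letters and digits.
runs = [(is_alpha, "".join(group)) for is_alpha, group in groupby(text, str.isalpha)]
print(runs)

# Adjacent letters merge into one run, so "AB5" is read as ("AB", 5).
merged = [(k, "".join(g)) for k, g in groupby("AB5", str.isalpha)]
print(merged)
```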

The run_length.decode method is implemented as follows:

class run_length(object):
    ...
    def decode(iterable):
        return list(it.chain.from_iterable(it.repeat(k, n) for k, n in iterable))

Note: install this library from the command line with > pip install more_itertools.


You can use the re.sub() function, passing a callable as the second argument:

import re

def extensao(seq):
    '"(BA)4B5A", and I want the output to be BABABABABBBBBA'
    return re.sub(r'(([AB])|\(([AB]*?)\))(\d+)',
                  lambda x: (x.group(2) or x.group(3))*int(x.group(4)), seq)

assert extensao("(BA)4B5A") == 'BABABABABBBBBA'

Or, equivalently and perhaps more readably:

import re

def extensao(seq):
    '"(BA)4B5A", and I want the output to be BABABABABBBBBA'
    def replacement(m):
        single_char = m.group(2)
        multi_char = m.group(3)
        count = int(m.group(4))
        char = single_char or multi_char
        return char * count
    pattern = r'''
        (       # Grouping to detect single char or (multi char)
            (.) # Match single char and save it in $2
            |
            \((.*?)\) # Match (multi char), save inner bit in $3
        )
        (\d+)   # Save count in $4
    '''
    # re.VERBOSE replaces the inline (?x) flag, which Python 3.11+ only
    # accepts at the very start of a pattern.
    return re.sub(pattern, replacement, seq, flags=re.VERBOSE)

assert extensao("(BA)4B5A") == 'BABABABABBBBBA'
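
Both versions above require a count after every token, so a parenthesised group with no number, such as (BA), would be left in the output with its parentheses intact. If a missing count should always mean "repeat once", one possible variation makes the count optional. This is a sketch under that assumption: expand and TOKEN are illustrative names, and the pattern is restricted to the letters A and B as in the question.

```python
import re

TOKEN = re.compile(
    r"""
    (?: ([AB])           # a single letter, saved in group 1
      | \( ([AB]*) \)    # or a parenthesised run, inner text in group 2
    )
    (\d*)                # optional count, saved in group 3
    """,
    re.VERBOSE,
)


def expand(seq):
    """Expand tokens, treating a missing count as 1 (an assumption)."""
    def replacement(m):
        unit = m.group(1) if m.group(1) is not None else m.group(2)
        count = int(m.group(3)) if m.group(3) else 1
        return unit * count
    return TOKEN.sub(replacement, seq)


print(expand("(BA)4B5A"))
print(expand("(BA)B2"))
```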
