Renumbering line by line

Posted on 2024-10-03 21:24:51


I have an input text like this:

word77 text text bla66 word78 text bla67
text bla68 word79 text bla69 word80 text
bla77 word81 text bla78 word92 text bla79 word99

I need to renumber 'word' and 'bla' on each line, starting from 1.

I can renumber the input as a whole, like this:

word1 text text bla1 word2 text bla2
text bla3 word3 text bla4 word4 text
bla5 word5 text bla6 word6 text bla7 word7

Here is the code for that:

import re

def replace(m):
    # Each call bumps the global counter and returns it as the new number
    global i
    i += 1
    return str(i)

with open('input.txt') as f:
    fp = f.read()

i = 0
fp = re.sub(r'(?<=word)(\d+)', replace, fp)
i = 0
fp = re.sub(r'(?<=bla)(\d+)', replace, fp)
#open('sample.txt', 'w').write(fp)
print(fp)
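Why the code above numbers across the whole input: the pattern (?<=word)(\d+) uses a lookbehind, so only the digits are matched and replaced, and re.sub calls the replacement function once per match from left to right. A counter that is never reset therefore keeps counting past the end of each line. A small illustration, where the sample string and the itertools.count counter are stand-ins (assumptions) for the question's file and its global i:

```python
import re
from itertools import count

# Stand-in for the file contents: two lines of input
text = "word77 text bla66 word78\ntext word79"

# re.sub calls the replacement function once per match, left to right,
# so this single counter is shared by every match in the whole string
counter = count(1)

# (?<=word) is a lookbehind: it requires "word" before the digits but does not
# consume it, so only the digits are replaced and the prefix is kept
result = re.sub(r'(?<=word)\d+', lambda m: str(next(counter)), text)

print(result)
```

The second line gets word3 rather than word1, which is exactly the behaviour the question wants to avoid.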

Ideally, the numbering should restart at 1 on every line, so the result looks like this:

word1 text text bla1 word2 text bla2
text bla1 word1 text bla2 word2 text
bla1 word1 text bla2 word2 text bla3 word3

2 answers

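We can write a more general function that renumbers any number of words. Each word to be replaced gets its own counter, and a single re.sub call does all the replacements in one pass: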

import re
from itertools import count


data = """word77 text text bla66 word78 text bla67
text bla68 word79 text bla69 word80 text
bla77 word81 text bla78 word92 text bla79 word99"""

words_to_renumber = ['word', 'bla']

def renumber(words_to_renumber, data):
    # One independent counter per word, each starting at 1
    counters = {word: count(1) for word in words_to_renumber}

    def replace(match):
        # group(1) is the word prefix; its digits are replaced by the next count
        word = match.group(1)
        return word + str(next(counters[word]))

    # Matches e.g. "word77" or "bla66"; the raw string avoids an invalid "\d" escape
    rep_re = re.compile('(' + '|'.join(map(re.escape, words_to_renumber)) + r')\d+')
    return rep_re.sub(replace, data)

print(renumber(words_to_renumber, data))

Output:

word1 text text bla1 word2 text bla2
text bla3 word3 text bla4 word4 text
bla5 word5 text bla6 word6 text bla7 word7
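The renumber function above keeps its counters for the whole string, so its output follows the whole-input numbering rather than the per-line numbering the question asks for. Since it creates fresh counters on every call, calling it once per line restarts the numbering at 1 on each line. A minimal sketch, where renumber_per_line is a hypothetical helper name:

```python
import re
from itertools import count

data = """word77 text text bla66 word78 text bla67
text bla68 word79 text bla69 word80 text
bla77 word81 text bla78 word92 text bla79 word99"""

def renumber(words_to_renumber, data):
    # Fresh counters on every call, one per word, starting at 1
    counters = {word: count(1) for word in words_to_renumber}

    def replace(match):
        word = match.group(1)
        return word + str(next(counters[word]))

    rep_re = re.compile('(' + '|'.join(map(re.escape, words_to_renumber)) + r')\d+')
    return rep_re.sub(replace, data)

def renumber_per_line(words_to_renumber, data):
    # Hypothetical helper: call renumber() once per line so every line starts
    # again at 1; keepends=True preserves the original line endings
    return ''.join(renumber(words_to_renumber, line)
                   for line in data.splitlines(keepends=True))

result = renumber_per_line(['word', 'bla'], data)
print(result)
```

This prints the per-line result from the question, and still handles any number of words in a single regex pass per line.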

Your code works on the whole file at once (fp.read()); you need to process it line by line instead:

import re

# Create a sample input file
with open("input.txt", "w") as f:
    f.write("""word77 text text bla66 word78 text bla67
text bla68 word79 text bla69 word80 text
bla77 word81 text bla78 word92 text bla79 word99""")

i = 0

def replace(m):
    # Each call bumps the global counter and returns it as the new number
    global i
    i += 1
    return str(i)

with open('input.txt') as fp, open("output.txt", "w") as out:
    # Process the file one line at a time, resetting the counter before
    # each substitution so the numbering restarts at 1 on every line
    for line in fp:
        i = 0
        l = re.sub(r'(?<=word)(\d+)', replace, line)
        i = 0
        l = re.sub(r'(?<=bla)(\d+)', replace, l)
        out.write(l)

with open("output.txt") as f:
    print(f.read())

Output:

word1 text text bla1 word2 text bla2
text bla1 word1 text bla2 word2 text
bla1 word1 text bla2 word2 text bla3 word3
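The same per-line logic can be written without the global i: a counter created inside a helper function is new on every call, so nothing has to be reset by hand. A minimal sketch, where renumber_prefix and renumber_line are hypothetical helper names and the sample text is kept in a string instead of input.txt:

```python
import re
from itertools import count

def renumber_prefix(line, prefix):
    # A fresh counter per call, so numbering restarts at 1 for each line
    counter = count(1)
    # Lookbehind on the literal prefix: only the digits after it are replaced
    pattern = r'(?<=' + re.escape(prefix) + r')\d+'
    return re.sub(pattern, lambda m: str(next(counter)), line)

def renumber_line(line, prefixes=('word', 'bla')):
    # Apply each prefix in turn, like the two re.sub calls in the answer
    for prefix in prefixes:
        line = renumber_prefix(line, prefix)
    return line

text = """word77 text text bla66 word78 text bla67
text bla68 word79 text bla69 word80 text
bla77 word81 text bla78 word92 text bla79 word99"""

result = ''.join(renumber_line(line) for line in text.splitlines(keepends=True))
print(result)
```

For a real file, the generator over text.splitlines(...) can be replaced by iterating over the open file object, as the answer does.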
