Preserve paragraph marks when capitalizing with a regex

Posted 2024-06-01 06:42:36


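import re, codecs

# Match the first word of the text, or a word preceded by . ? or ! and exactly one whitespace character
p = re.compile(r'((?<=[\.\?!]\s)(\w+)|(^\w+))')
def cap(match):
    return match.group().capitalize()
capitalized_1 = p.sub(cap, Inputfile)  # Inputfile: the text read from the input file

with codecs.open('o.txt', mode="w", encoding="utf_8") as file:
    file.write(capitalized_1)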

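I'm using a regex to capitalize the word that follows a . ? or ! (plus the first word of the text), and the code above does that. But it strips out the paragraph marks (pilcrows, ¶) and squashes all the text into one big paragraph.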

How can I keep the paragraph marks and stop the paragraphs from running together?

Input file:

on the insert tab, the galleries include items that are designed to coordinate with the overall look of your document. you can use these galleries to insert tables, headers, footers, lists, cover pages, and other document building blocks. when you create pictures, charts, or diagrams, they also coordinate with your current document look.

you can easily change the formatting of selected text in the document text by choosing a look for the selected text from the quick styles gallery on the home tab. you can also format text directly by using the other controls on the home tab. most controls offer a choice of using the look from the current theme or using a format that you specify directly.

Current output:

On the insert tab, the galleries include items that are designed to coordinate with the overall look of your document. You can use these galleries to insert tables, headers, footers, lists, cover pages, and other document building blocks. When you create pictures, charts, or diagrams, they also coordinate with your current document look. You can easily change the formatting of selected text in the document text by choosing a look for the selected text from the quick styles gallery on the home tab. You can also format text directly by using the other controls on the home tab. most controls offer a choice of using the look from the current theme or using a format that you specify directly.

Expected output:

On the insert tab, the galleries include items that are designed to coordinate with the overall look of your document. You can use these galleries to insert tables, headers, footers, lists, cover pages, and other document building blocks. When you create pictures, charts, or diagrams, they also coordinate with your current document look.

You can easily change the formatting of selected text in the document text by choosing a look for the selected text from the quick styles gallery on the home tab. You can also format text directly by using the other controls on the home tab. Most controls offer a choice of using the look from the current theme or using a format that you specify directly.
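The posted code does not show how Inputfile was built, but p.sub itself cannot remove newlines: the lookbehind is zero-width, so each match is just a word, and sub only swaps that word for its capitalized form. The paragraph breaks were therefore most likely lost while the file was read (an assumption, for example if the lines were stripped and joined with spaces). Keeping the newlines raises a second problem: after a blank line the lookbehind sees "\n\n" rather than a punctuation mark plus one whitespace character, and ^ without re.MULTILINE matches only at the very start of the string, so the first word of a new paragraph would not be matched. A minimal sketch on a hypothetical sample string that reuses the question's pattern and adds re.MULTILINE:

```python
import re

# The question's pattern, compiled with re.MULTILINE so that ^ also
# matches right after a newline, i.e. at the start of each paragraph.
p = re.compile(r'((?<=[\.\?!]\s)(\w+)|(^\w+))', re.MULTILINE)

def cap(match):
    return match.group().capitalize()

# Hypothetical sample: two paragraphs separated by a blank line.
text = "first paragraph. second sentence.\n\nnew paragraph here."

# sub() only rewrites the matched words, so "\n\n" comes through untouched.
result = p.sub(cap, text)
print(repr(result))
# → 'First paragraph. Second sentence.\n\nNew paragraph here.'
```

Because re.MULTILINE makes ^ match after every newline, text that is hard-wrapped in the middle of a sentence would also get a capital at the start of each line; with one paragraph per line, as in the input above, that does not arise.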

Edit 1:

import re, codecs
def capitalize(match):
    return ''.join([match.group(1), match.group(2).capitalize()])

with codecs.open('i.txt', encoding='utf-8') as f:
    text = f.read()

pattern = re.compile(r'(^|[.?!]\s+)(\w+)?')

print(pattern.sub(capitalize, text))

When I read the text from a file using the approach from answer 1, it throws an error:

return ''.join([match.group(1), match.group(2).capitalize()])
AttributeError: 'NoneType' object has no attribute 'capitalize'
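The error comes from the optional second group: (\w+)? lets the pattern match a . ? or ! plus trailing whitespace with nothing after it, and then match.group(2) returns None. A file that ends with a newline ends in exactly that way (a file that starts with a byte-order mark triggers the same thing at position 0), while the inline string in the answer ends with a bare period and never produces such a match. A small sketch that lists the groups for a hypothetical one-line file:

```python
import re

pattern = re.compile(r'(^|[.?!]\s+)(\w+)?')

# Hypothetical file contents: editors usually end a file with "\n".
text = "last sentence.\n"

# Collect (group 1, group 2) for every match.
groups = [(m.group(1), m.group(2)) for m in pattern.finditer(text)]
print(groups)
# → [('', 'last'), ('.\n', None)]
```

The second match has group 2 set to None, which is what .capitalize() fails on. Making the group required, (\w+), or treating None as an empty string in the callback avoids the error.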

1 answer
User
Answer #1 · Posted 2024-06-01 06:42:36

You can do it like this:

import re


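def capitalize(match):
    # group(2) is None when no word follows the punctuation,
    # e.g. a final ".\n" at the end of a file
    word = match.group(2) or ''
    return ''.join([match.group(1), word.capitalize()])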

text = """on the insert tab, the galleries include items that are designed to coordinate with the overall look of your document. you can use these galleries to insert tables, headers, footers, lists, cover pages, and other document building blocks. when you create pictures, charts, or diagrams, they also coordinate with your current document look.

you can easily change the formatting of selected text in the document text by choosing a look for the selected text from the quick styles gallery on the home tab. you can also format text directly by using the other controls on the home tab. most controls offer a choice of using the look from the current theme or using a format that you specify directly."""

pattern = re.compile(r'(^|[.?!]\s+)(\w+)?')

print(pattern.sub(capitalize, text))

Output

On the insert tab, the galleries include items that are designed to coordinate with the overall look of your document. You can use these galleries to insert tables, headers, footers, lists, cover pages, and other document building blocks. When you create pictures, charts, or diagrams, they also coordinate with your current document look.

You can easily change the formatting of selected text in the document text by choosing a look for the selected text from the quick styles gallery on the home tab. You can also format text directly by using the other controls on the home tab. Most controls offer a choice of using the look from the current theme or using a format that you specify directly.

Notes

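  • (^|[.?!]\s+) captures a . (dot), ? or ! followed by one or more whitespace characters (spaces, tabs, newlines, etc.). ^ matches the start of the string, so this group means either the start of the text or a ., ? or ! followed by whitespace. Because that whitespace, newlines included, is part of group 1 and is written back unchanged, the blank lines between paragraphs are preserved.
  • (\w+)? matches one or more word characters (the word to capitalize). The trailing ? makes the group optional, so group(2) is None when no word follows, for example when the text ends with a period and a newline.
  • The capitalize function then returns whatever group 1 matched unchanged and capitalizes group 2 (the word). Note that str.capitalize() also lowercases the rest of the word.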
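For the original goal of reading i.txt and writing o.txt, here is a minimal sketch that puts the pieces together. It is not the answerer's code: it makes the word group required, so a successful match never leaves group 2 as None, and it upper-cases only the first letter instead of calling str.capitalize(), which would turn a word such as iPhone into Iphone. The function names capitalize_sentences and capitalize_file are hypothetical, and reading with the utf-8-sig codec (which also strips a leading byte-order mark) is an assumption about how the input file was saved.

```python
import re

# Group 1: the start of the text, or . ? ! followed by whitespace
# (spaces, tabs and newlines, so blank lines between paragraphs are kept).
# Group 2: the word to capitalize. It is required here, unlike (\w+)?,
# so a successful match always has a non-empty group 2.
SENTENCE_START = re.compile(r'(^|[.?!]\s+)(\w+)')


def capitalize_sentences(text: str) -> str:
    def repl(match):
        word = match.group(2)
        # Upper-case only the first character; str.capitalize() would
        # also lowercase the rest of the word.
        return match.group(1) + word[0].upper() + word[1:]
    return SENTENCE_START.sub(repl, text)


def capitalize_file(src: str, dst: str) -> None:
    # Read the whole file at once so the newlines between paragraphs survive.
    # utf-8-sig also strips a leading byte-order mark if one is present.
    with open(src, encoding='utf-8-sig') as f:
        text = f.read()
    with open(dst, 'w', encoding='utf-8') as f:
        f.write(capitalize_sentences(text))
```

Calling capitalize_file('i.txt', 'o.txt') would write the capitalized text with the blank lines between paragraphs intact; a trailing ".\n" simply produces no match instead of an error.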
