Splitting text into sentences with the special form "{{{"

import re

prueba = 'Anarchism does not offer a fixed body of doctrine from a single particular worldview.{{sfn|Marshall|1993|pp=14–17}} Many types and traditions of anarchism exist, not all of which are mutually exclusive.[[Sylvan|2007|p=262]] [[Anarchist schools of thought]] can differ fundamentally, supporting anything from extreme [[individualism]] to complete [[collectivism]].'
sentences = re.split(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s', prueba)

Current output:

['Anarchism does not offer a fixed body of doctrine from a single particular worldview.{{sfn|Marshall|1993|pp=14–17}} Many types and traditions of anarchism exist, not all of which are mutually exclusive.[[Sylvan|2007|p=262]] [[Anarchist schools of thought]] can differ fundamentally, supporting anything from extreme [[individualism]] to complete [[collectivism]].']

Desired output:

['Anarchism does not offer a fixed body of doctrine from a single particular worldview.', '{{sfn|Marshall|1993|pp=14–17}} Many types and traditions of anarchism exist, not all of which are mutually exclusive.', '[[Sylvan|2007|p=262]] [[Anarchist schools of thought]] can differ fundamentally, supporting anything from extreme [[individualism]] to complete [[collectivism]].']

1 answer

User

#1 · Posted on 2024-09-27 02:18:06

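Since you appear to want to keep the delimiters, you probably need re.findall(). See this answer, https://stackoverflow.com/a/44244698/11199887, and adapt it to your case. With re.findall() you don't have to worry about the differences between ".{{", ". " and ".[[".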

import re

s = """You! Are you Tom? I am Danny."""
re.findall(r'.*?[.!?]', s)
# ['You!', ' Are you Tom?', ' I am Danny.']

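The example above captures not only periods but also the question marks and exclamation marks that end sentences. Wikipedia probably doesn't contain many sentences that end with an exclamation mark or a question mark, but I haven't really taken the time to look for examples.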
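To illustrate why the lookbehind-plus-whitespace split misses these boundaries while re.findall() does not, here is a minimal sketch. The sample string and the simplified split pattern are made up for illustration; they are not taken from the question.

```python
import re

# Hypothetical sample: a period followed by a space, by '{{' and by '[['.
sample = 'One. Two.{{note}} Three.[[link]] Four.'

# Splitting on whitespace that directly follows a period only finds
# the first boundary, because the other periods are followed by markup.
by_split = re.split(r'(?<=\.)\s', sample)

# Matching lazily up to each terminator finds every boundary,
# whatever character comes after the period.
by_findall = re.findall(r'.*?[.!?]', sample)

print(by_split)    # ['One.', 'Two.{{note}} Three.[[link]] Four.']
print(by_findall)  # ['One.', ' Two.', '{{note}} Three.', '[[link]] Four.']
```

Note that the pieces returned by re.findall() keep any leading space, as in ' Two.'.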

For your case, it would look like this:

prueba = 'Anarchism does not offer a fixed body of doctrine from a single particular worldview.{{sfn|Marshall|1993|pp=14–17}} Many types and traditions of anarchism exist, not all of which are mutually exclusive.[[Sylvan|2007|p=262]] [[Anarchist schools of thought]] can differ fundamentally, supporting anything from extreme [[individualism]] to complete [[collectivism]].'

sentences = re.findall(r'.*?[.!?]', prueba)

Or, if you really only want to split on periods:

sentences = re.findall('.*?[.]', prueba)

The output of print(sentences) is:

['Anarchism does not offer a fixed body of doctrine from a single particular worldview.',
 '{{sfn|Marshall|1993|pp=14–17}} Many types and traditions of anarchism exist, not all of which are mutually exclusive.',
 '[[Sylvan|2007|p=262]] [[Anarchist schools of thought]] can differ fundamentally, supporting anything from extreme [[individualism]] to complete [[collectivism]].']
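One caveat: re.findall() returns only the text that matches, so with the patterns above any trailing text that lacks a final ".", "!" or "?" is silently dropped, and leading spaces stay attached to each piece. If that matters, a variant along these lines might help. This is only a sketch; the helper name split_sentences is hypothetical, and unlike '.*?' its character classes also match across newlines.

```python
import re

def split_sentences(text):
    """Split after '.', '!' or '?', keeping the terminator with its sentence."""
    # First alternative: one sentence including its terminator.
    # Second alternative: a trailing fragment that has no terminator.
    parts = re.findall(r'[^.!?]*[.!?]|[^.!?]+$', text)
    # Trim surrounding whitespace and discard pieces that are empty.
    return [p.strip() for p in parts if p.strip()]

print(split_sentences('You! Are you Tom? I am Danny'))
# ['You!', 'Are you Tom?', 'I am Danny']
```

Like the original patterns, this still splits inside abbreviations and decimal numbers, so it is not a general-purpose sentence tokenizer.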
