Regex to get the text between multiple newlines in Python

Posted 2024-10-03 21:29:24


I'm trying to split out, in order, the text that falls between \n\n and \n. Take this string as an example:

\n\nMy take on fruits.\n\nHealthy Fruits\nAn apple is a fruit and it\'s very good.\n\nPears are good as well. Bananas are very good too and healthy.\n\nSour Fruits\nOranges are on the sour side and contains a lot of vitamin C.\n\nGrapefruits are even more sour, if you can believe it.

My expected output is:

[('Healthy Fruits', "An apple is a fruit and it's very good.", 'Pears are good as well. Bananas are very good too and healthy.'), ('Sour Fruits', 'Oranges are on the sour side and contains a lot of vitamin C.', 'Grapefruits are even more sour, if you can believe it.')]

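I want to parse it this way because anything between \n\n and \n is a heading, and the rest is the text under that heading (so "Healthy Fruits" and "Sour Fruits" are the headings). I'm not sure whether this is the best way to get the headings and their text.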


2 Answers

Given:

txt='''\
\n\nMy take on fruits.\n\nHealthy Fruits\nAn apple is a fruit and it\'s very good.\n\nPears are good as well. Bananas are very good too and healthy.\n\nSour Fruits\nOranges are on the sour side and contains a lot of vitamin C.\n\nGrapefruits are even more sour, if you can believe it.'''

desired=[('Healthy Fruits',   "An apple is a fruit and it's very good.", 'Pears are good as well. Bananas are very good too and healthy.'),  ('Sour Fruits',   'Oranges are on the sour side and contains a lot of vitamin C.', 'Grapefruits are even more sour, if you can believe it.')]

You can use this regex:

r'\n\n([\s\S]*?)(?=(?:\n\n.*\n[^\n])|\Z)'
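To make the pattern easier to read, here is a sketch of the same regex rewritten with re.VERBOSE so each part can carry a comment. This is an editorial addition, not part of the original answer; the names SECTION_RE and sections are illustrative. The verbose and compact forms should find exactly the same blocks.

```python
import re

# The compact pattern from the answer, rewritten with re.VERBOSE so each piece
# can be commented. Whitespace and newlines inside the pattern are ignored.
SECTION_RE = re.compile(r"""
    \n\n                # a blank line (two newlines) starts a block
    ([\s\S]*?)          # capture as little as possible, newlines included
    (?=                 # stop, without consuming, right before either...
        \n\n.*\n[^\n]   #   ...the next heading: blank line, one line, newline, text
      | \Z              #   ...or the end of the string
    )
""", re.VERBOSE)

txt = ("\n\nMy take on fruits.\n\nHealthy Fruits\nAn apple is a fruit and it's very good."
       "\n\nPears are good as well. Bananas are very good too and healthy."
       "\n\nSour Fruits\nOranges are on the sour side and contains a lot of vitamin C."
       "\n\nGrapefruits are even more sour, if you can believe it.")

blocks = SECTION_RE.findall(txt)
# A block with no newline (the intro line) has no heading, so it is dropped.
sections = [tuple(re.split(r"\n+", b)) for b in blocks if "\n" in b]
print(sections)
```

The lookahead is what keeps each heading out of the previous block: the lazy group stops at the first blank line that is followed by a one-line heading and then text, or at the end of the input.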


Python demo:

>>> import re
>>> sp=[tuple(re.split(r'\n+',l)) for l in re.findall(r'\n\n([\s\S]*?)(?=(?:\n\n.*\n[^\n])|\Z)',txt) if '\n' in l]

>>> sp
[('Healthy Fruits', "An apple is a fruit and it's very good.", 'Pears are good as well. Bananas are very good too and healthy.'), ('Sour Fruits', 'Oranges are on the sour side and contains a lot of vitamin C.', 'Grapefruits are even more sour, if you can believe it.')]

>>> sp==desired
True
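A different regex technique, not taken from either answer: instead of matching whole blocks, match only the heading lines and let re.split do the cutting. Because the pattern has a capturing group, re.split keeps each captured heading in its result. This is a sketch that assumes, as in the sample text, that a heading is a single line preceded by a blank line and followed by exactly one newline and then text; HEADING_RE is a hypothetical name.

```python
import re

txt = ("\n\nMy take on fruits.\n\nHealthy Fruits\nAn apple is a fruit and it's very good."
       "\n\nPears are good as well. Bananas are very good too and healthy."
       "\n\nSour Fruits\nOranges are on the sour side and contains a lot of vitamin C."
       "\n\nGrapefruits are even more sour, if you can believe it.")

# Assumed heading shape: blank line, one line of text, a single newline,
# then a non-newline character (the lookahead does not consume it).
HEADING_RE = re.compile(r"\n\n(.+)\n(?=[^\n])")

# With a capturing group, re.split returns [intro, heading1, body1, heading2, body2, ...]
parts = HEADING_RE.split(txt)
headings = parts[1::2]
bodies = parts[2::2]

# Paragraphs inside each body are separated by blank lines.
sections = [(heading, *body.split("\n\n")) for heading, body in zip(headings, bodies)]
print(sections)
```

Anything before the first heading (here "My take on fruits.") ends up in parts[0] and is simply ignored, so no separate filter step is needed.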

This isn't a regex, but it works:

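text = "\n\nMy take on fruits.\n\nHealthy Fruits\nAn apple is a fruit and it's very good.\n\nPears are good as well. Bananas are very good too and healthy.\n\nSour Fruits\nOranges are on the sour side and contains a lot of vitamin C.\n\nGrapefruits are even more sour, if you can believe it."

sections = []
# Paragraphs are separated by blank lines ("\n\n")
for block in text.split("\n\n"):
    if "\n" in block:
        # "Heading\nfirst paragraph" starts a new section
        heading, first = block.split("\n", 1)
        sections.append([heading, first])
    elif block and sections:
        # A paragraph with no newline continues the current section
        # (the intro "My take on fruits." comes before any heading and is skipped)
        sections[-1].append(block)
result = [tuple(s) for s in sections]
print(result)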
