Python: extracting questions from a string with a regular expression

Posted 2024-10-06 12:16:28


I have a very simple problem, but I haven't found a clever way to solve it.

I have a string like this:

"""Q 1
wording of question 1
eventually on many lines
Q 2
wording of question 2
Q 3
wording of question 3
Q 4
wording of question 4
"""

I just want to extract each question together with its wording into a list like this:

['Q 1\nwording of question 1\neventually on many lines','Q 2\nwording of question 2','Q 3\nwording of question 3','Q 4\nwording of question 4']

I tried a pattern like this:

(Q \d.+?)Q \d

But, for example, because Q 2 is consumed as part of the match for Q 1, I can't get Q 2 with findall: the two matches overlap.
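A minimal sketch of one way around the overlap (a suggestion, not code from the thread): put the terminating "Q <digit>" inside a lookahead, so it is checked but not consumed, and let the end of the string close the last question. It assumes every header sits at the start of a line as "Q", a single space and a digit.

```python
import re

text = """Q 1
wording of question 1
eventually on many lines
Q 2
wording of question 2
Q 3
wording of question 3
Q 4
wording of question 4
"""

# re.S lets '.' cross newlines; re.M makes '^' match at the start of every line.
# The lazy '.*?' stops at the first point where the lookahead succeeds:
#   \nQ \d  -> the next header begins on the following line, or
#   \n?\Z   -> only an optional final newline remains before the end of the string.
# Because the lookahead is zero-width, the next header is not consumed,
# so consecutive matches no longer overlap.
pattern = re.compile(r'^Q \d.*?(?=\nQ \d|\n?\Z)', re.S | re.M)

questions = pattern.findall(text)
print(questions)
```

Since the lookahead also absorbs the optional final newline, the last question comes back without a trailing "\n".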

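I thought of a solution that relies on the end of the string, but it would require searching from the end of the string, which doesn't seem possible in Python.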
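Instead of searching from the end of the string, another option is to split it. The sketch below (an alternative technique, not the one used in the thread) splits on every newline that is immediately followed by a "Q <digit>" header. It assumes the string begins with a header; any text before the first "Q" would come back as an extra first element.

```python
import re

text = """Q 1
wording of question 1
eventually on many lines
Q 2
wording of question 2
Q 3
wording of question 3
Q 4
wording of question 4
"""

# Split at each newline followed by a header. The lookahead (?=Q \d) is
# zero-width, so the "Q n" header stays at the start of its own piece.
# strip() first removes the trailing newline, so the end of the string
# needs no special handling.
questions = re.split(r'\n(?=Q \d)', text.strip())
print(questions)
```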

Can anyone help with this?


1 answer

Answer #1 (anonymous user) · posted 2024-10-06 12:16:28

Posting Wiktor Stribiżew's solution as an answer, since it serves my purpose:

text = """Q 1
wording of question 1
eventually on many lines
Q 2
wording of question 2
Q 3
wording of question 3
Q 4
wording of question 4
"""

import re

def extract_questions(text):

    q_list = re.findall(r'^Q +\d.*(?:\n(?!Q \d).*)*', text, re.M)
    
    return q_list


extract_questions(text)

Returns (note that the last item keeps the trailing newline of the input string):

['Q 1\nwording of question 1\neventually on many lines',
 'Q 2\nwording of question 2',
 'Q 3\nwording of question 3',
 'Q 4\nwording of question 4\n']
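To make the answer's pattern easier to read, here it is spelled out with re.VERBOSE (same regex, only commented), together with a hypothetical helper, extract_questions_stripped, that drops the trailing newline so the result matches the list asked for in the question. This is a sketch, not part of the original answer.

```python
import re

text = """Q 1
wording of question 1
eventually on many lines
Q 2
wording of question 2
Q 3
wording of question 3
Q 4
wording of question 4
"""

# Same pattern as r'^Q +\d.*(?:\n(?!Q \d).*)*'. Under re.VERBOSE unescaped
# whitespace is ignored, so the literal spaces are written as '\ '.
QUESTION_RE = re.compile(r'''
    ^Q\ +\d .*      # header line: Q, one or more spaces, a digit, rest of the line
    (?:             # followed by zero or more continuation lines, each being
        \n          #   a line break
        (?!Q\ \d)   #   not followed by Q, a single space and a digit (the next header)
        .*          #   and the rest of that line
    )*
''', re.M | re.VERBOSE)


def extract_questions_stripped(text):
    # Hypothetical helper: when the input ends with '\n', the last match also
    # picks up that final newline (followed by an empty ".*"), so strip it.
    return [q.rstrip('\n') for q in QUESTION_RE.findall(text)]


print(extract_questions_stripped(text))
```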
