Find all text until the next regex match


I'm trying to collect all of the text up to the next regex match in Python. The data is the transcript of a debate published online. Currently I iterate over all of the p tags, pick out the ones that start with a speaker tag, and then want to append every following paragraph that has no speaker tag to the preceding tagged one.

I'm not sure whether this is the best way to go about it, or whether it would be simpler to search for and group all of the text in one pass. At the moment I only get the lines that begin with at least three capital letters (the tagged lines themselves); the untagged paragraphs that follow them are not captured.

<pre><code>import re
import requests as rq
from bs4 import BeautifulSoup as bs

r = rq.get('http://www.cbsnews.com/news/transcript-of-the-2015-gop-debate-9-pm/')
b = bs(r.text, 'html.parser')
# attrs must be a dict of attribute name -> value, not a set
debatetext = b.find('div', attrs={'class': 'entry'}).find_all('p')

# Speaker tag: three capital letters, then anything, then a colon
pattern = re.compile(r'[A-Z][A-Z][A-Z].*:')
for line in debatetext:
    if pattern.search(line.text) is not None:
        print(line)
</code></pre>

Ideally, I would like the three untagged lines to be appended to the first line (the one that starts with "BUSH:"), or else have "BUSH:" (or whichever candidate is speaking) added to the start of each of those lines.

Edit: a larger sample:

<pre><code><div class="entry" itemprop="articleBody" id="article-entry">...
CARSON: -- extremely effectively.
(APPLAUSE)
BAIER: Gentlemen, the next series of questions deals with ObamaCare and the role of the federal government. Mr. Trump, ObamaCare is one of the things you call a disaster.
TRUMP: A complete disaster, yes.
BAIER: Saying it needs to be repealed and replaced.
TRUMP: Correct.
BAIER: Now, 15 years ago, you called yourself a liberal on health care. You were for a single-payer system, a Canadian-style system. Why were you for that then and why aren't you for it now?
TRUMP: First of all, I'd like to just go back to one. In July of 2004, I came out strongly against the war with Iraq, because it was going to destabilize the Middle East. And I'm the only one on this stage that knew that and had the vision to say it. And that's exactly what happened.
BAIER: But on ObamaCare...
TRUMP: And the Middle East became totally destabilized. So I just want to say. As far as single payer, it works in Canada. It works incredibly well in Scotland. It could have worked in a different age, which is the age you're talking about here. What I'd like to see is a private system without the artificial lines around every state. I have a big company with thousands and thousands of employees. And if I'm negotiating in New York or in New Jersey or in California, I have like one bidder. Nobody can bid. You know why? Because the insurance companies are making a fortune because they have control of the politicians, of course, with the exception of the politicians on this stage. But they have total control of the politicians. They're making a fortune. Get rid of the artificial lines and you will have...
(BUZZER NOISE)
TRUMP: -- yourself great plans. And then we have to take care of the people that can't take care of themselves. And I will do that through a different system.
(CROSSTALK)
BAIER: Mr. Trump, hold up one second.
PAUL: I've got a news flash...
</code></pre>
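A minimal sketch of the paragraph-by-paragraph approach described in the question, assuming each paragraph's text is available as a plain string (for example [p.get_text() for p in debatetext] from the question's code) and that a speaker tag is three or more capital letters followed by a colon at the very start of a paragraph. The helper name group_by_speaker is hypothetical. Untagged paragraphs such as "(APPLAUSE)" are merged into the preceding turn, and anything before the first tag is discarded.

```python
import re
from typing import List, Tuple

# Assumption: a speaker tag is 3+ capital letters and a colon at the very
# start of a paragraph, e.g. "TRUMP: ..."; "(APPLAUSE)" or "Mr. Trump" do not match.
SPEAKER_RE = re.compile(r'^([A-Z]{3,}):\s*')


def group_by_speaker(paragraphs: List[str]) -> List[Tuple[str, str]]:
    """Return (speaker, text) turns, appending untagged paragraphs to the previous turn."""
    turns: List[Tuple[str, List[str]]] = []
    for para in paragraphs:
        para = para.strip()
        if not para:
            continue  # skip empty paragraphs
        match = SPEAKER_RE.match(para)
        if match:
            # A new speaker starts: keep only the text after "NAME:".
            turns.append((match.group(1), [para[match.end():]]))
        elif turns:
            # No tag: this paragraph continues the previous speaker's turn.
            turns[-1][1].append(para)
        # Paragraphs before the first tag (e.g. an introduction) are ignored.
    return [(speaker, ' '.join(p for p in parts if p)) for speaker, parts in turns]


if __name__ == '__main__':
    sample = [
        'BUSH: First sentence.',
        'A second paragraph without a tag.',
        '(APPLAUSE)',
        'BAIER: Next question.',
    ]
    for speaker, text in group_by_speaker(sample):
        print(speaker + ': ' + text)
```

Because the function works on a list of strings, it can be tried without fetching the page; the list comprehension mentioned above connects it to the BeautifulSoup code.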

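The question also asks whether it would be simpler to group all of the text in one pass. One way to do that, sketched here under the same speaker-tag assumption, is a single regular expression that captures each tag and then lazily matches everything up to a lookahead for the next tag or the end of the string. It assumes the paragraphs have first been joined one per line, e.g. '\n'.join(p.get_text() for p in debatetext); split_turns is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Assumption: one paragraph per line; a speaker tag is 3+ capital letters
# followed by a colon at the start of a line.
TURN_RE = re.compile(
    r'^([A-Z]{3,}):\s*'        # the speaker tag, e.g. "TRUMP:"
    r'(.*?)'                   # lazily take everything after it...
    r'\s*(?=^[A-Z]{3,}:|\Z)',  # ...up to the next tag or the end of the text
    re.MULTILINE | re.DOTALL,
)


def split_turns(transcript: str) -> List[Tuple[str, str]]:
    """Return (speaker, text) pairs, with internal line breaks collapsed to spaces."""
    return [(speaker, ' '.join(body.split()))
            for speaker, body in TURN_RE.findall(transcript)]


if __name__ == '__main__':
    text = (
        'Intro paragraph before anyone speaks.\n'
        'BUSH: First sentence.\n'
        'A second paragraph without a tag.\n'
        '(APPLAUSE)\n'
        'BAIER: Next question.\n'
    )
    for speaker, body in split_turns(text):
        print(speaker + ': ' + body)
```

The lookahead checks for the next tag without consuming it, so the following match can start there. A line inside a speech that happens to begin with an all-caps word and a colon would be mistaken for a new speaker, so the tag pattern may need tightening for other transcripts.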


1 Answer

Anonymous, 1 day ago



