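<p>With <code>re.sub</code>, we can first strip out the unwanted cue numbers and timestamps, then make a second pass that replaces the remaining line breaks with single spaces:</p>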
<pre class="lang-py prettyprint-override"><code>import re

inp = """1
00:00:05.210 > 00:00:07.710
In this lecture, we're
going to talk about
2
00:00:07.710 > 00:00:10.815
pattern matching in strings
using regular expressions.
3
00:00:10.815 > 00:00:13.139
Regular expressions or regexes
4
00:00:13.139 > 00:00:15.825
are written in a condensed
formatting language."""
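# Replace each cue header (index line plus timestamp line) with a newline,
# so the text of adjacent cues is not glued together.
output = re.sub(r'(?:^|\r?\n)\d+\r?\n\d{2}:\d{2}:\d{2}\.\d{3} > \d{2}:\d{2}:\d{2}\.\d{3}\r?\n', '\n', inp).strip()
# Collapse the remaining line breaks into single spaces.
output = re.sub(r'(?:\r?\n)+', ' ', output)
# Lazily capture each sentence, up to and including its closing period.
sentences = re.findall(r'(.*?\.)\s*', output)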
print(sentences)
</code></pre>
<p>This prints:</p>
<pre><code>["In this lecture, we're going to talk about pattern matching in strings using regular expressions.",
'Regular expressions or regexes are written in a condensed formatting language.']
</code></pre>
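<p>If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: the names <code>CUE_HEADER</code> and <code>subtitle_sentences</code> are made up for illustration, and it assumes the same timestamp layout as the input above. It also tolerates blank lines between cues by collapsing every run of whitespace.</p>

```python
import re

# Hypothetical helper names; the pattern assumes the same cue layout as above:
# an index line, then a "HH:MM:SS.mmm > HH:MM:SS.mmm" line.
CUE_HEADER = re.compile(
    r'(?:^|\r?\n)\d+\r?\n'
    r'\d{2}:\d{2}:\d{2}\.\d{3} > \d{2}:\d{2}:\d{2}\.\d{3}\r?\n'
)


def subtitle_sentences(text):
    """Return the sentences found in numbered, timestamped subtitle text."""
    # Swap each cue header for a newline so neighbouring cues stay separated.
    body = CUE_HEADER.sub('\n', text)
    # Collapse every run of whitespace (line breaks, blank lines) into one space.
    body = ' '.join(body.split())
    # Lazily capture text up to each period; trailing text without a
    # closing period is not returned.
    return re.findall(r'(.*?\.)\s*', body)


sample = (
    "1\n00:00:01.000 > 00:00:02.000\nHello\nworld.\n\n"
    "2\n00:00:02.000 > 00:00:03.000\nSecond\nsentence."
)
print(subtitle_sentences(sample))
```

<p>Note that splitting on periods is naive: abbreviations or decimal numbers inside the subtitle text would be cut into separate "sentences", so treat this as a starting point rather than a complete sentence splitter.</p>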