<p>How about a simple regular expression:</p>
<pre><code>import re

text = 'Harry Potter (1997)'
# Raw string so the backslashes reach the regex engine unchanged
re.findall(r'\((\d{4})\)', text)
# ['1997']  (note: findall returns a list of all occurrences)
</code></pre>
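<p>If only the first year is wanted rather than a list, a minimal sketch using <code>re.search</code> could look like this. The variable names <code>match</code> and <code>year</code> are illustrative and not part of the original answer.</p>

```python
import re

text = 'Harry Potter (1997)'

# re.search returns the first match object, or None when nothing matches
match = re.search(r'\((\d{4})\)', text)

# Guard against titles that carry no "(YYYY)" suffix
year = match.group(1) if match else None
print(year)
```

<p>Here <code>year</code> is a string, or <code>None</code> when the title has no four-digit year in parentheses.</p>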
<hr/>
<p>For a DataFrame, you can do this:</p>
<pre><code>import pandas as pd

text = 'Harry Potter (1997)'
df = pd.DataFrame({'Book': text}, index=[1])
pattern = r'\((\d{4})\)'
# With a single capture group, expand=False returns a Series
df['year'] = df.Book.str.extract(pattern, expand=False)
df
#                   Book  year
# 1  Harry Potter (1997)  1997
</code></pre>
<hr/>
<p>Finally, if you really want to separate the title from the year (using Philip's DataFrame reconstruction from another answer):</p>
<pre><code>import pandas as pd

df = pd.DataFrame(columns=['Book'], data=[['Harry Potter (1997)'],['Of Mice and Men (1937)'],['Babe Ruth Story, The (1948) Drama 948) Babe Ruth Story']])
# Two capture groups: everything before "(YYYY)", then the four digits
sep = df['Book'].str.extract(r'(.*)\((\d{4})\)', expand=False)
sep  # A new df, separated into title and year
#                        0     1
# 0          Harry Potter   1997
# 1       Of Mice and Men   1937
# 2  Babe Ruth Story, The   1948
</code></pre>
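<p>The extracted columns are labelled <code>0</code> and <code>1</code>, and the title keeps the space that preceded the parenthesis. A small follow-up sketch that tidies this is shown below. The column names <code>title</code> and <code>year</code> and the integer conversion are my own choices, not part of the original answer, and the conversion assumes every row matched the pattern.</p>

```python
import pandas as pd

df = pd.DataFrame({'Book': ['Harry Potter (1997)', 'Of Mice and Men (1937)']})

# Same two-group pattern as above: title text, then a four-digit year
sep = df['Book'].str.extract(r'(.*)\((\d{4})\)', expand=True)

sep.columns = ['title', 'year']          # label the two capture groups
sep['title'] = sep['title'].str.strip()  # drop the trailing space before "("
sep['year'] = sep['year'].astype(int)    # assumes no row failed to match (no NaN)
print(sep)
```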