<p>If you are sure that the year will always be at the end of the string, you can do:</p>
<pre><code>df['year'] = df['name'].str[-5:-1].astype(int)
</code></pre>
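<p>This takes the column <code>name</code>, uses the <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.html" rel="nofollow noreferrer"><code>.str</code> accessor</a> to treat each row's value as a string, and takes the <code>-5:-1</code> slice from it, which is the four characters just before the closing parenthesis. It then converts the result to <code>int</code> and assigns it to the <code>year</code> column. If you have a lot of data, this approach will be much faster than iterating over the rows.</p>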
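<p>As a minimal sketch of the slicing approach, the snippet below builds a small frame from three of the titles and checks that every title really ends in <code>(YYYY)</code> before slicing. The <code>has_trailing_year</code> guard is an addition for illustration, not part of the one-liner above.</p>

```python
import pandas as pd

# Small sample frame; the titles are taken from the data in this answer.
df = pd.DataFrame({"name": ["Avatar (2009)", "Titanic (1997)", "Shrek 2 (2004)"]})

# Guard (illustrative): confirm every title ends in "(YYYY)" before slicing blindly.
has_trailing_year = df["name"].str.contains(r"\(\d{4}\)$")
assert has_trailing_year.all(), "some titles do not end in (YYYY)"

# The last character is ")", so [-5:-1] covers the four digits before it.
df["year"] = df["name"].str[-5:-1].astype(int)
print(df)
```

<p>Without such a check, a title with trailing whitespace or no year would either raise in <code>astype(int)</code> or silently produce a wrong number.</p>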
<hr/>
<p>Alternatively, for more flexibility, you can use a regex with the <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.extract.html" rel="nofollow noreferrer"><code>.extract</code></a> method of the <code>str</code> accessor:</p>
<pre><code>df['year'] = df['name'].str.extract(r'\((\d{4})\)').astype(int)
</code></pre>
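<p>This extracts the group matched by the expression <code>\((\d{4})\)</code> (<a href="https://regex101.com/r/lGZ2Ep/1" rel="nofollow noreferrer">Try it here</a>), which captures exactly four digits enclosed in a pair of parentheses, and it works anywhere in the string. To anchor it to the end of the string, add <code>$</code> to the end of the regex, like so: <code>\((\d{4})\)$</code>. For this data, the regex and the string slice give the same result.</p>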
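<p>The regex route also makes it easier to cope with titles that have no year at all. The following is a sketch under assumptions: the second and third sample titles are hypothetical, <code>expand=False</code> is used so that <code>extract</code> returns a Series rather than a one-column DataFrame, and the nullable <code>Int64</code> dtype is chosen so that a missing year becomes <code>&lt;NA&gt;</code> instead of making the integer conversion fail.</p>

```python
import pandas as pd

# Sample data; the last two titles are hypothetical, added to show edge cases.
df = pd.DataFrame(
    {"name": ["Avatar (2009)", "Blade Runner 2049 (2017)", "Untitled Project"]}
)

# Anchored pattern: only a "(YYYY)" at the very end of the title is captured.
# expand=False returns a Series; rows without a match become NaN.
years = df["name"].str.extract(r"\((\d{4})\)$", expand=False)

# Convert to a nullable integer column so missing years stay missing.
df["year"] = pd.to_numeric(years).astype("Int64")
print(df)
```

<p>A plain <code>astype(int)</code> would raise on the row with no match, which may be what you want if a missing year should be treated as an error.</p>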
<hr/>
<p>Now we have our new dataframe:</p>
<pre><code>          gross                                               name  year
0   760507625.0                                      Avatar (2009)  2009
1   658672302.0                                     Titanic (1997)  1997
2   652270625.0                              Jurassic World (2015)  2015
3   623357910.0                                The Avengers (2012)  2012
4   534858444.0                             The Dark Knight (2008)  2008
5   532177324.0                                   Rogue One (2016)  2016
6   474544677.0   Star Wars: Episode I - The Phantom Menace (1999)  1999
7   459005868.0                     Avengers: Age of Ultron (2015)  2015
8   448139099.0                       The Dark Knight Rises (2012)  2012
9   436471036.0                                     Shrek 2 (2004)  2004
10  424668047.0             The Hunger Games: Catching Fire (2013)  2013
11  423315812.0  Pirates of the Caribbean: Dead Man's Chest (2006)  2006
12  415004880.0                                 Toy Story 3 (2010)  2010
13  409013994.0                                  Iron Man 3 (2013)  2013
14  408084349.0                  Captain America: Civil War (2016)  2016
15  408010692.0                            The Hunger Games (2012)  2012
16  403706375.0                                  Spider-Man (2002)  2002
17  402453882.0                               Jurassic Park (1993)  1993
18  402111870.0         Transformers: Revenge of the Fallen (2009)  2009
19  400738009.0                                      Frozen (2013)  2013
20  381011219.0  Harry Potter and the Deathly Hallows: Part 2 (...  2011
21  380843261.0                                Finding Nemo (2003)  2003
22  380262555.0   Star Wars: Episode III - Revenge of the Sith (...  2005
23  373585825.0                                Spider-Man 2 (2004)  2004
24  370782930.0                   The Passion of the Christ (2004)  2004
</code></pre>