<p>I know this isn't what you were expecting, but it may be interesting.</p>
<p>Input data:</p>
<pre><code>>>> df
text
0 myname giraffe0086
1 cat whale4321
2 giraffe9064
3 poultry dolphin4356
4 fifty giraffe2345 nine
5 giraffe3434 catnap
6 nothing to catch
</code></pre>
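<p>Find the animal and the number in each string:</p>
<pre><code>import re
import pandas as pd

# https://docs.python.org/3/library/re.html#index-15
# 'animal' is the run of word characters directly before four digits;
# the lookahead captures 'number' without consuming it.
PAT = re.compile(r'(?P<animal>\w+)(?=(?P<number>\d{4}))')
sre = df['text'].apply(PAT.search)  # one re.Match object (or None) per row
</code></pre>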
<pre><code>>>> sre
0 <re.Match object; span=(7, 14), match='giraffe'>
1 <re.Match object; span=(4, 9), match='whale'>
2 <re.Match object; span=(0, 7), match='giraffe'>
3 <re.Match object; span=(8, 15), match='dolphin'>
4 <re.Match object; span=(6, 13), match='giraffe'>
5 <re.Match object; span=(0, 7), match='giraffe'>
6 None
Name: text, dtype: object
</code></pre>
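<p>To see why each span covers only the animal name, here is a minimal sketch on a single string, using only the standard library. The lookahead checks for the four digits without consuming them, so they stay outside the match but are still captured in the <code>number</code> group:</p>
<pre><code>import re

# Same pattern as above.
PAT = re.compile(r'(?P<animal>\w+)(?=(?P<number>\d{4}))')

m = PAT.search('myname giraffe0086')
print(m.span())           # (7, 14): the animal name only
print(m.group('animal'))  # giraffe
print(m.group('number'))  # 0086
print(m.end('number'))    # 18: the digits end after the match itself
</code></pre>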
<p>Build a DataFrame with <code>animal</code>, <code>start</code>, <code>end</code> and <code>number</code> columns:</p>
<pre><code># The lookahead does not consume the digits, so r.end() is already the end of the animal name.
extract_data = lambda r: (r.group('animal'), r.start(), r.end(), r.group('number'))
df1 = sre[sre.notnull()].apply(extract_data).apply(pd.Series) \
    .rename(columns={0: 'animal', 1: 'start', 2: 'end', 3: 'number'})
</code></pre>
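<p>The per-row extraction can also be written as a named function that handles the no-match case itself. This is only a sketch: <code>extract_fields</code> is a hypothetical name, not part of the code above. Because the lookahead does not consume the digits, <code>m.end()</code> already marks the end of the animal name.</p>
<pre><code>import re

PAT = re.compile(r'(?P<animal>\w+)(?=(?P<number>\d{4}))')

def extract_fields(text):
    """Return (animal, start, end, number) for the first match, or None."""
    m = PAT.search(text)
    if m is None:
        return None
    return (m.group('animal'), m.start(), m.end(), m.group('number'))

for text in ['cat whale4321', 'fifty giraffe2345 nine', 'nothing to catch']:
    print(extract_fields(text))
# ('whale', 4, 9, '4321')
# ('giraffe', 6, 13, '2345')
# None
</code></pre>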
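<p>Concatenate <code>df</code> and <code>df1</code> column-wise. Rows are aligned on the index, so rows without a match get <code>NaN</code>:</p>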
<pre><code>df = pd.concat([df, df1], axis="columns")
</code></pre>
<pre><code>>>> df
text animal start end number
0 myname giraffe0086 giraffe 7.0 14.0 0086
1 cat whale4321 whale 4.0 9.0 4321
2 giraffe9064 giraffe 0.0 7.0 9064
3 poultry dolphin4356 dolphin 8.0 15.0 4356
4 fifty giraffe2345 nine giraffe 6.0 13.0 2345
5 giraffe3434 catnap giraffe 0.0 7.0 3434
6 nothing to catch NaN NaN NaN NaN
</code></pre>