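<p>Change the capture group so that it matches what comes after <code>xxx=</code> instead of <code>xxx=</code> itself. The non-capturing group <code>(?:;|$)</code> checks for <code>;</code> or the end of the string as the terminator.</p>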
<pre class="lang-py prettyprint-override"><code>df['xxx'] = df.Misc.str.extract(r'xxx=(.*?)(?:;|$)', expand=True)
df['xyx'] = df.Misc.str.extract(r'xyx=(.*?)(?:;|$)', expand=True)
</code></pre>
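<p>To see what the pattern captures without pandas, here is a minimal sketch using the standard <code>re</code> module on the sample strings from the question. The helper name <code>first_value</code> is made up for illustration; <code>str.extract</code> similarly returns the first match per row, with <code>NaN</code> where this sketch returns <code>None</code>.</p>

```python
import re

# Same pattern as above: lazily capture everything after "xxx=" up to
# the first ";" or, if there is none, the end of the string.
pattern = re.compile(r'xxx=(.*?)(?:;|$)')

samples = [
    '1. xxx=something;yyyblah=somethingelse;xyx=blah',
    '2. xyz=meh;yzxx=random;xyx=meh',
    '3. xxx=foo;xxxxy=bar',
    '4. xxx=meh,blah/other=super 3;zzz=1',
]


def first_value(text):
    """Return the captured value, or None when the key is absent."""
    match = pattern.search(text)
    return match.group(1) if match else None


results = [first_value(s) for s in samples]
print(results)
# ['something', None, 'foo', 'meh,blah/other=super 3']
```

<p>The lazy <code>.*?</code> matters here: a greedy <code>.*</code> would run to the end of the string and swallow the later <code>key=value</code> pairs.</p>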
<p>Or you can <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.assign.html" rel="nofollow noreferrer"><strong><code>assign</code></strong></a> these columns automatically in a comprehension:</p>
<pre class="lang-py prettyprint-override"><code>keys = ['xxx', 'xyx']
df = df.assign(**{key: df.Misc.str.extract(rf'{key}=(.*?)(?:;|$)', expand=True) for key in keys})
</code></pre>
<p>Output:</p>
<pre><code>#                                               Misc                     xxx   xyx
# 0  1. xxx=something;yyyblah=somethingelse;xyx=blah               something  blah
# 1                   2. xyz=meh;yzxx=random;xyx=meh                     NaN   meh
# 2                             3. xxx=foo;xxxxy=bar                     foo   NaN
# 3              4. xxx=meh,blah/other=super 3;zzz=1  meh,blah/other=super 3   NaN
</code></pre>
<hr/>
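<h3>Timings</h3>
<p>I couldn't get Andrej's answer to work with my data (it raised a reindexing error), but here are the timings for the other answers with 40K rows:</p>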
<pre><code>>>> df = pd.DataFrame({'Misc':['1. xxx=something;yyyblah=somethingelse;xyx=blah','2. xyz=meh;yzxx=random;xyx=meh','3. xxx=foo;xxxxy=bar','4. xxx=meh,blah/other=super 3;zzz=1']})
>>> df = pd.concat([df]*10000)
>>> %timeit tdy(df)
75.5 ms ± 5.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit wwnde(df)
83.6 ms ± 1.63 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
</code></pre>
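<p>The definitions of <code>tdy</code> and <code>wwnde</code> are not shown above. As a rough sketch of how a comparable measurement could be run outside IPython, the following uses the standard-library <code>timeit</code> in place of <code>%timeit</code>. The wrapper <code>extract_keys</code> is a hypothetical name and only an assumption about what such a function might look like; it uses <code>expand=False</code> so each extraction is a Series, and <code>ignore_index=True</code> so the repeated frame has unique row labels.</p>

```python
import timeit

import pandas as pd


def extract_keys(df, keys):
    """Hypothetical wrapper: add one column per key using str.extract."""
    # expand=False returns a Series when the pattern has one capture group.
    return df.assign(**{
        key: df.Misc.str.extract(rf'{key}=(.*?)(?:;|$)', expand=False)
        for key in keys
    })


small = pd.DataFrame({'Misc': [
    '1. xxx=something;yyyblah=somethingelse;xyx=blah',
    '2. xyz=meh;yzxx=random;xyx=meh',
    '3. xxx=foo;xxxxy=bar',
    '4. xxx=meh,blah/other=super 3;zzz=1',
]})

# 4 rows repeated 10,000 times gives the 40K rows used in the timings.
big = pd.concat([small] * 10000, ignore_index=True)

runs = 3
seconds = timeit.timeit(lambda: extract_keys(big, ['xxx', 'xyx']), number=runs)
print(f'{seconds / runs * 1000:.1f} ms per call (mean of {runs} runs)')
```

<p>Absolute numbers will differ by machine and pandas version, so only the relative comparison between approaches is meaningful.</p>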