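<p>While <a href="https://stackoverflow.com/users/2901002/jezrael">jezrael's</a> <a href="https://stackoverflow.com/a/61479830/3622349">answer</a> works, it is slower than it needs to be, because it has to do the mapping first and then go back and fill in the missing elements. We can get significantly better performance by leveraging Python's built-in dictionaries.</p>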
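<p>There are two ways to use the flexibility of Python dictionary objects to supply a default value. One is the <a href="https://docs.python.org/3/library/stdtypes.html#dict.get" rel="nofollow noreferrer">get method</a> on the mapping dictionary; the other is the <a href="https://docs.python.org/2/library/collections.html#defaultdict-objects" rel="nofollow noreferrer">defaultdict object from collections</a>. As noted above, the advantage of both the <code>get</code> and <code>defaultdict</code> approaches is that they avoid going back over the whole series after mapping to replace NAs, and instead handle the default within the mapping step itself.</p>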
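<p>As a rough, pandas-free illustration of how the two lookups behave on a plain dictionary (the keys and the <code>'OVERSEAS'</code> default are the ones used in this answer):</p>
<pre><code>from collections import defaultdict

di = {"NY": "Domestic", "CA": "Domestic", "WT": "OUTSIDE"}

# Option 1: dict.get returns the fallback and leaves the dict unchanged.
print(di.get("NY", "OVERSEAS"))  # known key
print(di.get("SK", "OVERSEAS"))  # unknown key, falls back to the default

# Option 2: a defaultdict calls its factory whenever a key is missing.
dd = defaultdict(lambda: "OVERSEAS", di)
print(dd["NY"])
print(dd["SK"])

# Caveat: indexing a defaultdict stores the missing key as a side effect,
# whereas dict.get never modifies the original dictionary.
print("SK" in di, "SK" in dd)
</code></pre>
<p>One practical difference is that a <code>defaultdict</code> grows as unseen keys are looked up, which may or may not matter for your use case.</p>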
<p>So, in short, I recommend:</p>
<pre><code>import pandas as pd

df = pd.DataFrame({'Territory':['NY','CA','WT','SK','DE']})
di = {"NY": "Domestic","CA": "Domestic","WT":"OUTSIDE"}
# Look each value up in di, falling back to 'OVERSEAS' for unmapped keys
df['Territory'] = df['Territory'].map(lambda x: di.get(x, 'OVERSEAS'))
</code></pre>
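<p>For comparison, the <code>defaultdict</code> variant might look like the sketch below. It relies on the documented behaviour of <code>Series.map</code>, which uses the default from a dict subclass that defines <code>__missing__</code> instead of producing NaN. The names <code>via_get</code>, <code>via_defaultdict</code> and <code>via_fillna</code> are only illustrative.</p>
<pre><code>from collections import defaultdict

import pandas as pd

df = pd.DataFrame({"Territory": ["NY", "CA", "WT", "SK", "DE"]})
di = {"NY": "Domestic", "CA": "Domestic", "WT": "OUTSIDE"}

# defaultdict defines __missing__, so Series.map uses its default
# for values that are not keys of the mapping.
dd = defaultdict(lambda: "OVERSEAS", di)

via_get = df["Territory"].map(lambda x: di.get(x, "OVERSEAS"))
via_defaultdict = df["Territory"].map(dd)
via_fillna = df["Territory"].map(di).fillna("OVERSEAS")

# All three approaches should yield the same values.
print(via_defaultdict.tolist())
print(via_get.tolist() == via_defaultdict.tolist() == via_fillna.tolist())
</code></pre>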
<p>Some timings that support the performance of this approach:</p>
<pre><code>df = pd.DataFrame({'Territory':['NY','CA','WT','SK','DE']})
di = {"NY": "Domestic","CA": "Domestic","WT":"OUTSIDE"}
%timeit df['Territory'].map(lambda x: di.get(x, 'OVERSEAS'))
>>> 138 µs ± 1.35 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
from collections import defaultdict
dd = defaultdict(lambda:'OVERSEAS')
dd.update(di)
%timeit df['Territory'].map(dd)
>>> 143 µs ± 2.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit df['Territory'] = df['Territory'].map(di).fillna('OVERSEAS')
>>> 657 µs ± 33.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
</code></pre>
<p>For larger dictionaries, the difference in performance becomes more pronounced.</p>
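<p>To check this on your own data, something like the following could be used outside IPython. It is only a sketch: the sizes (<code>n_keys</code>, <code>n_rows</code>) and the <code>K…</code>/<code>V…</code> key names are made up for illustration, and no particular result is implied.</p>
<pre><code>import random
import timeit
from collections import defaultdict

import pandas as pd

random.seed(0)

# Hypothetical sizes, chosen only for illustration.
n_keys, n_rows = 1_000, 100_000

# Map even-numbered keys only, so roughly half the rows need the default.
di = {f"K{i}": f"V{i}" for i in range(0, n_keys, 2)}
s = pd.Series([f"K{random.randrange(n_keys)}" for _ in range(n_rows)])
dd = defaultdict(lambda: "OVERSEAS", di)

candidates = {
    "get": lambda: s.map(lambda x: di.get(x, "OVERSEAS")),
    "defaultdict": lambda: s.map(dd),
    "map + fillna": lambda: s.map(di).fillna("OVERSEAS"),
}

# Confirm that all approaches agree before comparing their timings.
results = {name: fn().tolist() for name, fn in candidates.items()}

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=5) / 5
    print(f"{name}: {seconds * 1e3:.2f} ms per run")
</code></pre>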
<p>It is also worth noting that, without a default value, simply mapping a dict with missing keys appears to be slow in pandas:</p>
<pre><code>%timeit df['Territory'].map(di)
>>> 372 µs ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
</code></pre>