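<p>Here is one approach: apply <code>loc</code> to the <code>value_counts</code> result to keep only the manufacturers whose count is at least a minimum threshold.</p>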
<pre><code>import pandas as pd

# Sample data: one row per car, with each manufacturer repeated by its count.
df = pd.DataFrame(
    {'manufacturer':
        ['VW'] * 2228
        + ['Opel'] * 1414
        + ['Renault'] * 1362
        + ['Audi'] * 895
        + ['BMW'] * 888
        + ['Mercedes-Benz'] * 787}
)
</code></pre>
<p>Solution:</p>
<pre><code>min_count = 1000
main_manufacturers = set(
df['manufacturer'].value_counts(sort=False).loc[lambda x: x >= min_count].index)
df = df.loc[df['manufacturer'].isin(main_manufacturers)]
</code></pre>
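<p>As a possible alternative, the same filter can be sketched with <code>groupby(...).transform('size')</code>, which skips the intermediate set of names. This is not part of the approach above, only a variant; the names <code>counts</code>, <code>group_sizes</code> and <code>filtered</code> are made up for illustration, and the sample data is rebuilt so the snippet runs on its own.</p>

```python
import pandas as pd

# Rebuild the sample data (same manufacturers and counts as above).
counts = {
    'VW': 2228, 'Opel': 1414, 'Renault': 1362,
    'Audi': 895, 'BMW': 888, 'Mercedes-Benz': 787,
}
df = pd.DataFrame(
    {'manufacturer': [name for name, n in counts.items() for _ in range(n)]}
)

min_count = 1000
# For every row, look up how many rows share its manufacturer.
group_sizes = df.groupby('manufacturer')['manufacturer'].transform('size')
# Keep only the rows whose manufacturer appears at least min_count times.
filtered = df.loc[group_sizes >= min_count]
```

<p>With this sample data, only the VW, Opel and Renault rows should remain, since the other manufacturers each have fewer than 1000 rows.</p>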