<p><strong>Edit</strong>: for your actual data, use <code>str.findall</code>, as shown below:</p>
<pre><code># r'\d+' is a raw string, so the backslash reaches the regex engine unchanged
df['b_median'] = [np.median(pd.to_numeric(x if bool(x) else np.nan, errors='coerce'))
                  for x in df['built_up'].str.findall(r'\d+')]
</code></pre>
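<p>As a self-contained illustration of the <code>str.findall</code> idea, here is a minimal sketch. The sample frame is hypothetical (modelled on the values shown in this answer, plus one messy entry with a dangling dash), and the explicit <code>isinstance</code> check is my own variation on the one-liner above, not part of the original answer.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: clean ranges, a single value, a bare dash,
# and one messy entry with a trailing dash.
df = pd.DataFrame(
    {'built_up': ['1498-1602', '1022-1187', '1420', '-', '2305-3396 -']}
)

# Extract every run of digits from each cell; a cell without digits
# yields an empty list.
digit_runs = df['built_up'].str.findall(r'\d+')

# Take the median of the extracted numbers, or NaN when nothing usable
# was found (empty list, or a non-list result for a missing value).
df['b_median'] = [
    np.median([float(n) for n in runs]) if isinstance(runs, list) and runs else np.nan
    for runs in digit_runs
]

print(df)
```

<p>Because only the digit runs are kept, stray dashes and spaces never reach the numeric conversion, so no separate <code>strip</code> step should be needed with this approach.</p>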
<hr/>
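<p><strong>Original</strong>:</p>
<p>Your actual data contains some unbalanced strings (for example, stray dashes or spaces at the ends). Try applying <code>strip</code> before calling <code>map</code> with <code>np.median</code> and <code>pd.to_numeric</code>:</p>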
<pre><code>s = (df['built_up'].map(lambda x:
         np.median(pd.to_numeric(x.strip('- ').split('-'), errors='coerce'))))
Out[356]:
0 1550.0
1 1104.5
2 1841.5
3 2850.5
4 1420.0
5 NaN
Name: built_up, dtype: float64
</code></pre>
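<p>To see why the <code>strip</code> matters, here is a small sketch on a single string. The value is a hypothetical example of an "unbalanced" entry, not taken from the real data.</p>

```python
# Hypothetical "unbalanced" value with a dangling dash at the end.
raw = '1498-1602 -'

# Splitting directly leaves an empty trailing piece that cannot be
# parsed as a number.
unstripped = raw.split('-')
print(unstripped)  # ['1498', '1602 ', '']

# Stripping dashes and spaces from both ends first leaves only the
# two numeric pieces.
stripped = raw.strip('- ').split('-')
print(stripped)  # ['1498', '1602']
```

<p>With <code>errors='coerce'</code>, the empty piece in the first result would become <code>NaN</code> and turn the whole median into <code>NaN</code>, which is what the <code>strip</code> call avoids.</p>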
<hr/>
<p><strong>Method 2</strong>: a list comprehension is faster when working with strings inside cells</p>
<pre><code># np.median matches the column name; for one or two values it equals the mean
df['b_median'] = [np.median(pd.to_numeric(x.strip('- ').split('-'), errors='coerce'))
                  for x in df.built_up]
Out[354]:
built_up b_median
0 1498-1602 1550.0
1 1022-1187 1104.5
2 1713-1970 1841.5
3 2305-3396 2850.5
4 1420 1420.0
5 - NaN
</code></pre>