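<p>Suppose I have a dataset of speed limits. The idea is that each region or city can either apply its own rule or "inherit" the rule of its parent entity:</p>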
<pre><code>+-------------+---------------------------+---------------------+-----------+
| country | region | city | max_speed |
+-------------+---------------------------+---------------------+-----------+
| France | | | 50 |
+-------------+---------------------------+---------------------+-----------+
| France | Bretagne | | 70 |
+-------------+---------------------------+---------------------+-----------+
| France | Bretagne | Saint-Grégoire | |
+-------------+---------------------------+---------------------+-----------+
| France | Bretagne | Saint-Malo | 30 |
+-------------+---------------------------+---------------------+-----------+
| France | Île-de-France | | |
+-------------+---------------------------+---------------------+-----------+
| France | Île-de-France | Saint-Cloud | |
+-------------+---------------------------+---------------------+-----------+
| France | Île-de-France | Vélizy-Villacoublay | 50 |
+-------------+---------------------------+---------------------+-----------+
| Germany | | | 70 |
+-------------+---------------------------+---------------------+-----------+
| Germany | Bayern | | |
+-------------+---------------------------+---------------------+-----------+
| Germany | Bayern | Nürnberg | |
+-------------+---------------------------+---------------------+-----------+
| Netherlands | | | 90 |
+-------------+---------------------------+---------------------+-----------+
| Netherlands | Provincie Gelderland | | |
+-------------+---------------------------+---------------------+-----------+
| Netherlands | Provincie Gelderland | Harderwijk | |
+-------------+---------------------------+---------------------+-----------+
| Netherlands | Provincie Noord-Holland | | 70 |
+-------------+---------------------------+---------------------+-----------+
| Netherlands | Provincie Noord-Holland | Haarlem | |
+-------------+---------------------------+---------------------+-----------+
| Netherlands | Provincie Noord-Holland | Hoorn | 30 |
+-------------+---------------------------+---------------------+-----------+
</code></pre>
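<p>Whenever a <code>max_speed</code> value is missing, it should be inferred from the parent's value. For example, the speed limit in <em>Saint-Grégoire</em> is that of <em>Bretagne</em>, while <em>Harderwijk</em> and <em>Nürnberg</em> fall back to their country's rule (90 and 70, respectively).</p>
<p>So, given this <code>DataFrame</code>:</p>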
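<p>Put differently, the rule takes the first non-missing value along the chain city → region → country. A minimal sketch of that fallback in plain Python, where <code>inherited_speed</code> is a hypothetical helper and <code>None</code> stands in for a missing limit (in pandas it would be NaN):</p>

```python
from typing import Optional


def inherited_speed(*chain: Optional[float]) -> Optional[float]:
    """Return the first non-missing limit, ordered from most to least specific level."""
    for value in chain:
        if value is not None:
            return value
    return None  # no level defines a limit


# Saint-Grégoire (missing) -> Bretagne (70) -> France (50)
print(inherited_speed(None, 70, 50))    # 70
# Harderwijk (missing) -> Provincie Gelderland (missing) -> Netherlands (90)
print(inherited_speed(None, None, 90))  # 90
# Nürnberg (missing) -> Bayern (missing) -> Germany (70)
print(inherited_speed(None, None, 70))  # 70
```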
<pre class="lang-py prettyprint-override"><code>import pandas as pd

data = {'country': ['France', 'France', 'France', 'France', 'France', 'France', 'France', 'Germany', 'Germany', 'Germany', 'Netherlands', 'Netherlands', 'Netherlands', 'Netherlands', 'Netherlands', 'Netherlands'],
'region': [None, 'Bretagne', 'Bretagne', 'Bretagne', 'Île-de-France', 'Île-de-France', 'Île-de-France', None, 'Bayern', 'Bayern', None, 'Provincie Gelderland', 'Provincie Gelderland', 'Provincie Noord-Holland', 'Provincie Noord-Holland', 'Provincie Noord-Holland'],
'city': [None, None, 'Saint-Grégoire', 'Saint-Malo', None, 'Saint-Cloud', 'Vélizy-Villacoublay', None, None, 'Nürnberg', None, None, 'Harderwijk', None, 'Haarlem', 'Hoorn'],
'max_speed': [50, 70, None, 30, None, None, 50, 70, None, None, 90, None, None, 70, None, 30]}
speed_limits = pd.DataFrame(data)
</code></pre>
<p>How can I fill in the missing values in <code>max_speed</code> to obtain:</p>
<pre><code>+-------------+-------------------------+---------------------+-----------+
| country | region | city | max_speed |
+-------------+-------------------------+---------------------+-----------+
| France | | | 50 |
+-------------+-------------------------+---------------------+-----------+
| France | Bretagne | | 70 |
+-------------+-------------------------+---------------------+-----------+
| France | Bretagne | Saint-Grégoire | 70 |
+-------------+-------------------------+---------------------+-----------+
| France | Bretagne | Saint-Malo | 30 |
+-------------+-------------------------+---------------------+-----------+
| France | Île-de-France | | 50 |
+-------------+-------------------------+---------------------+-----------+
| France | Île-de-France | Saint-Cloud | 50 |
+-------------+-------------------------+---------------------+-----------+
| France | Île-de-France | Vélizy-Villacoublay | 50 |
+-------------+-------------------------+---------------------+-----------+
| Germany | | | 70 |
+-------------+-------------------------+---------------------+-----------+
| Germany | Bayern | | 70 |
+-------------+-------------------------+---------------------+-----------+
| Germany | Bayern | Nürnberg | 70 |
+-------------+-------------------------+---------------------+-----------+
| Netherlands | | | 90 |
+-------------+-------------------------+---------------------+-----------+
| Netherlands | Provincie Gelderland | | 90 |
+-------------+-------------------------+---------------------+-----------+
| Netherlands | Provincie Gelderland | Harderwijk | 90 |
+-------------+-------------------------+---------------------+-----------+
| Netherlands | Provincie Noord-Holland | | 70 |
+-------------+-------------------------+---------------------+-----------+
| Netherlands | Provincie Noord-Holland | Haarlem | 70 |
+-------------+-------------------------+---------------------+-----------+
| Netherlands | Provincie Noord-Holland | Hoorn | 30 |
+-------------+-------------------------+---------------------+-----------+
</code></pre>
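<p>One vectorized way to produce this table, sketched under the assumption that the hierarchy has exactly three levels and that there is at most one row per country and per region: build lookup tables of each level's own limit, attach them to every row with <code>DataFrame.merge</code>, and chain <code>fillna</code> from the most to the least specific level. The helper names <code>country_limits</code>, <code>region_limits</code> and <code>filled</code> are illustrative only.</p>

```python
import pandas as pd

data = {'country': ['France', 'France', 'France', 'France', 'France', 'France', 'France', 'Germany', 'Germany', 'Germany', 'Netherlands', 'Netherlands', 'Netherlands', 'Netherlands', 'Netherlands', 'Netherlands'],
        'region': [None, 'Bretagne', 'Bretagne', 'Bretagne', 'Île-de-France', 'Île-de-France', 'Île-de-France', None, 'Bayern', 'Bayern', None, 'Provincie Gelderland', 'Provincie Gelderland', 'Provincie Noord-Holland', 'Provincie Noord-Holland', 'Provincie Noord-Holland'],
        'city': [None, None, 'Saint-Grégoire', 'Saint-Malo', None, 'Saint-Cloud', 'Vélizy-Villacoublay', None, None, 'Nürnberg', None, None, 'Harderwijk', None, 'Haarlem', 'Hoorn'],
        'max_speed': [50, 70, None, 30, None, None, 50, 70, None, None, 90, None, None, 70, None, 30]}
speed_limits = pd.DataFrame(data)

# Country rows have neither region nor city; region rows have a region but no city.
is_country = speed_limits['region'].isna() & speed_limits['city'].isna()
is_region = speed_limits['region'].notna() & speed_limits['city'].isna()

# Lookup tables holding each level's own (possibly missing) limit.
country_limits = (speed_limits.loc[is_country, ['country', 'max_speed']]
                  .rename(columns={'max_speed': 'country_speed'}))
region_limits = (speed_limits.loc[is_region, ['country', 'region', 'max_speed']]
                 .rename(columns={'max_speed': 'region_speed'}))

# Attach the parent limits to every row; how='left' keeps the original row order.
filled = (speed_limits
          .merge(country_limits, on='country', how='left')
          .merge(region_limits, on=['country', 'region'], how='left'))

# Fallback chain: own value, then the region's own value, then the country's.
filled['max_speed'] = (filled['max_speed']
                       .fillna(filled['region_speed'])
                       .fillna(filled['country_speed']))
filled = filled.drop(columns=['region_speed', 'country_speed'])
print(filled)
```

<p>Using the region's <em>own</em> value in the chain is enough: when a region has no limit, its cities fall straight through to the country, which is exactly what the region row itself inherits. A country row without a value would stay NaN, since it has no parent.</p>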
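<p>I have been trying to write a function to apply to each row where <code>max_speed==np.NaN</code>: it retrieves the row's parent (after working out whether the missing value belongs to a region or a city) and returns the parent's <code>max_speed</code> value. However, besides not being very successful at it, I'm not even sure this is the most sensible approach.</p>
<p>Any ideas?</p>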
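<p>The row-wise approach described above can work once two details are handled. First, a test like <code>max_speed == np.nan</code> never matches, because NaN compares unequal to everything, itself included, so missing values must be detected with <code>pd.isna</code>/<code>pd.notna</code>. Second, a city whose region has no limit of its own must keep walking up to the country. A minimal sketch, where <code>parent_speed</code> and <code>resolve_speed</code> are hypothetical helper names and the frame is assumed to hold one row per country and per region:</p>

```python
import pandas as pd

data = {'country': ['France', 'France', 'France', 'France', 'France', 'France', 'France', 'Germany', 'Germany', 'Germany', 'Netherlands', 'Netherlands', 'Netherlands', 'Netherlands', 'Netherlands', 'Netherlands'],
        'region': [None, 'Bretagne', 'Bretagne', 'Bretagne', 'Île-de-France', 'Île-de-France', 'Île-de-France', None, 'Bayern', 'Bayern', None, 'Provincie Gelderland', 'Provincie Gelderland', 'Provincie Noord-Holland', 'Provincie Noord-Holland', 'Provincie Noord-Holland'],
        'city': [None, None, 'Saint-Grégoire', 'Saint-Malo', None, 'Saint-Cloud', 'Vélizy-Villacoublay', None, None, 'Nürnberg', None, None, 'Harderwijk', None, 'Haarlem', 'Hoorn'],
        'max_speed': [50, 70, None, 30, None, None, 50, 70, None, None, 90, None, None, 70, None, 30]}
speed_limits = pd.DataFrame(data)


def parent_speed(df, country, region=None):
    """Own max_speed of the country row (region=None) or of one region row."""
    mask = (df['country'] == country) & df['city'].isna()
    if region is None:
        mask &= df['region'].isna()
    else:
        mask &= df['region'] == region
    values = df.loc[mask, 'max_speed']
    return values.iloc[0] if len(values) else float('nan')


def resolve_speed(row, df):
    """Return the row's own limit, else its region's, else its country's."""
    if pd.notna(row['max_speed']):
        return row['max_speed']
    if pd.notna(row['city']):
        # City row: try the region's own limit before falling back to the country.
        region_speed = parent_speed(df, row['country'], row['region'])
        if pd.notna(region_speed):
            return region_speed
    # Region row, or a city whose region has no limit: use the country's limit.
    return parent_speed(df, row['country'])


# Every lookup reads the unmodified frame; the result is assigned in one go.
speed_limits['max_speed'] = speed_limits.apply(resolve_speed, axis=1, args=(speed_limits,))
print(speed_limits)
```

<p>Each missing value triggers a scan of the whole frame, so this is quadratic in the number of rows; it is fine for small tables, but a merge-based lookup scales better.</p>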