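Suppose I have a dataset of speed limits. The idea is that each region or city can either apply its own rule or "inherit" the rule of its parent entity.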
+-------------+---------------------------+---------------------+-----------+
| country | region | city | max_speed |
+-------------+---------------------------+---------------------+-----------+
| France | | | 50 |
+-------------+---------------------------+---------------------+-----------+
| France | Bretagne | | 70 |
+-------------+---------------------------+---------------------+-----------+
| France | Bretagne | Saint-Grégoire | |
+-------------+---------------------------+---------------------+-----------+
| France | Bretagne | Saint-Malo | 30 |
+-------------+---------------------------+---------------------+-----------+
| France | Île-de-France | | |
+-------------+---------------------------+---------------------+-----------+
| France | Île-de-France | Saint-Cloud | |
+-------------+---------------------------+---------------------+-----------+
| France | Île-de-France | Vélizy-Villacoublay | 50 |
+-------------+---------------------------+---------------------+-----------+
| Germany | | | 70 |
+-------------+---------------------------+---------------------+-----------+
| Germany | Bayern | | |
+-------------+---------------------------+---------------------+-----------+
| Germany | Bayern | Nürnberg | |
+-------------+---------------------------+---------------------+-----------+
| Netherlands | | | 90 |
+-------------+---------------------------+---------------------+-----------+
| Netherlands | Provincie Gelderland | | |
+-------------+---------------------------+---------------------+-----------+
| Netherlands | Provincie Gelderland | Harderwijk | |
+-------------+---------------------------+---------------------+-----------+
| Netherlands | Provincie Noord-Holland | | 70 |
+-------------+---------------------------+---------------------+-----------+
| Netherlands | Provincie Noord-Holland | Haarlem | |
+-------------+---------------------------+---------------------+-----------+
| Netherlands | Provincie Noord-Holland | Hoorn | 30 |
+-------------+---------------------------+---------------------+-----------+
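Whenever the max_speed value is missing, it should be inferred from the parent. For example, the speed limit for Saint-Grégoire is that of Bretagne, while Harderwijk and Nürnberg apply their country's rule (90 and 70 respectively).
So, given this DataFrame: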
import pandas as pd

data = {'country': ['France', 'France', 'France', 'France', 'France', 'France', 'France', 'Germany', 'Germany', 'Germany', 'Netherlands', 'Netherlands', 'Netherlands', 'Netherlands', 'Netherlands', 'Netherlands'],
        'region': [None, 'Bretagne', 'Bretagne', 'Bretagne', 'Île-de-France', 'Île-de-France', 'Île-de-France', None, 'Bayern', 'Bayern', None, 'Provincie Gelderland', 'Provincie Gelderland', 'Provincie Noord-Holland', 'Provincie Noord-Holland', 'Provincie Noord-Holland'],
        'city': [None, None, 'Saint-Grégoire', 'Saint-Malo', None, 'Saint-Cloud', 'Vélizy-Villacoublay', None, None, 'Nürnberg', None, None, 'Harderwijk', None, 'Haarlem', 'Hoorn'],
        'max_speed': [50, 70, None, 30, None, None, 50, 70, None, None, 90, None, None, 70, None, 30]}
speed_limits = pd.DataFrame(data)
How can I fill in the missing max_speed values to obtain:
+-------------+-------------------------+---------------------+-----------+
| country | region | city | max_speed |
+-------------+-------------------------+---------------------+-----------+
| France | | | 50 |
+-------------+-------------------------+---------------------+-----------+
| France | Bretagne | | 70 |
+-------------+-------------------------+---------------------+-----------+
| France | Bretagne | Saint-Grégoire | 70 |
+-------------+-------------------------+---------------------+-----------+
| France | Bretagne | Saint-Malo | 30 |
+-------------+-------------------------+---------------------+-----------+
| France | Île-de-France | | 50 |
+-------------+-------------------------+---------------------+-----------+
| France | Île-de-France | Saint-Cloud | 50 |
+-------------+-------------------------+---------------------+-----------+
| France | Île-de-France | Vélizy-Villacoublay | 50 |
+-------------+-------------------------+---------------------+-----------+
| Germany | | | 70 |
+-------------+-------------------------+---------------------+-----------+
| Germany | Bayern | | 70 |
+-------------+-------------------------+---------------------+-----------+
| Germany | Bayern | Nürnberg | 70 |
+-------------+-------------------------+---------------------+-----------+
| Netherlands | | | 90 |
+-------------+-------------------------+---------------------+-----------+
| Netherlands | Provincie Gelderland | | 90 |
+-------------+-------------------------+---------------------+-----------+
| Netherlands | Provincie Gelderland | Harderwijk | 90 |
+-------------+-------------------------+---------------------+-----------+
| Netherlands | Provincie Noord-Holland | | 70 |
+-------------+-------------------------+---------------------+-----------+
| Netherlands | Provincie Noord-Holland | Haarlem | 70 |
+-------------+-------------------------+---------------------+-----------+
| Netherlands | Provincie Noord-Holland | Hoorn | 30 |
+-------------+-------------------------+---------------------+-----------+
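I have been trying to write a function to apply to every row where max_speed is NaN. It would retrieve the parent row (after determining whether the missing value belongs to a region or a city) and return that row's max_speed value. However, besides not being very successful at this, I am not even sure it is the smartest approach.
Any ideas?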
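Use ffill() to do the job. First propagate the country and region speed limits vertically, and set up a city-only speed limit column. Then propagate the speed limits from left to right to obtain the inherited maximum speed limit. The steps are:
Create a working DataFrame.
Copy the country speed limits and propagate them within each country.
Copy the region speed limits and propagate them within each region.
Create a city-only speed limit column.
Set the max_speed column on the speed_limits DataFrame by forward-filling NA from left to right across the cntry_spd, reg_spd and city_spd columns, so that any limit not yet set is inherited.
The result should match the expected table in the question.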
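A minimal sketch of these steps, assuming the rows are ordered so that each country row precedes its regions and each region row precedes its cities (as in the question's data). The column names cntry_spd, reg_spd and city_spd come from the answer; the working frame name `work`, the `is_region` mask and the grouping keys are assumptions made for illustration.

```python
import pandas as pd

data = {
    'country': ['France'] * 7 + ['Germany'] * 3 + ['Netherlands'] * 6,
    'region': [None, 'Bretagne', 'Bretagne', 'Bretagne',
               'Île-de-France', 'Île-de-France', 'Île-de-France',
               None, 'Bayern', 'Bayern',
               None, 'Provincie Gelderland', 'Provincie Gelderland',
               'Provincie Noord-Holland', 'Provincie Noord-Holland',
               'Provincie Noord-Holland'],
    'city': [None, None, 'Saint-Grégoire', 'Saint-Malo', None, 'Saint-Cloud',
             'Vélizy-Villacoublay', None, None, 'Nürnberg', None, None,
             'Harderwijk', None, 'Haarlem', 'Hoorn'],
    'max_speed': [50, 70, None, 30, None, None, 50, 70, None, None,
                  90, None, None, 70, None, 30],
}
speed_limits = pd.DataFrame(data)

# Working copy, so the helper columns never end up in speed_limits.
work = speed_limits.copy()

# Country limit: taken from rows with no region, then pushed down
# to every later row of the same country.
work['cntry_spd'] = work['max_speed'].where(work['region'].isna())
work['cntry_spd'] = work.groupby('country')['cntry_spd'].ffill()

# Region limit: taken from rows with a region but no city, then pushed
# down within the region. Missing regions are replaced by '' in the
# grouping key so that no group key is NaN.
is_region = work['region'].notna() & work['city'].isna()
work['reg_spd'] = work['max_speed'].where(is_region)
region_key = work['region'].fillna('')
work['reg_spd'] = work.groupby(['country', region_key])['reg_spd'].ffill()

# City limit: only the values set on city rows themselves.
work['city_spd'] = work['max_speed'].where(work['city'].notna())

# Fill left to right: after this, the last column holds the most
# specific limit available for each row.
filled = work[['cntry_spd', 'reg_spd', 'city_spd']].ffill(axis=1)
speed_limits['max_speed'] = filled['city_spd']

print(speed_limits)
```

Because the fill runs from the least specific column to the most specific one, a city's own limit always wins over its region's, and a region's over its country's. If the input is not already ordered parent-first, sorting it that way beforehand would be needed for the vertical ffill() to pick up the right parent.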
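Here is my attempt. It is documented and has a few print() calls for debugging. The query() statements rely on the fact that in pandas/NumPy np.nan != np.nan, and that None is treated like np.nan. See one of the notes/warnings on this page: https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html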
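To illustrate the NaN comparison trick mentioned here, the following is a small sketch rather than the answer's own code. The toy frame `df` and the variable names are hypothetical. It shows that a None placed in a numeric column is stored as NaN, and that comparing a column with itself inside query() separates missing from present values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: None and np.nan both end up as NaN in a float column.
df = pd.DataFrame({
    'city': ['Saint-Malo', 'Saint-Cloud', 'Haarlem'],
    'max_speed': [30, None, np.nan],
})

# NaN is never equal to itself, so "x != x" is True only for missing values.
missing = df.query('max_speed != max_speed')
present = df.query('max_speed == max_speed')

print(missing)
print(present)
```

The same selection can be written as `df[df['max_speed'].isna()]`, which some readers may find easier to follow than the self-comparison.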