<p>This is my attempt.
It is documented and has some print() calls for debugging.
The query() statements rely on the fact that in pandas/NumPy np.nan != np.nan, and that None is treated like np.nan.
See one of the notes/warnings on this page: <a href="https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html" rel="nofollow noreferrer">https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html</a></p>
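<p>As a minimal sketch of the self-comparison idiom described above, the snippet below uses a hypothetical toy frame (the name <code>toy</code> and its values are illustrative assumptions, not the data used in the answer). It shows that comparing a column with itself inside query() separates missing from non-missing rows:</p>

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, only for illustrating the idiom.
toy = pd.DataFrame({"city": ["Hoorn", None, "Haarlem"],
                    "max_speed": [30, None, None]})

# NaN never compares equal to itself.
print(np.nan != np.nan)  # True

# "col != col" keeps the rows where col is missing,
# "col == col" keeps the rows where col has a value.
missing_city = toy.query("city != city")
has_city = toy.query("city == city")
speed_unset = toy.query("max_speed != max_speed")

print(missing_city)
print(has_city)
print(speed_unset)
```

<p>The same selection could also be written with isna()/notna(); the self-comparison form is used here only because it fits inside a query() string.</p>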
<pre><code>import pandas as pd
data = {'country': ['France', 'France', 'France', 'France', 'France', 'France', 'France', 'Germany', 'Germany', 'Germany', 'Netherlands', 'Netherlands', 'Netherlands', 'Netherlands', 'Netherlands', 'Netherlands'],
'region': [None, 'Bretagne', 'Bretagne', 'Bretagne', 'Île-de-France', 'Île-de-France', 'Île-de-France', None, 'Bayern', 'Bayern', None, 'Provincie Gelderland', 'Provincie Gelderland', 'Provincie Noord-Holland', 'Provincie Noord-Holland', 'Provincie Noord-Holland'],
'city': [None, None, 'Saint-Grégoire', 'Saint-Malo', None, 'Saint-Cloud', 'Vélizy-Villacoublay', None, None, 'Nürnberg', None, None, 'Harderwijk', None, 'Haarlem', 'Hoorn'],
'max_speed': [50, 70, None, 30, None, None, 50, 70, None, None, 90, None, None, 70, None, 30]}
df = pd.DataFrame(data)
# 1) Split the initial df into multiple dfs, using df.query():
#    - countries: we assume that all of them have max_speed
#    - regions: two categories
#        - max speed set
#        - max speed unset
#    - cities: two categories
#        - max speed set
#        - max speed unset
#
# 2) Use merge/join to update the max speed for the categories with max speed unset
#
# 3) Concatenate all sets; this is the final result
#    (None/NaN replaced with empty strings for nicer printing)

# Country-level rows: these all have the speed set.
# The city comparison is superfluous, but kept for consistency.
df_countries_only = df.query("(region != region) and (city != city) ")
print(df_countries_only)
# fix the regions
df_regions_to_fix = df.query("(city != city) and (max_speed != max_speed) and (region == region)")
df_regions_ok = df.query("(city != city) and (max_speed == max_speed) and (region == region)")
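# Regions without a speed inherit the speed of their country.
df_regions_speed = pd.merge(df_countries_only.drop(['region', 'city'], axis=1),
                            df_regions_to_fix.drop(['max_speed'], axis=1),
                            how="inner", on=["country"])
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
df_regions_speed = pd.concat([df_regions_speed, df_regions_ok])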
print(df_regions_speed)
df_cities_to_fix = df.query("(city == city) and (max_speed != max_speed)")
df_cities_ok = df.query("(city == city) and (max_speed == max_speed)")
# Cities without a speed inherit the speed of their region.
df_cities_speed = pd.merge(df_regions_speed.drop(['city'], axis=1),
                           df_cities_to_fix.drop(['max_speed'], axis=1),
                           how="inner", on=["country", "region"])
print(df_cities_speed)
# now rebuild final df
# pd.concat replaces the chained DataFrame.append calls (append was removed in pandas 2.0)
df_all_data = pd.concat([df_cities_speed, df_cities_ok, df_regions_speed, df_countries_only])
print("\n\n")
print(df_all_data.sort_values(by=['country', 'region', 'city']).fillna("")[['country', 'region', 'city', 'max_speed']])
</code></pre>