I have the following DataFrame, called pop_df:
pop_df
Out[33]:
2014 2015 2016 2017 2018 2019
Geo
AL 2892394.0 2885796.0 2875592.0 2876591.0 2870324.0 2862427.0
AL0 2892394.0 2885796.0 2875592.0 2876591.0 2870324.0 2862427.0
AL01 844921.0 836448.0 830981.0 826904.0 819793.0 813758.0
AL011 134332.0 131054.0 129056.0 125579.0 120978.0 118948.0
AL012 276058.0 277989.0 280205.0 284823.0 289626.0 290126.0
... ... ... ... ... ...
UKN12 142028.0 142756.0 143363.0 143746.0 144105.0 144367.0
UKN13 139774.0 140222.0 140752.0 141368.0 141994.0 142565.0
UKN14 137722.0 139426.0 140691.0 141917.0 143286.0 144771.0
UKN15 136332.0 136904.0 137492.0 138000.0 138441.0 138948.0
UKN16 114696.0 115171.0 115581.0 116057.0 116612.0 117051.0
[2034 rows x 6 columns]
I also have a list of European Union country codes:
EuropeanUnion=["BE","BG","CZ","DK","DE","EE","IE","EL","ES","FR","HR","IT","CY","LV","LT","LU","HU","MT","NL","AT","PL","PT","RO","SI","SK","FI","SE"]
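I want to filter the DataFrame so that it keeps only the rows whose index label starts with one of the country codes in the list. I tried using pop_df.index.str.startswith, but could not get it to work. Any help would be much appreciated. Thanks in advance.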
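I suggest slicing out the first two characters of each country code (as a string) and passing them to the pandas isin method; the result serves as a boolean mask for selecting rows.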
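A minimal sketch of that slice-and-isin approach, assuming Geo is the index of pop_df. The small frame below is a hypothetical stand-in with made-up values, not the real data, and the name eu_df is introduced only for illustration.

```python
import pandas as pd

EuropeanUnion = ["BE", "BG", "CZ", "DK", "DE", "EE", "IE", "EL", "ES", "FR",
                 "HR", "IT", "CY", "LV", "LT", "LU", "HU", "MT", "NL", "AT",
                 "PL", "PT", "RO", "SI", "SK", "FI", "SE"]

# Hypothetical stand-in for pop_df (placeholder values, for illustration only)
pop_df = pd.DataFrame(
    {"2018": [1.0, 2.0, 3.0, 4.0], "2019": [1.5, 2.5, 3.5, 4.5]},
    index=pd.Index(["AL01", "BE10", "DE111", "UKN16"], name="Geo"),
)

# Take the first two characters of every index label and
# test them for membership in the list of EU codes.
mask = pop_df.index.str[:2].isin(EuropeanUnion)

# Keep only the rows where the mask is True.
eu_df = pop_df[mask]
print(eu_df)
```

If the codes were stored in an ordinary column instead of the index, the same idea would apply to that column's .str accessor.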
It seems that Geo is the index, so you can apply str.match to the index directly and use the result as a boolean mask.
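One way this could look, as a sketch rather than the answer's original code: the frame is a hypothetical stand-in with placeholder values, and the label "XBE1" is invented only to show that str.match tests the start of each label.

```python
import pandas as pd

EuropeanUnion = ["BE", "BG", "CZ", "DK", "DE", "EE", "IE", "EL", "ES", "FR",
                 "HR", "IT", "CY", "LV", "LT", "LU", "HU", "MT", "NL", "AT",
                 "PL", "PT", "RO", "SI", "SK", "FI", "SE"]

# Hypothetical stand-in for pop_df (placeholder values, for illustration only)
pop_df = pd.DataFrame(
    {"2019": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=pd.Index(["AL01", "BE10", "DE111", "XBE1", "UKN16"], name="Geo"),
)

# Join the codes into one alternation: "BE|BG|CZ|...|SE"
pattern = "|".join(EuropeanUnion)

# str.match only succeeds when the match begins at the first character,
# so "XBE1" is rejected even though it contains "BE".
mask = pop_df.index.str.match(pattern)
eu_df = pop_df[mask]
print(eu_df)
```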
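According to the documentation on str.match, the method determines whether each string starts with a match of a regular expression.
The expression
rf'\b{"|".join(EuropeanUnion)}'
builds a regular expression pattern that matches any of the country codes.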
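To see what that expression produces, the sketch below builds the pattern with the standard re module and tries it on a few labels taken from the question plus two hypothetical EU-style labels ("BE10", "DE111"). Note that without a group the \b applies only to the first alternative; this is harmless with match-at-start semantics, and the grouped variant shown is an assumed alternative, not part of the original answer.

```python
import re

EuropeanUnion = ["BE", "BG", "CZ", "DK", "DE", "EE", "IE", "EL", "ES", "FR",
                 "HR", "IT", "CY", "LV", "LT", "LU", "HU", "MT", "NL", "AT",
                 "PL", "PT", "RO", "SI", "SK", "FI", "SE"]

# Pattern as written in the answer: "\bBE|BG|CZ|...|SE"
pattern = rf'\b{"|".join(EuropeanUnion)}'

# Assumed alternative: group the alternation and anchor it explicitly
grouped = rf'^(?:{"|".join(EuropeanUnion)})'

codes = ["AL01", "BE10", "DE111", "UKN16"]

# re.match, like str.match, only matches at the start of the string
matched = [c for c in codes if re.match(pattern, c)]
matched_grouped = [c for c in codes if re.search(grouped, c)]

print(pattern)
print(matched)
print(matched_grouped)
```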