Filtering a DataFrame using a list of strings

Posted 2024-09-28 03:19:20

I have the following DataFrame, called pop_df:

pop_df
Out[33]: 
            2014       2015       2016       2017       2018       2019
Geo                                                                    
AL     2892394.0  2885796.0  2875592.0  2876591.0  2870324.0  2862427.0
AL0    2892394.0  2885796.0  2875592.0  2876591.0  2870324.0  2862427.0
AL01    844921.0   836448.0   830981.0   826904.0   819793.0   813758.0
AL011   134332.0   131054.0   129056.0   125579.0   120978.0   118948.0
AL012   276058.0   277989.0   280205.0   284823.0   289626.0   290126.0
         ...        ...        ...        ...        ...        ...
UKN12   142028.0   142756.0   143363.0   143746.0   144105.0   144367.0
UKN13   139774.0   140222.0   140752.0   141368.0   141994.0   142565.0
UKN14   137722.0   139426.0   140691.0   141917.0   143286.0   144771.0
UKN15   136332.0   136904.0   137492.0   138000.0   138441.0   138948.0
UKN16   114696.0   115171.0   115581.0   116057.0   116612.0   117051.0

[2034 rows x 6 columns]

I also have a list of European Union country codes:

EuropeanUnion=["BE","BG","CZ","DK","DE","EE","IE","EL","ES","FR","HR","IT","CY","LV","LT","LU","HU","MT","NL","AT","PL","PT","RO","SI","SK","FI","SE"]

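I want to filter the DataFrame so that it keeps only the rows whose index (Geo) starts with one of the country codes in the list. I tried using pop_df.index.str.startswith, but could not get it to work. Any help would be much appreciated. Thanks in advance.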


2 Answers

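I suggest slicing the first two characters of each index label (as strings) and passing the result to the pandas isin method, which produces a boolean mask against the list of country codes: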

eu_countries_filtered = pop_df[pop_df.index.str[:2].isin(EuropeanUnion)]
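
A minimal, self-contained sketch of this approach follows. The small DataFrame, its values, and the shortened code list are invented for illustration and are not the asker's data:

```python
import pandas as pd

# Hypothetical stand-in for pop_df; the numbers are made up for this example.
pop_df = pd.DataFrame(
    {
        "2014": [2892394.0, 134332.0, 139774.0, 142028.0],
        "2019": [2862427.0, 118948.0, 142565.0, 144367.0],
    },
    index=pd.Index(["AL", "BE011", "DE13", "UKN12"], name="Geo"),
)

# Shortened list, for illustration only.
EuropeanUnion = ["BE", "DE", "FR"]

# Slice the first two characters of every index label,
# then test each slice for membership in the list.
mask = pop_df.index.str[:2].isin(EuropeanUnion)
eu_countries_filtered = pop_df[mask]

print(eu_countries_filtered.index.tolist())
```

Note that this relies on every code in the list being exactly two characters long. If the codes had different lengths, a prefix or regex match would be needed instead.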

It looks like Geo is the index, so you can do the following:

result = pop_df[pop_df.index.str.match(rf'\b{"|".join(EuropeanUnion)}')]

Output (dummy data)

           2014      2015      2016      2017      2018      2019
Geo                                                              
BE011  134332.0  131054.0  129056.0  125579.0  120978.0  118948.0
DE13   139774.0  140222.0  140752.0  141368.0  141994.0  142565.0

From the documentation for str.match:

Determine if each string starts with a match of a regular expression.

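The expression rf'\b{"|".join(EuropeanUnion)}' builds a regular expression pattern that matches any of the country codes. Because str.match only matches at the start of each string, a row is kept only when its index label begins with one of the codes.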
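
As a rough sketch of the same idea, the pattern can also be built with re.escape and a non-capturing group, which keeps the alternation explicit and guards against codes that contain regex metacharacters. The sample DataFrame and the shortened code list are assumptions made for illustration:

```python
import re

import pandas as pd

# Hypothetical stand-in for pop_df; the numbers are made up for this example.
pop_df = pd.DataFrame(
    {"2014": [2892394.0, 134332.0, 139774.0, 142028.0]},
    index=pd.Index(["AL", "BE011", "DE13", "UKN12"], name="Geo"),
)

# Shortened list, for illustration only.
EuropeanUnion = ["BE", "DE", "FR"]

# Escape each code, join with "|", and wrap in a non-capturing group.
pattern = "(?:" + "|".join(re.escape(code) for code in EuropeanUnion) + ")"

# str.match anchors at the start of each label, so this is a prefix test.
result = pop_df[pop_df.index.str.match(pattern)]

print(pattern)
print(result.index.tolist())
```

The leading \b in the original pattern is not needed here, since str.match already starts matching at the first character of each label.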
