Extracting data with specific values

Posted 2024-09-30 01:19:34


I have a dataset like this:

import pandas as pd

df = pd.DataFrame({'scientist':["Wendelaar Bonga"," Sjoerd E.", "Grätzel"," Michael", "Willett", "Walter C.",
                             "Kessler", "Ronald C.", "Witten, Edward", "Wang, Zhong Lin"],
               'SubjectField': ["Biomedical Engineering", "Inorganic & Nuclear Chemistry",
                                "Organic Chemistry", "Biomedical Engineering", "Developmental Biology",
                                "Mechanical Engineering & Transports", "Biomedical Engineering", "Microbiology",
                                "Cardiovascular System & Hematology", "Biomedical Engineering"]})

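I want to count the number of scientists in each subject field and extract the subject fields that have more than two scientists. This is my code for counting the scientists: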

number_of_scientists_in_fields = df.groupby('SubjectField')['scientist'].count()

How can I extract the subject fields that have more than two scientists?


3 Answers

Use value_counts, as follows:

fields = df.value_counts('SubjectField').to_frame('count')
res = fields[fields['count'] > 2]
print(res)

Output

                        count
SubjectField                 
Biomedical Engineering      4
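
DataFrame.value_counts was added in pandas 1.1, so on older installations the call above may not be available. A minimal sketch of the same idea using Series.value_counts is shown here; the names counts and popular are illustrative and do not come from the answer.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'scientist': ["Wendelaar Bonga", " Sjoerd E.", "Grätzel", " Michael", "Willett",
                  "Walter C.", "Kessler", "Ronald C.", "Witten, Edward", "Wang, Zhong Lin"],
    'SubjectField': ["Biomedical Engineering", "Inorganic & Nuclear Chemistry",
                     "Organic Chemistry", "Biomedical Engineering", "Developmental Biology",
                     "Mechanical Engineering & Transports", "Biomedical Engineering",
                     "Microbiology", "Cardiovascular System & Hematology",
                     "Biomedical Engineering"],
})

# Count rows per subject field; the result is a Series indexed by field name.
# (counts and popular are illustrative names.)
counts = df['SubjectField'].value_counts()

# Keep only the fields with strictly more than two scientists.
popular = counts[counts > 2]
print(popular)
```

The result is a Series rather than a one-column DataFrame; calling popular.to_frame('count') would give the same shape as the answer's output.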

You just need to create a Series with groupby and count, then filter it with > 2:

In [2554]: x = df.groupby('SubjectField')['scientist'].count()
In [2559]: ans = x[x > 2]

In [2560]: ans
Out[2560]: 
SubjectField
Biomedical Engineering    4
Name: scientist, dtype: int64
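
If only the field names are needed rather than the counts, the filtered Series can be reduced to its index. The following is a sketch of the session above as a standalone script; field_names is an illustrative name, not part of the answer.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'scientist': ["Wendelaar Bonga", " Sjoerd E.", "Grätzel", " Michael", "Willett",
                  "Walter C.", "Kessler", "Ronald C.", "Witten, Edward", "Wang, Zhong Lin"],
    'SubjectField': ["Biomedical Engineering", "Inorganic & Nuclear Chemistry",
                     "Organic Chemistry", "Biomedical Engineering", "Developmental Biology",
                     "Mechanical Engineering & Transports", "Biomedical Engineering",
                     "Microbiology", "Cardiovascular System & Hematology",
                     "Biomedical Engineering"],
})

# Number of non-missing scientist entries per subject field.
x = df.groupby('SubjectField')['scientist'].count()

# Boolean mask on the Series, then take the index labels of the survivors.
# (field_names is an illustrative name.)
field_names = x[x > 2].index.tolist()
print(field_names)
```

Note that count() skips rows where scientist is missing, whereas size() counts every row in the group; with the question's data, which has no missing values, the two agree.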

Another approach, probably not as good as Dani's, is:

> df2 = df[df.SubjectField.duplicated(keep=False)]
> df2.groupby('SubjectField').count()
                        scientist
SubjectField
Biomedical Engineering          4

However, this example will include fields with 2 or more scientists (not > 2).
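
One way to keep the original rows while enforcing the strict > 2 condition is to broadcast each group's size back onto its rows and filter on that. This is a sketch of an alternative technique, not the answerer's method; group_sizes and df_more_than_two are illustrative names.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'scientist': ["Wendelaar Bonga", " Sjoerd E.", "Grätzel", " Michael", "Willett",
                  "Walter C.", "Kessler", "Ronald C.", "Witten, Edward", "Wang, Zhong Lin"],
    'SubjectField': ["Biomedical Engineering", "Inorganic & Nuclear Chemistry",
                     "Organic Chemistry", "Biomedical Engineering", "Developmental Biology",
                     "Mechanical Engineering & Transports", "Biomedical Engineering",
                     "Microbiology", "Cardiovascular System & Hematology",
                     "Biomedical Engineering"],
})

# transform('size') returns a Series aligned with df's rows, where each row
# holds the number of rows in its SubjectField group.
group_sizes = df.groupby('SubjectField')['scientist'].transform('size')

# Keep only the rows whose field has strictly more than two scientists.
# (group_sizes and df_more_than_two are illustrative names.)
df_more_than_two = df[group_sizes > 2]
print(df_more_than_two)
```

With the question's data no field has exactly two scientists, so this selects the same rows as the duplicated approach; the two differ only when such a field exists.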
