Count names by gender and show the top 10

Posted 2024-05-17 06:34:56


I have a dataset like this:

import pandas as pd

df = pd.DataFrame({'name':["a"," b", "c","d", "e","a"," a", "a"," b", "c","d", "e","a"," a"],
                   'gender': ["male", "female", "female", "female", "male","male","male","female","female", "female", "male","male","male"],
                   'year':[2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2019, 2020],
                   'month':[1, 12, 4, 3, 6, 7, 2, 4, 5, 1, 12, 4, 3, 6 ],
                   'count':[100, 30, 10, 90,34, 100, 30, 10, 90,34, 100, 30, 10, 90,34, 36, 76]})

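The dataset shows the name, gender, birth year, birth month, and count. For example, in January 2005, 100 babies were named "a". I want to find the 10 most common names for males and the 10 most common names for females.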

I tried this code:

data.groupby('name','gender')['count'].count().nlargest(10)

1 answer
Anonymous user
#1 · Posted 2024-05-17 06:34:56

Try:

df.groupby(['gender','name'])['count'].count().nlargest(10)

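When grouping by multiple columns with groupby, pass the column names as a single list. Passing them as separate arguments does not work, because the second positional argument of groupby is not treated as another grouping key.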
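One thing worth checking: .count() returns how many rows each (gender, name) pair has, not how many babies received the name. Since the count column already holds the number of babies per row, .sum() seems closer to "most common names". The sketch below compares the two on a small, hypothetical, well-formed frame (the toy data and variable names are illustrative assumptions, not from the question):

```python
import pandas as pd

# Hypothetical toy data: every column has the same length (4 rows).
df = pd.DataFrame({
    "name":   ["a", "a", "b", "b"],
    "gender": ["male", "male", "female", "female"],
    "count":  [100, 30, 10, 5],
})

keys = ["gender", "name"]  # several grouping keys go in ONE list

# Number of rows per (gender, name) pair.
rows_per_pair = df.groupby(keys)["count"].count()

# Total number of babies per (gender, name) pair.
babies_per_pair = df.groupby(keys)["count"].sum()

print(rows_per_pair)
print(babies_per_pair)
```

Here .count() gives 2 for both pairs, while .sum() gives 130 for ("male", "a") and 15 for ("female", "b").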

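P.S. Your sample data is badly constructed. The columns have different numbers of values (14 names, 13 genders, 15 years, 14 months, 17 counts), so pd.DataFrame cannot build the frame. The names are also inconsistent: for example, "a" and " a" (with a leading space) show up as different names.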
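Also note that calling .nlargest(10) on the grouped Series returns the 10 largest (gender, name) pairs overall, which could all belong to one gender. A minimal sketch of a per-gender top 10, assuming a cleaned-up dataset: the rows below are hypothetical (equal-length columns loosely based on the question's values), names are normalised with str.strip(), and top_names_by_gender is an illustrative helper, not part of pandas or the answer above. It sums per pair, sorts by gender and descending total, then keeps the first n rows of each gender with groupby(...).head(n):

```python
import pandas as pd

# Hypothetical sample: all columns have the same length (8 rows).
df = pd.DataFrame({
    "name":   ["a", " b", "c", "d", "e", "a", " a", "b"],
    "gender": ["male", "female", "female", "female", "male", "male", "male", "female"],
    "year":   [2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012],
    "month":  [1, 12, 4, 3, 6, 7, 2, 4],
    "count":  [100, 30, 10, 90, 34, 100, 30, 10],
})

# Strip stray whitespace so " a" and "a" count as the same name.
df["name"] = df["name"].str.strip()


def top_names_by_gender(frame: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n names with the largest total count for each gender."""
    # Total babies per (gender, name), kept as regular columns.
    totals = frame.groupby(["gender", "name"], as_index=False)["count"].sum()
    # Sort within each gender by descending total, then keep the first n rows per gender.
    return (
        totals.sort_values(["gender", "count"], ascending=[True, False])
              .groupby("gender")
              .head(n)
              .reset_index(drop=True)
    )


print(top_names_by_gender(df, n=10))
```

With n=2 on this sample, the result is d (90) and b (40) for female, then a (230) and e (34) for male. Using head(n) after sorting keeps the rows in sorted order, so each gender's names come out from most to least common.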
