python-groupby/apply: what exactly is passed to the apply function?

import pandas as pd

ipl_data = {
    'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings',
             'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
    'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
    'Year': [2014, 2015, 2014, 2015, 2014, 2015,
             2016, 2017, 2016, 2014, 2015, 2017],
    'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690],
}
df = pd.DataFrame(ipl_data)

      Team  Rank  Year  Points
0   Riders     1  2014     876
1   Riders     2  2015     789
2   Devils     2  2014     863
3   Devils     3  2015     673
4    Kings     3  2014     741
5    kings     4  2015     812
6    Kings     1  2016     756
7    Kings     1  2017     788
8   Riders     2  2016     694
9   Royals     4  2014     701
10  Royals     1  2015     804
11  Riders     2  2017     690

File "pandas/_libs/index.pyx", line 81, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 89, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 987, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 993, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0

Team
Devils    (<class 'pandas.core.frame.DataFrame'>, (2, 4))
Kings     (<class 'pandas.core.frame.DataFrame'>, (3, 4))
Riders    (<class 'pandas.core.frame.DataFrame'>, (4, 4))
Royals    (<class 'pandas.core.frame.DataFrame'>, (2, 4))
kings     (<class 'pandas.core.frame.DataFrame'>, (1, 4))
dtype: object

2 answers

User

#1 · Edited 2024-09-27 23:28:36

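After a groupby, apply processes the data one group at a time, so a label such as [0] is looked up inside each group rather than in the whole frame, and that is where the error comes from. It works on df itself because the label 0 is still present in df's index.

You can try something like this to get the first row for each team: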

df.drop_duplicates(subset=['Team'])

Output:

     Team  Rank  Year  Points
0  Riders     1  2014     876
2  Devils     2  2014     863
4   Kings     3  2014     741
5   kings     4  2015     812
9  Royals     4  2014     701

If you need to keep the row with the maximum or minimum Points instead, sort df before dropping the duplicates. Hope this helps.
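As a rough sketch of the sort-then-drop idea above (the variable name `best` is mine, not from the question), sorting by Points in descending order puts each team's highest-scoring row first, so drop_duplicates keeps that row:

```python
import pandas as pd

ipl_data = {
    'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings',
             'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
    'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
    'Year': [2014, 2015, 2014, 2015, 2014, 2015,
             2016, 2017, 2016, 2014, 2015, 2017],
    'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690],
}
df = pd.DataFrame(ipl_data)

# Sort so the highest Points row of each team comes first,
# then keep only the first row seen for each team.
best = (
    df.sort_values('Points', ascending=False)
      .drop_duplicates(subset=['Team'])
      .sort_index()  # restore the original row order for readability
)
print(best)
```

Use ascending=True instead to keep the minimum Points row per team. Note that 'Kings' and 'kings' are treated as different teams because the comparison is case-sensitive.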

User

#2 · Edited 2024-09-27 23:28:36

When you call df.groupby('Team').apply(lambda x: ...), you are essentially chopping the DataFrame up by team and passing each chunk to the lambda function:

      Team  Rank  Year  Points
0   Riders     1  2014     876
1   Riders     2  2015     789
8   Riders     2  2016     694
11  Riders     2  2017     690
               
2   Devils     2  2014     863
3   Devils     3  2015     673
               
4    Kings     3  2014     741
6    Kings     1  2016     756
7    Kings     1  2017     788
               
5    kings     4  2015     812
               
9   Royals     4  2014     701
10  Royals     1  2015     804

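df['Points'][0] works because you are telling pandas "get the value at label 0 of the Points series", and that label exists.

.apply(lambda x: x['Points'][0]) does not work because only one chunk (Riders) has the label 0. Every other chunk lacks it, hence the KeyError.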
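To make the label-versus-position point concrete, here is a small sketch (the names `label_lookup_failed` and `first_points` are just for illustration). It selects the Points column before applying, so each chunk is a Series, and it uses .iloc[0] to ask for the first row by position rather than by label:

```python
import pandas as pd

ipl_data = {
    'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings',
             'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
    'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
    'Year': [2014, 2015, 2014, 2015, 2014, 2015,
             2016, 2017, 2016, 2014, 2015, 2017],
    'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690],
}
df = pd.DataFrame(ipl_data)

# Label-based lookup: fails as soon as a chunk has no row labelled 0.
label_lookup_failed = False
try:
    df.groupby('Team')['Points'].apply(lambda s: s[0])
except KeyError:
    label_lookup_failed = True

# Position-based lookup: every chunk has a first row, whatever its label.
first_points = df.groupby('Team')['Points'].apply(lambda s: s.iloc[0])
print(first_points)
```

The result is a Series indexed by Team holding the first Points value of each chunk, in the same row order as the original frame.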

That said, apply is general-purpose, so it is fairly slow compared with the built-in vectorized aggregation functions. You can use first:

df.groupby('Team')['Points'].first()
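If you want more than one built-in aggregation at once, agg accepts a list of names. A minimal sketch on the same data (the variable name `summary` is mine); keep in mind that first() returns the first non-null value in each group, which here is simply the first row because the data has no missing values:

```python
import pandas as pd

ipl_data = {
    'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings',
             'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
    'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
    'Year': [2014, 2015, 2014, 2015, 2014, 2015,
             2016, 2017, 2016, 2014, 2015, 2017],
    'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690],
}
df = pd.DataFrame(ipl_data)

# One row per team, one column per built-in aggregation.
summary = df.groupby('Team')['Points'].agg(['first', 'max'])
print(summary)
```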
