pd.sort_values is not doing what it should

Posted 2024-09-30 16:34:34


I have a CSV file that I loaded with df = pd.read_csv("af.csv")

The CSV file looks like this (preview):

"match_id","start_time","win","leaguename","opposing_team","team","min"
2992096687,1486840800,true,"Captains Draft",3729377,2642171,1453382256
2992217489,1486845476,true,"Captains Draft",3729377,2642171,1453382256
2994454005,1486926905,false,"Captains Draft",2586976,2642171,1453382256
2659805546,1474478411,false,"BTSSeries",55,2642171,1454281287
2659879628,1474481141,false,"BTSSeries",55,2642171,1454281287
2661783205,1474563571,false,"BTSSeries",2537636,2642171,1454281287
2661875544,1474566865,false,"BTSSeries",2537636,2642171,1454281287
2662027296,1474573160,true,"BTSSeries",59,2642171,1454281287
2758086417,1478352060,true,"ESLManila16",2163,2642171,1454692269
2758241073,1478355547,true,"ESLManila16",2163,2642171,1454692269
2747710178,1477941012,false,"ESLFrankfurt16",2850016,2642171,1459782261
2747808587,1477945318,true,"ESLFrankfurt16",2850016,2642171,1459782261
2747861268,1477947994,true,"ESLFrankfurt16",2850016,2642171,1459782261

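For each league, I want to keep the first match together with the number of wins across all of that league's matches (true is a win, false is a loss), and then sort the result by start time.

I have the following code to do this: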

df1 = df.groupby(['leaguename', 'team']).sum().reset_index()
df1 = df1[['win','leaguename','team']]

df2 = df.sort_values("start_time").groupby("leaguename", as_index=False).first()
df2 = df2[['leaguename', 'start_time']]

output = pd.merge(df1, df2, 'inner', on = 'leaguename')

The output comes back with the start times out of order:

,win,leaguename,team,start_time
0,5.0,ASUSROGSeason6,2642171,1478022101
1,6.0,CaptainsDraft,2642171,1486840800
2,3.0,Dota2Asia17,2642171,1486130597
3,2.0,DotaPitSeason5,2642171,1476903919
4,5.0,ESLFrankfurt16,2642171,1477941012
5,2.0,ESLManila16,2642171,1478352060
6,6.0,GlobalGrandMasters,2642171,1466176095
7,4.0,NanyangChampionshipsSeason2,2642171,1464178206

Expected output:

,win,leaguename,team,start_time
0,4.0,NanyangChampionshipsSeason2,2642171,1464178206
1,6.0,GlobalGrandMasters,2642171,1466176095
2,2.0,DotaPitSeason5,2642171,1476903919
3,5.0,ESLFrankfurt16,2642171,1477941012
4,5.0,ASUSROGSeason6,2642171,1478022101
5,2.0,ESLManila16,2642171,1478352060
6,3.0,Dota2Asia17,2642171,1486130597
7,6.0,CaptainsDraft,2642171,1486840800

How can I get the expected output?


1 answer
Answer 1 · Posted 2024-09-30 16:34:34

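groupby sorts its group keys by default (sort=True), so the merged result comes back ordered alphabetically by leaguename rather than by start_time. I think you need sort_values on the start_time column, followed by reset_index with drop=True to get the default unique, monotonic index: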

output = output.sort_values('start_time').reset_index(drop=True)
# data taken from the output sample in the question
print(output)
   win                   leaguename     team  start_time
0  4.0  NanyangChampionshipsSeason2  2642171  1464178206
1  6.0           GlobalGrandMasters  2642171  1466176095
2  2.0               DotaPitSeason5  2642171  1476903919
3  5.0               ESLFrankfurt16  2642171  1477941012
4  5.0               ASUSROGSeason6  2642171  1478022101
5  2.0                  ESLManila16  2642171  1478352060
6  3.0                  Dota2Asia17  2642171  1486130597
7  6.0                CaptainsDraft  2642171  1486840800
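
To try this without af.csv, here is a minimal sketch that assumes the CSV preview from the question is the whole file and reads it from an inline string. The names CSV, wins and first_start are only illustrative; the steps are the question's own groupby and merge, followed by the sort_values and reset_index call above.

```python
import io

import pandas as pd

# Inline copy of the CSV preview from the question (assumed to stand in for af.csv).
CSV = """match_id,start_time,win,leaguename,opposing_team,team,min
2992096687,1486840800,true,Captains Draft,3729377,2642171,1453382256
2992217489,1486845476,true,Captains Draft,3729377,2642171,1453382256
2994454005,1486926905,false,Captains Draft,2586976,2642171,1453382256
2659805546,1474478411,false,BTSSeries,55,2642171,1454281287
2659879628,1474481141,false,BTSSeries,55,2642171,1454281287
2661783205,1474563571,false,BTSSeries,2537636,2642171,1454281287
2661875544,1474566865,false,BTSSeries,2537636,2642171,1454281287
2662027296,1474573160,true,BTSSeries,59,2642171,1454281287
2758086417,1478352060,true,ESLManila16,2163,2642171,1454692269
2758241073,1478355547,true,ESLManila16,2163,2642171,1454692269
2747710178,1477941012,false,ESLFrankfurt16,2850016,2642171,1459782261
2747808587,1477945318,true,ESLFrankfurt16,2850016,2642171,1459782261
2747861268,1477947994,true,ESLFrankfurt16,2850016,2642171,1459782261
"""
df = pd.read_csv(io.StringIO(CSV))

# Wins per league and team. groupby sorts the group keys by default,
# so these rows come back in alphabetical order of leaguename.
wins = df.groupby(["leaguename", "team"], as_index=False)["win"].sum()

# Start time of the earliest match in each league.
first_start = (
    df.sort_values("start_time")
    .groupby("leaguename", as_index=False)["start_time"]
    .first()
)

output = pd.merge(wins, first_start, how="inner", on="leaguename")
alphabetical = output["leaguename"].tolist()  # order before the fix

# The fix: order by start_time, then rebuild a clean 0..n-1 index.
output = output.sort_values("start_time").reset_index(drop=True)
print(output)
```

Before the final line the leagues are in alphabetical order (BTSSeries, Captains Draft, ESLFrankfurt16, ESLManila16); afterwards they follow start_time, with Captains Draft last.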

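Another approach is to add sort=False to both groupby calls, which keeps the groups in order of first appearance instead of sorting them alphabetically. Note that an inner merge follows the key order of the left frame (df1), so the result is ordered by start_time only if the rows of df already are; with the input sample below it follows the file order: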

df1 = df.groupby(['leaguename', 'team'], sort=False).sum().reset_index()
df1 = df1[['win','leaguename','team']]

df2 = df.sort_values("start_time").groupby("leaguename", as_index=False, sort=False).first()
df2 = df2[['leaguename', 'start_time']]

output = pd.merge(df1, df2,  on = 'leaguename')
# data taken from the input sample in the question
print(output)
   win      leaguename     team  start_time
0  2.0  Captains Draft  2642171  1486840800
1  1.0       BTSSeries  2642171  1474478411
2  2.0     ESLManila16  2642171  1478352060
3  2.0  ESLFrankfurt16  2642171  1477941012
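
If the sort=False variant should also come out ordered by start_time, one option is to sort df once up front and run both groupby calls on the sorted frame. This is a sketch under the same assumption as before (the question's CSV preview read from an inline string); by_time, wins and first_start are illustrative names, not part of the original code.

```python
import io

import pandas as pd

# Inline copy of the CSV preview from the question (assumed to stand in for af.csv).
CSV = """match_id,start_time,win,leaguename,opposing_team,team,min
2992096687,1486840800,true,Captains Draft,3729377,2642171,1453382256
2992217489,1486845476,true,Captains Draft,3729377,2642171,1453382256
2994454005,1486926905,false,Captains Draft,2586976,2642171,1453382256
2659805546,1474478411,false,BTSSeries,55,2642171,1454281287
2659879628,1474481141,false,BTSSeries,55,2642171,1454281287
2661783205,1474563571,false,BTSSeries,2537636,2642171,1454281287
2661875544,1474566865,false,BTSSeries,2537636,2642171,1454281287
2662027296,1474573160,true,BTSSeries,59,2642171,1454281287
2758086417,1478352060,true,ESLManila16,2163,2642171,1454692269
2758241073,1478355547,true,ESLManila16,2163,2642171,1454692269
2747710178,1477941012,false,ESLFrankfurt16,2850016,2642171,1459782261
2747808587,1477945318,true,ESLFrankfurt16,2850016,2642171,1459782261
2747861268,1477947994,true,ESLFrankfurt16,2850016,2642171,1459782261
"""
df = pd.read_csv(io.StringIO(CSV))

# Sort once; with sort=False each groupby keeps groups in order of first
# appearance, which is now the order of each league's earliest match.
by_time = df.sort_values("start_time")

wins = by_time.groupby(["leaguename", "team"], sort=False, as_index=False)["win"].sum()
first_start = by_time.groupby("leaguename", sort=False, as_index=False)["start_time"].first()

# An inner merge follows the key order of the left frame (wins),
# so no extra sort_values call is needed afterwards.
output = pd.merge(wins, first_start, on="leaguename")
print(output)
```

With the sample data the leagues come out as BTSSeries, ESLFrankfurt16, ESLManila16, Captains Draft, which is the start_time order.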
