Creating new DataFrames from group rows

import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
            'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
            'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['regiment', 'name', 'preTestScore', 'postTestScore'])
df

      regiment      name  preTestScore  postTestScore
0   Nighthawks    Miller             4             25
1   Nighthawks  Jacobson            24             94
2   Nighthawks       Ali            31             57
3   Nighthawks    Milner             2             62
4     Dragoons     Cooze             3             70
5     Dragoons     Jacon             4             25
6     Dragoons    Ryaner            24             94
7     Dragoons      Sone            31             57
8       Scouts     Sloan             2             62
9       Scouts     Piger             3             70
10      Scouts     Riani             2             62
11      Scouts       Ali             3             70

gb = df.groupby("regiment")

    regiment   name  preTestScore  postTestScore
8     Scouts  Sloan             2             62
9     Scouts  Piger             3             70
10    Scouts  Riani             2             62
11    Scouts    Ali             3             70
------------------
     regiment      name  preTestScore  postTestScore
0  Nighthawks    Miller             4             25
1  Nighthawks  Jacobson            24             94
2  Nighthawks       Ali            31             57
3  Nighthawks    Milner             2             62
------------------
   regiment    name  preTestScore  postTestScore
4  Dragoons   Cooze             3             70
5  Dragoons   Jacon             4             25
6  Dragoons  Ryaner            24             94
7  Dragoons    Sone            31             57
------------------

     regiment    name  preTestScore  postTestScore
8      Scouts   Sloan             2             62
0  Nighthawks  Miller             4             25
4    Dragoons   Cooze             3             70

     regiment      name  preTestScore  postTestScore
9      Scouts     Piger             3             70
1  Nighthawks  Jacobson            24             94
5    Dragoons     Jacon             4             25

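3 answers

User

#1 · edited 2024-06-28 11:07:43

You can probably do this with a nested groupby and cumcount. For example, the following groups together the first row of every regiment, then the second row of every regiment, and so on: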

In []:
[g for _, g in df.groupby(df.groupby('regiment').cumcount())]

Out[]:
[     regiment    name  preTestScore  postTestScore
 0  Nighthawks  Miller             4             25
 4    Dragoons   Cooze             3             70
 8      Scouts   Sloan             2             62,
      regiment      name  preTestScore  postTestScore
 1  Nighthawks  Jacobson            24             94
 5    Dragoons     Jacon             4             25
 9      Scouts     Piger             3             70,
       regiment    name  preTestScore  postTestScore
 2   Nighthawks     Ali            31             57
 6     Dragoons  Ryaner            24             94
 10      Scouts   Riani             2             62,
       regiment    name  preTestScore  postTestScore
 3   Nighthawks  Milner             2             62
 7     Dragoons    Sone            31             57
 11      Scouts     Ali             3             70]
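
To see why the nested groupby works, it can help to look at the intermediate cumcount result on its own. The following is a minimal sketch on a cut-down stand-in for the question's data (two regiments with two rows each); the names position and frames are illustrative and do not come from the answer above.

```python
import pandas as pd

# Small stand-in for the question's data: two regiments, two rows each.
df = pd.DataFrame({
    "regiment": ["Nighthawks", "Nighthawks", "Dragoons", "Dragoons"],
    "name": ["Miller", "Jacobson", "Cooze", "Jacon"],
})

# cumcount numbers the rows inside each group, starting at 0:
# the first row of every regiment gets 0, the second gets 1, and so on.
position = df.groupby("regiment").cumcount()

# Grouping the whole frame by that position collects the k-th row
# of every regiment into one sub-frame, keyed by k.
frames = {k: g for k, g in df.groupby(position)}

print(position.tolist())  # within-group positions, in original row order
print(frames[0])          # first row of each regiment
print(frames[1])          # second row of each regiment
```

Because the key is computed per group, this approach does not require the regiments to have the same number of rows; a shorter regiment simply does not appear in the later sub-frames.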

User

#2 · edited 2024-06-28 11:07:43

Group by a custom key (df.index % 4) and store the results in a dict:

In [67]: {x:g for x,g in df.sort_values(by='regiment',ascending=False).groupby(df.index%4)}
Out[67]:
{0:      regiment    name  preTestScore  postTestScore
 8      Scouts   Sloan             2             62
 0  Nighthawks  Miller             4             25
 4    Dragoons   Cooze             3             70,
 1:      regiment      name  preTestScore  postTestScore
 9      Scouts     Piger             3             70
 1  Nighthawks  Jacobson            24             94
 5    Dragoons     Jacon             4             25,
 2:       regiment    name  preTestScore  postTestScore
 10      Scouts   Riani             2             62
 2   Nighthawks     Ali            31             57
 6     Dragoons  Ryaner            24             94,
 3:       regiment    name  preTestScore  postTestScore
 11      Scouts     Ali             3             70
 3   Nighthawks  Milner             2             62
 7     Dragoons    Sone            31             57}

Or as a list:

In [71]: grps = [g for _,g in (df.sort_values(by='regiment',ascending=False)
                                 .groupby(df.index%4))]

In [72]: grps[0]
Out[72]:
     regiment    name  preTestScore  postTestScore
8      Scouts   Sloan             2             62
0  Nighthawks  Miller             4             25
4    Dragoons   Cooze             3             70

In [73]: grps[1]
Out[73]:
     regiment      name  preTestScore  postTestScore
9      Scouts     Piger             3             70
1  Nighthawks  Jacobson            24             94
5    Dragoons     Jacon             4             25
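
Note that df.index % 4 only lines up with "the k-th row of each regiment" because the sample data has a default integer index and exactly four consecutive rows per regiment. The sketch below, on hypothetical toy data with regiments "A" and "B", compares that key with a per-group cumcount and shows the two diverge once a row is missing.

```python
import pandas as pd

# Hypothetical toy data: two regiments with four consecutive rows each.
df = pd.DataFrame({
    "regiment": ["A"] * 4 + ["B"] * 4,
    "score": range(8),
})

# On the full frame, the modulo key equals the within-group row position.
same_on_full = (
    (df.index % 4).tolist() == df.groupby("regiment").cumcount().tolist()
)

# Drop the first row: regiment "A" now has three rows, so the modulo key
# no longer matches the within-group position.
trimmed = df.iloc[1:]
same_on_trimmed = (
    (trimmed.index % 4).tolist()
    == trimmed.groupby("regiment").cumcount().tolist()
)

print(same_on_full, same_on_trimmed)
```

If the groups may differ in size, or the index is not a clean 0..n-1 range, a key built with cumcount is the safer choice.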

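User

#3 · edited 2024-06-28 11:07:43

Dictionaries are unordered in Python versions before 3.7, which is why the keys in the output below do not appear in sequence. Since the sample data has only four rows per regiment, here is a ranking of the first four rows that uses nth on a groupby. The result is built with a dictionary comprehension that iterates over range(4) (0, 1, 2, 3), takes the nth row of each group for that value, and converts the value to its ordinal name (for example, 0 becomes 'first').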

d = {n: ordinal for n, ordinal in zip(
             range(5), ['first', 'second', 'third', 'fourth', 'fifth'])}

top_n = 4
>>> {d[n]: df.groupby(['regiment']).nth(n) for n in range(top_n)}
{'first':               name  postTestScore  preTestScore
 regiment                                       
 Dragoons     Cooze             70             3
 Nighthawks  Miller             25             4
 Scouts       Sloan             62             2,
 'fourth':               name  postTestScore  preTestScore
 regiment                                       
 Dragoons      Sone             57            31
 Nighthawks  Milner             62             2
 Scouts         Ali             70             3,
 'second':                 name  postTestScore  preTestScore
 regiment                                         
 Dragoons       Jacon             25             4
 Nighthawks  Jacobson             94            24
 Scouts         Piger             70             3,
 'third':               name  postTestScore  preTestScore
 regiment                                       
 Dragoons    Ryaner             94            24
 Nighthawks     Ali             57            31
 Scouts       Riani             62             2}

For groups of different lengths:

df = df.iloc[1:-1, :]  # Drop first and last row.
>>> {d[n]: df.groupby(['regiment']).nth(n).reindex(sorted(df.regiment.unique())) 
     for n in range(top_n)}
{'first':                 name  postTestScore  preTestScore
 regiment                                         
 Dragoons       Cooze             70             3
 Nighthawks  Jacobson             94            24
 Scouts         Sloan             62             2,
 'fourth':             name  postTestScore  preTestScore
 regiment                                     
 Dragoons    Sone             57            31
 Nighthawks   NaN            NaN           NaN
 Scouts       NaN            NaN           NaN,
 'second':              name  postTestScore  preTestScore
 regiment                                      
 Dragoons    Jacon             25             4
 Nighthawks    Ali             57            31
 Scouts      Piger             70             3,
 'third':               name  postTestScore  preTestScore
 regiment                                       
 Dragoons    Ryaner             94            24
 Nighthawks  Milner             62             2
 Scouts       Riani             62             2}
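
The output above has regiment as the index of each nth result. As far as I know, newer pandas releases (2.0 and later) treat nth as a filter that keeps the original row index, so the reindex step may not behave the same way there. The following sketch swaps in a different technique, cumcount plus set_index, to get a regiment-indexed result with NaN for missing rows; nth_rows is a hypothetical helper and the three-row frame is toy data, not the question's.

```python
import pandas as pd

# Toy data with groups of different lengths: Dragoons has two rows, Scouts one.
df = pd.DataFrame({
    "regiment": ["Dragoons", "Dragoons", "Scouts"],
    "name": ["Cooze", "Jacon", "Sloan"],
})

regiments = sorted(df["regiment"].unique())
position = df.groupby("regiment").cumcount()  # 0-based row number per regiment


def nth_rows(n):
    """Hypothetical helper: n-th row of every regiment, indexed by regiment.

    Regiments with fewer than n + 1 rows are kept and filled with NaN.
    """
    picked = df[position == n].set_index("regiment")
    return picked.reindex(regiments)


first = nth_rows(0)
second = nth_rows(1)
print(first)
print(second)
```

Each regiment contributes at most one row per position, so the regiment index is unique and reindex can safely fill in the regiments that ran out of rows.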
