Counting the values of one column per element of a list-valued column in a DataFrame

Posted 2024-05-19 10:53:49


I have a DataFrame that looks like this:

| people               | statusName  |
| -------------------- | ----------- |
| [Steve]              | To Do       |
| [Jill, John]         | To Do       |
| [Jill, John]         | To Do       |
| [Jill, John]         | Completed   |
| [Amanda, John]       | To Do       |
| [Meryll, Jill, John] | To Do       |
| [Meryll, Jill, John] | In Progress |
| [Meryll, Bill]       | Completed   |
| [John, Tim]          | To Do       |
| [John, Tim]          | To Do       |
| [John, Tim]          | Assigned    |
| [John, Tom]          | In Progress |

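So the first column holds lists. I want to count, for each individual person, how often each status name occurs. The desired DataFrame looks like this: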


| people | Total | To Do | In Progress | Completed | Stopped |
|--------|-------|-------|-------------|-----------|---------|
| Steve  |     1 |     1 |           0 |         0 |       0 |
| Jill   |     5 |     3 |           1 |         1 |       0 |
| John   |     6 |     4 |           1 |         1 |       0 |
| Amanda |     1 |     1 |           0 |         0 |       0 |
| Meryll |     3 |     1 |           1 |         1 |       0 |
| Bill   |     1 |     0 |           0 |         1 |       0 |
| Tim    |     3 |     2 |           0 |         0 |       1 |
| Tom    |     1 |     0 |           1 |         0 |       0 |

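Basically, I want the result that crosstab would give if the people column held a single string per row instead of a list of names.

How can I achieve this with a DataFrame? Any approach that works for this case is welcome.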

The DataFrame:

import pandas as pd

df = pd.DataFrame({'people': {0: ['Steve'],
  1: ['Jill', 'John'],
  2: ['Jill', 'John'],
  3: ['Jill', 'John'],
  4: ['Amanda', 'John'],
  5: ['Meryll', 'Jill', 'John'],
  6: ['Meryll', 'Jill', 'John'],
  7: ['Meryll', 'Bill'],
  8: ['John', 'Tim'],
  9: ['John', 'Tim'],
  10: ['John', 'Tim'],
  11: ['John', 'Tom']},
 'statusName': {0: 'To Do',
  1: 'To Do',
  2: 'To Do',
  3: 'Completed',
  4: 'To Do',
  5: 'To Do',
  6: 'In Progress',
  7: 'Completed',
  8: 'To Do',
  9: 'To Do',
  10: 'Assigned',
  11: 'In Progress'}})

3 answers

Use explode and get_dummies, then groupby and sum:

df_out = (
    pd.get_dummies(df.explode("people").set_index("people"), prefix="", prefix_sep="")
    .groupby(level=0)
    .sum()
)
df_out["Total"] = df_out.sum(axis=1)
print(df_out.reset_index())

Output:

   people  Assigned  Completed  In Progress  To Do  Total
0  Amanda         0          0            0      1      1
1    Bill         0          1            0      0      1
2    Jill         0          1            1      3      5
3    John         1          1            2      6     10
4  Meryll         0          1            1      1      3
5   Steve         0          0            0      1      1
6     Tim         1          0            0      2      3
7     Tom         0          0            1      0      1
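
To see why exploding first makes the counting straightforward, here is a minimal sketch on a two-row frame. The frame `small` and the groupby/size/unstack counting step are illustrative assumptions, not part of the answer above; they produce the same kind of person-by-status table as get_dummies followed by groupby and sum.

```python
import pandas as pd

# Tiny illustrative frame (hypothetical data, smaller than the question's df).
small = pd.DataFrame({
    "people": [["Jill", "John"], ["John"]],
    "statusName": ["To Do", "Completed"],
})

# explode repeats each row once per list element, so every row
# now holds exactly one person together with that row's status.
flat = small.explode("people")
print(flat)

# Count rows per (person, status) pair and pivot statuses into columns;
# pairs that never occur are filled with 0.
counts = flat.groupby(["people", "statusName"]).size().unstack(fill_value=0)
print(counts)
```

After the explode step the list column behaves like an ordinary string column, which is why the usual counting tools apply unchanged.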

Use explode and crosstab:

s = df.explode('people')
s = pd.crosstab(s['people'], s['statusName'])
s['Total'] = s.sum(1)

print(s)

statusName  Assigned  Completed  In Progress  To Do  Total
people                                                    
Amanda             0          0            0      1      1
Bill               0          1            0      0      1
Jill               0          1            1      3      5
John               1          1            2      6     10
Meryll             0          1            1      1      3
Steve              0          0            0      1      1
Tim                1          0            0      2      3
Tom                0          0            1      0      1
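
As a variation on the approach above, crosstab can compute the totals itself through its margins and margins_name parameters. This is a sketch, not the answerer's code: margins=True also appends a summary row named "Total", which is dropped here because only the per-person column total was asked for. The frame is rebuilt in list form so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "people": [["Steve"], ["Jill", "John"], ["Jill", "John"], ["Jill", "John"],
               ["Amanda", "John"], ["Meryll", "Jill", "John"],
               ["Meryll", "Jill", "John"], ["Meryll", "Bill"],
               ["John", "Tim"], ["John", "Tim"], ["John", "Tim"], ["John", "Tom"]],
    "statusName": ["To Do", "To Do", "To Do", "Completed", "To Do", "To Do",
                   "In Progress", "Completed", "To Do", "To Do", "Assigned",
                   "In Progress"],
})

flat = df.explode("people")

# margins=True adds both a "Total" column and a "Total" row.
table = pd.crosstab(flat["people"], flat["statusName"],
                    margins=True, margins_name="Total")

# Keep the per-person totals, discard the grand-total row.
table = table.drop(index="Total")
print(table)
```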

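You can explode the people column and then cross-tabulate. If the people column holds strings such as "[Jill, John]" rather than real lists, first convert it to lists (skip this step when the column already contains lists, as in the question's df):

df['people'] = df['people'].str.strip('[]')   # drop the surrounding brackets
df['people'] = df['people'].str.split(', ')   # split on comma + space so names keep no leading blank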

Now apply explode and crosstab:

df2=df.explode('people')
res = pd.crosstab(df2['people'], df2['statusName'])
res['Total']=res['Assigned'] + res['Completed'] + res['In Progress'] + res['To Do']
    
res=res[['Total', 'Assigned', 'Completed', 'In Progress', 'To Do']]
res.columns.name = None
res.reset_index(inplace=True)

>>> print(res)
   people  Total  Assigned  Completed  In Progress  To Do
0  Amanda      1         0          0            0      1
1    Bill      1         0          1            0      0
2    Jill      5         0          1            1      3
3    John     10         1          1            2      6
4  Meryll      3         0          1            1      1
5   Steve      1         0          0            0      1
6     Tim      3         1          0            0      2
7     Tom      1         0          0            1      0
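
The table the asker sketched puts Total first and includes a Stopped column, a status that never occurs in the sample data. One possible way to force a fixed set of status columns, including ones with no occurrences, is to reindex the crosstab result with fill_value=0. The list `wanted` below is an assumption made for illustration; adjust it to the statuses you actually need.

```python
import pandas as pd

df = pd.DataFrame({
    "people": [["Steve"], ["Jill", "John"], ["Jill", "John"], ["Jill", "John"],
               ["Amanda", "John"], ["Meryll", "Jill", "John"],
               ["Meryll", "Jill", "John"], ["Meryll", "Bill"],
               ["John", "Tim"], ["John", "Tim"], ["John", "Tim"], ["John", "Tom"]],
    "statusName": ["To Do", "To Do", "To Do", "Completed", "To Do", "To Do",
                   "In Progress", "Completed", "To Do", "To Do", "Assigned",
                   "In Progress"],
})

# Hypothetical fixed list of statuses to report, in display order.
wanted = ["To Do", "In Progress", "Completed", "Assigned", "Stopped"]

flat = df.explode("people")
counts = (
    pd.crosstab(flat["people"], flat["statusName"])
    .reindex(columns=wanted, fill_value=0)  # absent statuses become all-zero columns
)

# Put the row total in front, then flatten the result for display.
counts.insert(0, "Total", counts.sum(axis=1))
counts.columns.name = None
print(counts.reset_index())
```

Computing Total with sum(axis=1) rather than adding named columns keeps the code correct when the set of statuses changes.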
