I have a DataFrame that looks like this:
| people | statusName |
| -------------------- | ----------- |
| [Steve] | To Do |
| [Jill, John] | To Do |
| [Jill, John] | To Do |
| [Jill, John] | Completed |
| [Amanda, John] | To Do |
| [Meryll, Jill, John] | To Do |
| [Meryll, Jill, John] | In Progress |
| [Meryll, Bill] | Completed |
| [John, Tim] | To Do |
| [John, Tim] | To Do |
| [John, Tim] | Assigned |
| [John, Tom] | In Progress |
So the first column holds lists. I want to count the occurrences of each status name per person. The desired DataFrame looks like this:
| people | Total | To Do | In Progress | Completed | Stopped |
|--------|-------|-------|-------------|-----------|---------|
| Steve | 1 | 1 | 0 | 0 | 0 |
| Jill | 5 | 3 | 1 | 1 | 0 |
| John | 6 | 4 | 1 | 1 | 0 |
| Amanda | 1 | 1 | 0 | 0 | 0 |
| Meryll | 3 | 1 | 1 | 1 | 0 |
| Bill | 1 | 0 | 0 | 1 | 0 |
| Tim | 3 | 2 | 0 | 0 | 1 |
| Tom | 1 | 0 | 1 | 0 | 0 |
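So basically, I want what the `crosstab` function does when the people column holds a single string instead of a list of names.
How can I achieve the same result with this DataFrame? Any method that works in this case is welcome.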
The DataFrame:
import pandas as pd
df = pd.DataFrame({'people': {0: ['Steve'],
1: ['Jill', 'John'],
2: ['Jill', 'John'],
3: ['Jill', 'John'],
4: ['Amanda', 'John'],
5: ['Meryll', 'Jill', 'John'],
6: ['Meryll', 'Jill', 'John'],
7: ['Meryll', 'Bill'],
8: ['John', 'Tim'],
9: ['John', 'Tim'],
10: ['John', 'Tim'],
11: ['John', 'Tom']},
'statusName': {0: 'To Do',
1: 'To Do',
2: 'To Do',
3: 'Completed',
4: 'To Do',
5: 'To Do',
6: 'In Progress',
7: 'Completed',
8: 'To Do',
9: 'To Do',
10: 'Assigned',
11: 'In Progress'}})
Use `explode` and `get_dummies`, followed by `groupby` and `sum`.
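The code from this answer did not survive on the page, so the following is only a minimal sketch of the approach it names, not the answerer's original code. It rebuilds the question's data as plain lists; the variable names `exploded`, `dummies`, and `counts` are illustrative assumptions.

```python
import pandas as pd

# Sample data from the question, written as plain lists.
df = pd.DataFrame({
    "people": [
        ["Steve"], ["Jill", "John"], ["Jill", "John"], ["Jill", "John"],
        ["Amanda", "John"], ["Meryll", "Jill", "John"], ["Meryll", "Jill", "John"],
        ["Meryll", "Bill"], ["John", "Tim"], ["John", "Tim"], ["John", "Tim"],
        ["John", "Tom"],
    ],
    "statusName": [
        "To Do", "To Do", "To Do", "Completed", "To Do", "To Do", "In Progress",
        "Completed", "To Do", "To Do", "Assigned", "In Progress",
    ],
})

# One row per (person, status) pair. explode repeats the original index for
# every list element, so reset it to keep later alignment unambiguous.
exploded = df.explode("people").reset_index(drop=True)

# One 0/1 indicator column per status, then sum the indicators per person.
dummies = pd.get_dummies(exploded["statusName"], dtype=int)
counts = dummies.groupby(exploded["people"]).sum()

# Add a row total as the first column.
counts.insert(0, "Total", counts.sum(axis=1))
print(counts)
```

A status that never occurs in the data gets no column here; if a fixed set of columns is needed, `counts.reindex(columns=..., fill_value=0)` can add the missing ones.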
You can explode the people column and then use `crosstab`. First, convert the people column to actual lists.
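The conversion code is missing from the page. The sketch below assumes the column arrived as strings such as `"[Jill, John]"` (for example, after reading the table from text); that input format is an assumption, and the step is unnecessary when the column already holds Python lists, as in the question's constructor. The name `raw` is illustrative.

```python
import pandas as pd

# Hypothetical input: the people column stored as bracketed strings.
raw = pd.DataFrame({
    "people": ["[Steve]", "[Jill, John]", "[Meryll, Jill, John]"],
    "statusName": ["To Do", "Completed", "In Progress"],
})

# Strip the surrounding brackets, then split on comma plus space
# so that each cell becomes a real Python list of names.
raw["people"] = raw["people"].str.strip("[]").str.split(", ")
print(raw["people"].tolist())
```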
Then apply `explode` and `crosstab`.
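Since the answer's code was not preserved, here is a minimal sketch of the explode-then-crosstab step under the assumption that the people column already holds lists. The names `exploded` and `table` are illustrative, and using `margins` to get the Total column is one possible choice rather than the answerer's confirmed method.

```python
import pandas as pd

# Sample data from the question, written as plain lists.
df = pd.DataFrame({
    "people": [
        ["Steve"], ["Jill", "John"], ["Jill", "John"], ["Jill", "John"],
        ["Amanda", "John"], ["Meryll", "Jill", "John"], ["Meryll", "Jill", "John"],
        ["Meryll", "Bill"], ["John", "Tim"], ["John", "Tim"], ["John", "Tim"],
        ["John", "Tom"],
    ],
    "statusName": [
        "To Do", "To Do", "To Do", "Completed", "To Do", "To Do", "In Progress",
        "Completed", "To Do", "To Do", "Assigned", "In Progress",
    ],
})

# One row per (person, status) pair, with a fresh unique index.
exploded = df.explode("people").reset_index(drop=True)

# Count statuses per person. margins=True adds a "Total" row and column;
# the Total row (a per-status grand total) is dropped to keep one row per person.
table = pd.crosstab(
    exploded["people"],
    exploded["statusName"],
    margins=True,
    margins_name="Total",
).drop(index="Total")
print(table)
```

Compared with the `get_dummies` route, `crosstab` does the counting in one call and fills absent person/status combinations with 0.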