我有一个27列的Excel文件。我有一个名为“Users”的列,它有多个元素。在Excel中,每个元素在同一单元格的新行上分隔,每个子元素用分号(;)分隔在圆括号(())之间。但是,括号也可能存在于子元素中。下面使用示例数据显示了表格在Excel中的显示方式
下面是它如何使用Pandas作为数据帧导入Python
df = pd.DataFrame({'CN ON': ['WB-01','ZD-DD','DE-02','WZ-D8','HJ-78'],
'Type': ['First','Second','First','Second','Third'],
'Status': ['Completed','Started','Started','Final','Pending'],
'User': ['Admin (PAPER CLIP; paper.clip@domain.com; 999999999)\nSupervisor (WHITE BOARD; whiteboard@domain.com; 999-999999)\nAlternative Supervisor (CHALK BOARD; chalkboard@domain.com; (999) 999-999)','Admin (PAPER CLIP; paper.clip@domain.com; 999999999)\nAdmin Assistant (MOUSE PAD; mousepad@domain.com; (999) 999999)\nSupervisor (WHITE BOARD; whiteboard@domain.com; 999-999999)\nAlternative Supervisor (CHALK BOARD; chalkboard@domain.com; (999) 999-999)','Admin (PAPER CLIP; paper.clip@domain.com; 999999999)\nAdmin (THUMB TACK; thumbtack@domain.com; 999-999-999\nAdmin Assistant (MOUSE PAD; mousepad@domain.com; (999) 999999)\nSupervisor (WHITE BOARD; whiteboard@domain.com; 999-999999)\nAlternative Supervisor (CHALK BOARD; chalkboard@domain.com; (999) 999-999)', 'Admin (PAPER CLIP; paper.clip@domain.com; 999999999)\nAdmin (THUMB TACK; thumbtack@domain.com; 999-999-999)\nAdmin Assistant (MOUSE PAD; mousepad@domain.com; (999) 999999)\nSupervisor (WHITE BOARD; whiteboard@domain.com; 999-999999)\nAlternative Supervisor (CHALK BOARD; chalkboard@domain.com; (999) 999-999)', 'Admin (PAPER CLIP; paper.clip@domain.com; 999999999)\nAdmin (THUMB TACK; thumbtack@domain.com; 999-999-999)\nAdmin Assistant (MOUSE PAD; mousepad@domain.com; (999) 999999)\nAdmin Assistant (WRIST PAD; wristpad@domain.com; 999 999 9999)\nSupervisor (WHITE BOARD; whiteboard@domain.com; 999-999999)\nAlternative Supervisor (CHALK BOARD; chalkboard@domain.com; (999) 999-999)']
})
逻辑
现在,我想应用以下逻辑并创建一个名为ownership的新列。
如果Status = 'Completed'
那么Ownership = 'Completed'
如果Status = 'Started'
,则Ownership =
每个Admin
的名称。
如果Status = 'Final'
,那么Ownership =
每个Supervisor
的名称。
如果Status = 'Pending'
,则Ownership =
每个Admin Assistant
的名称
可能的用户角色是'Admin', 'Admin Assistant', 'Supervisor',
和'Alternative Supervisor'
。括号内的第一个子元素是该角色中的人员的姓名。第二个子元素是电子邮件地址。第三个子元素是非标准化电话号码。它可以有破折号,括号,空格,或者全部加在一起。子元素中的分隔符是分号;
。我相信元素之间的分隔符在Python中是\n
,因为当我导入数据帧(使用上面的脚本)时,它就是这样显示的
Status User Role
Started Admin
Pending Admin Assistant
Final Supervisor
Completed Completed
所需结果的Python脚本
df_results = pd.DataFrame({'CN ON': ['WB-01','ZD-DD','DE-02','WZ-D8','HJ-78'],
'Type': ['First','Second','First','Second','Third'],
'Status': ['Completed','Started','Started','Final','Pending'],
'User': ['Admin (PAPER CLIP; paper.clip@domain.com; 999999999)\nSupervisor (WHITE BOARD; whiteboard@domain.com; 999-999999)\nAlternative Supervisor (CHALK BOARD; chalkboard@domain.com; (999) 999-999)','Admin (PAPER CLIP; paper.clip@domain.com; 999999999)\nAdmin Assistant (MOUSE PAD; mousepad@domain.com; (999) 999999)\nSupervisor (WHITE BOARD; whiteboard@domain.com; 999-999999)\nAlternative Supervisor (CHALK BOARD; chalkboard@domain.com; (999) 999-999)','Admin (PAPER CLIP; paper.clip@domain.com; 999999999)\nAdmin (THUMB TACK; thumbtack@domain.com; 999-999-999\nAdmin Assistant (MOUSE PAD; mousepad@domain.com; (999) 999999)\nSupervisor (WHITE BOARD; whiteboard@domain.com; 999-999999)\nAlternative Supervisor (CHALK BOARD; chalkboard@domain.com; (999) 999-999)', 'Admin (PAPER CLIP; paper.clip@domain.com; 999999999)\nAdmin (THUMB TACK; thumbtack@domain.com; 999-999-999)\nAdmin Assistant (MOUSE PAD; mousepad@domain.com; (999) 999999)\nSupervisor (WHITE BOARD; whiteboard@domain.com; 999-999999)\nAlternative Supervisor (CHALK BOARD; chalkboard@domain.com; (999) 999-999)', 'Admin (PAPER CLIP; paper.clip@domain.com; 999999999)\nAdmin (THUMB TACK; thumbtack@domain.com; 999-999-999)\nAdmin Assistant (MOUSE PAD; mousepad@domain.com; (999) 999999)\nAdmin Assistant (WRIST PAD; wristpad@domain.com; 999 999 9999)\nSupervisor (WHITE BOARD; whiteboard@domain.com; 999-999999)\nAlternative Supervisor (CHALK BOARD; chalkboard@domain.com; (999) 999-999)'],
'Ownership': ['Completed','PAPER CLIP','PAPER CLIP, THUMB TACK','WHITE BOARD','MOUSE PAD, WRIST PAD']
})
我尝试使用.split
函数,但是我不知道如何使用多个元素和多个分隔符进行拆分,特别是当可能存在多个括号时。然后,我不知道如何仅从该字段中提取某些元素,因为根据另一个字段的条件,可能会有多个实例
任何指导或帮助将不胜感激!如果我需要澄清什么,请告诉我
第一个任务是提取所有角色的名称,另一个任务很简单:
输出(
df['Status']
):注意我们可以使用
np.select
代替df['Status'].map
:相关问题 更多 >
编程相关推荐