以下是我试图实现的目标:我希望在df1和df2中找到匹配的“记录ID”,当找到匹配项时,在df1中创建一个新列“new Record ID”,并将“new Record ID”从df2复制到具有匹配“Record ID”的行中
df1
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+
| | ID | Start time | Completion time | Email | Name | Client Name | Date Due | Narrative | Role | Name ID | Role ID | Record ID |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+
| 0 | 1 | 2020-05-07 19:35:13 | 2020-05-07 19:36:29 | a@a.com | John Smith | HPOT | 2020-05-01 | HPOT TS Narrative | Supervisor | JohnSmith | Supervisor | hpotjohnsmithsupervisor2020-05-01 |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+
| 1 | 2 | 2020-05-07 19:36:31 | 2020-05-07 19:37:23 | b@b.com | John Smith | SMAN | 2020-05-01 | SMAN IP Narrative | Line Worker | JohnSmith | LineWorker | smanjohnsmithlineworker2020-05-01 |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+
| 2 | 3 | 2020-05-07 19:37:25 | 2020-05-07 19:38:57 | c@c.com | John Smith | THUL | 2020-05-01 | THUL TS Narrative | Supervisor | JohnSmith | Supervisor | thuljohnsmithsupervisor2020-05-01 |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+
| 3 | 4 | 2020-05-07 19:38:59 | 2020-05-07 19:40:30 | d@d.com | John Smith | HPOT | 2020-05-01 | HPOT IP Narrative | Line Worker | JohnSmith | LineWorker | hpotjohnsmithlineworker2020-05-01 |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+
| 4 | 5 | 2020-05-07 19:40:32 | 2020-05-07 19:41:11 | e@e.com | John Smith | SMAN | 2020-05-01 | SMAN TS Narrative | Supervisor | JohnSmith | Supervisor | smanjohnsmithsupervisor2020-05-01 |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+
df2
+---+----+------------+---------------------+---------+------------+-----------------------------------+-------------+------------+--------------------------------+------------+----------------------------------------------+
| | ID | Start time | Completion time | Email | Name | Record ID | Client Name | Due Date | Revised Narrative | Role | New Record ID |
+---+----+------------+---------------------+---------+------------+-----------------------------------+-------------+------------+--------------------------------+------------+----------------------------------------------+
| 0 | 1 | 2020-05-09 | 2020-05-09 10:57:24 | a@a.com | John Smith | hpotjohnsmithsupervisor2020-05-01 | HPOT | 2020-05-01 | HPOT Revised Summary Narrative | Supervisor | hpotjohnsmithsupervisor2020-05-01-2020-05-06 |
+---+----+------------+---------------------+---------+------------+-----------------------------------+-------------+------------+--------------------------------+------------+----------------------------------------------+
| 1 | 2 | 2020-05-09 | 2020-05-07 19:38:57 | c@c.com | John Smith | thuljohnsmithsupervisor2020-05-01 | THUL | 2020-05-01 | THUL Revised Summary Narrative | Supervisor | thuljohnsmithsupervisor2020-05-01-2020-05-06 |
+---+----+------------+---------------------+---------+------------+-----------------------------------+-------------+------------+--------------------------------+------------+----------------------------------------------+
所需输出:df1
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+------------------------------------+
| | ID | Start time | Completion time | Email | Name | Client Name | Date Due | Narrative | Role | Name ID | Role ID | Record ID | New Record ID |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+------------------------------------+
| 0 | 1 | 2020-05-07 19:35:13 | 2020-05-07 19:36:29 | a@a.com | John Smith | HPOT | 2020-05-01 | HPOT TS Narrative | Supervisor | JohnSmith | Supervisor | hpotjohnsmithsupervisor2020-05-01 | hpotjohnsmith2020-05-01-2020-05-06 |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+------------------------------------+
| 1 | 2 | 2020-05-07 19:36:31 | 2020-05-07 19:37:23 | b@b.com | John Smith | SMAN | 2020-05-01 | SMAN IP Narrative | Line Worker | JohnSmith | LineWorker | smanjohnsmithlineworker2020-05-01 | |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+------------------------------------+
| 2 | 3 | 2020-05-07 19:37:25 | 2020-05-07 19:38:57 | c@c.com | John Smith | THUL | 2020-05-01 | THUL TS Narrative | Supervisor | JohnSmith | Supervisor | thuljohnsmithsupervisor2020-05-01 | |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+------------------------------------+
| 3 | 4 | 2020-05-07 19:38:59 | 2020-05-07 19:40:30 | d@d.com | John Smith | HPOT | 2020-05-01 | HPOT IP Narrative | Line Worker | JohnSmith | LineWorker | hpotjohnsmithlineworker2020-05-01 | |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+------------------------------------+
| 4 | 5 | 2020-05-07 19:40:32 | 2020-05-07 19:41:11 | e@e.com | John Smith | SMAN | 2020-05-01 | SMAN TS Narrative | Supervisor | JohnSmith | Supervisor | smanjohnsmithsupervisor2020-05-01 | |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+------------------------------------+
我试过使用isin和loc,但都没能让它工作。最近的尝试是:
df2.loc[df2['Record ID'] == df1['Record ID']]
但我得到以下错误:“ValueError:只能比较标记相同的系列对象”
提前感谢您提供的任何帮助
根据要求,以下是剪贴板中的数据: df1
ID,Start time,Completion time,Email,Name,Client Name,Date Due,Narrative,Role,Name ID,Role ID,Record ID
1,2020-05-07 19:35:13,2020-05-07 19:36:29,johns@abcdefcompany.com,John Smith,HPOT,2020-05-01,"HPOT TS Narrative
Sed pretium, nunc sit amet finibus finibus, sem tortor fringilla arcu, ac vehicula massa tellus id purus. Proin luctus sodales lacus sed tempus.",Supervisor,JohnSmith,Supervisor,hpotjohnsmithsupervisor2020-05-01
2,2020-05-07 19:36:31,2020-05-07 19:37:23,johns@abcdefcompany.com,John Smith,SMAN,2020-05-01,"SMAN IP Narrative - Sed pretium, nunc sit amet finibus finibus, sem tortor fringilla arcu, ac vehicula massa tellus id purus. Proin luctus sodales lacus sed tempus.
",Worker,JohnSmith,Worker,smanJohnSmithWorker2020-05-01
3,2020-05-07 19:37:25,2020-05-07 19:38:57,johns@abcdefcompany.com,John Smith,THUL,2020-05-01,"THUL TS Narrative
Entry 3 - Sed pretium, nunc sit amet finibus finibus, sem tortor fringilla arcu, ac vehicula massa tellus id purus. Proin luctus sodales lacus sed tempus..
",Supervisor,JohnSmith,Supervisor,thulJohnSmithSupervisor2020-05-01
4,2020-05-07 19:38:59,2020-05-07 19:40:30,johns@abcdefcompany.com,John Smith,HPOT,2020-05-01,HPOT IP Narrative,Worker,JohnSmith,Worker,hpotJohnSmithWorker2020-05-01
5,2020-05-07 19:40:32,2020-05-07 19:41:11,johns@abcdefcompany.com,John Smith,SMAN,2020-05-01,SMAN TS Narrative,Supervisor,JohnSmith,Supervisor,smanJohnSmithSupervisor2020-05-01
6,2020-05-07 19:41:12,2020-05-07 19:41:40,johns@abcdefcompany.com,John Smith,THUL,2020-05-01,THUL IP Narrative,Worker,JohnSmith,Worker,thulJohnSmithWorker2020-05-01
df2
ID,Start time,Completion time,Email,Name,Record ID,Client Name,Due Date,Revised Narrative,Role,New Record ID
1,2020-05-09,2020-05-09 10:57:24,johns@abcdefcompany.com,John Smith,hpotjohnsmithsupervisor2020-05-01,HPOT,2020-05-01,HPOT Revised Summary Narrative,Supervisor,hpotjohnsmithsupervisor2020-05-012020-05-09
如果有人试图回答我的问题,可以将其复制到剪贴板,然后
df = pd.read_clipboard(sep=',')
尝试:所以我尝试了“加入”建议:
df1.set_index('Record ID').join(df2.set_index('Record ID'))
出现此错误:“ValueError:列重叠但未指定后缀:索引(['ID'、'Start time'、'Completion time'、'Email'、'Name'、'Client Name', “角色”], dtype='object')”
使用pandas.DataFrame.merge
df1['Records ID']
和df2['Records ID']
之间没有匹配值,因为一列包含大写,而另一列不包含大写。必须解决此问题才能找到匹配项李>相关问题 更多 >
编程相关推荐