How to select events by date and create a new, ordered DataFrame (surgical patients)

        ID   OP_code  OPDATE_01
1       xxx  V259     2014-12-12
2       xxx  A082     2014-06-23
3       999  V011     2014-08-07
4       xxx  A023     2014-09-12
...     ...  ...      ...
473231  xxx  A651     2018-10-03
473233  999  V014     2018-07-06
473235  xxx  A263     2018-05-18

display(df_implants)

                 OPDATE_01                 OPERTN_01
ENCRYPTED_HESID
1111             [2019-01-26]              [V011]
1112             [2019-01-22]              [V011]
1113             [2015-09-24]              [V011]
1114             [2016-06-21, 2017-02-27]  [V011, V014]
1115             [2018-12-27]              [V011]
...              ...                       ...
3046             [2017-02-18]              [V011]
3047             [2013-06-08]              [V011]

1 answer

User

Reply #1 · posted 2024-09-28 01:25:50

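EDIT: I have changed the filter criterion below to require at least two different operations.

Here is one way to do it. For testing purposes, I made some changes to your data: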

import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 999, 3, 1, 999, 2],
                   'OP_code': ['V011', 'A082', 'V011', 'V011', 'A651', 'V014', 'A263'], 
                   'OP_date': ['2014-12-12', '2014-06-23', '2014-08-07', '2014-09-12', 
                               '2018-10-03', '2018-07-06', '2018-05-18']})
df.set_index('ID', inplace=True)
display(df)

   OP_code     OP_date
ID      
1    V011   2014-12-12
2    A082   2014-06-23
999  V011   2014-08-07
3    V011   2014-09-12
1    A651   2018-10-03
999  V014   2018-07-06
2    A263   2018-05-18

First, we should reshape the data so that there is only one row per patient, collecting the data from multiple OPs into lists:

df_patients = pd.pivot_table(df, index=df.index, aggfunc=list)
display(df_patients)

     OP_code        OP_date
ID      
1    [V011, A651]   [2014-12-12, 2018-10-03]
2    [A082, A263]   [2014-06-23, 2018-05-18]
3    [V011]         [2014-09-12]
999  [V011, V014]   [2014-08-07, 2018-07-06]
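Since the question asks for an ordered result, the same reshaping can also be sketched with `groupby` after an explicit sort, so that each patient's lists are guaranteed to be in date order. This is an alternative to the `pivot_table` call above, not a replacement for it; the name `df_patients_sorted` is made up for illustration. It relies on the dates being ISO-formatted strings (`YYYY-MM-DD`), which sort chronologically as plain text; with any other date format you would convert the column with `pd.to_datetime` first.

```python
import pandas as pd

# Same test data as above, but with ID kept as an ordinary column.
df = pd.DataFrame({'ID': [1, 2, 999, 3, 1, 999, 2],
                   'OP_code': ['V011', 'A082', 'V011', 'V011', 'A651', 'V014', 'A263'],
                   'OP_date': ['2014-12-12', '2014-06-23', '2014-08-07', '2014-09-12',
                               '2018-10-03', '2018-07-06', '2018-05-18']})

# Sort by patient and then by date, so each patient's rows are chronological.
df_sorted = df.sort_values(['ID', 'OP_date'])

# Collect each patient's codes and dates into lists (one row per patient).
df_patients_sorted = df_sorted.groupby('ID')[['OP_code', 'OP_date']].agg(list)

print(df_patients_sorted)
```

Because the codes and dates are sorted together before grouping, the n-th code in a patient's `OP_code` list still belongs to the n-th date in the `OP_date` list.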

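Now, given a set of operation codes corresponding to the implants you are interested in, we can loop over the rows of this DataFrame to build an index containing only the patients who have had at least two different operations of interest. We can then filter the data by this new index: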

implant_codes = {'V011', 'V014'}

implant_index = []
for i in df_patients.index:
    # EDIT: filter criterion tightened to at least two different
    # relevant OPs, i.e. the intersection of the implant_codes
    # set with the patient's OP list has at least two elements.
    if len(implant_codes.intersection(df_patients.OP_code[i])) >= 2:
        implant_index.append(i)

df_implants = df_patients.filter(implant_index, axis=0)
display(df_implants)

     OP_code       OP_date
ID      
999  [V011, V014]  [2014-08-07, 2018-07-06]
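The explicit loop can also be written as a boolean mask, which may read more naturally on a large table. The following is a minimal sketch under the same assumptions as the loop; `df_patients` is rebuilt by hand here so the snippet runs on its own, and `n_implant_ops` is a name introduced for illustration.

```python
import pandas as pd

# One row per patient, as produced by the reshaping step above.
df_patients = pd.DataFrame(
    {'OP_code': [['V011', 'A651'], ['A082', 'A263'], ['V011'], ['V011', 'V014']],
     'OP_date': [['2014-12-12', '2018-10-03'], ['2014-06-23', '2018-05-18'],
                 ['2014-09-12'], ['2014-08-07', '2018-07-06']]},
    index=pd.Index([1, 2, 3, 999], name='ID'))

implant_codes = {'V011', 'V014'}

# For each patient, count how many distinct implant codes occur in the OP list.
n_implant_ops = df_patients['OP_code'].apply(
    lambda ops: len(implant_codes.intersection(ops)))

# Keep only patients with at least two different implant operations.
df_implants = df_patients[n_implant_ops >= 2]

print(df_implants)
```

Like the loop, this counts distinct codes, so a patient with two operations that share the same code (for example V011 twice) would not be selected.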

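You can access individual data elements here by combining DataFrame and list indexing syntax. For example, df_implants.loc[999, 'OP_date'][0] yields the date of patient 999's first operation: '2014-08-07'

I would not recommend creating a separate column for each OP. If you want to, you could try the following: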

df_implants[['OP_date_1', 'OP_date_2']] = pd.DataFrame(df_implants.OP_date.values.tolist(), 
                                                       index=df_implants.index)
display(df_implants)

     OP_code       OP_date                   OP_date_1   OP_date_2
ID              
999  [V011, V014]  [2014-08-07, 2018-07-06]  2014-08-07  2018-07-06

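However, this approach runs into trouble in practice, because the number of OPs differs from patient to patient. That is why I think the list representation given above is more natural and easier to work with.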
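To make the difficulty concrete, here is a small sketch using two rows taken from the question's df_implants display (patients 1114 and 1115, who have two and one operation dates respectively). The column count has to be derived from the longest list, and shorter lists are padded with missing values; `dates_wide` is a name introduced for illustration.

```python
import pandas as pd

# Two patients with a different number of operations each.
df_implants = pd.DataFrame(
    {'OP_date': [['2016-06-21', '2017-02-27'], ['2018-12-27']]},
    index=pd.Index([1114, 1115], name='ID'))

# Expanding ragged lists yields as many columns as the longest list;
# patients with fewer operations get missing values in the extra columns.
dates_wide = pd.DataFrame(df_implants['OP_date'].tolist(),
                          index=df_implants.index)

# Name the columns OP_date_1, OP_date_2, ... based on how many there are.
dates_wide.columns = [f'OP_date_{k + 1}' for k in dates_wide.columns]

print(dates_wide)
```

Every later step then has to cope with those missing values, which is the kind of trouble the list representation avoids.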
