Pandas:根据oth的值删除行

2024-09-28 03:15:32 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个pandas数据框,列名为'ID'、'datetime'、'medicing'。现在,我正在努力消除所有的吗啡药物行与美沙酮药物给时间。你知道吗

例如:

ID-66531在2013-02-19 20:54接受了第一次美沙酮治疗。因为美沙酮和吗啡都是止痛药,所以这两种药物的效果是一样的。因此,如果美沙酮在吗啡治疗的24小时内给每个ID组服用,我试图消除吗啡治疗的排程。因此,对于ID-66531,如果任何吗啡在美沙酮24小时给药时间范围内,即2013-02-18 20:53和2013-02-19 20:54之间,我想删除每个ID组的记录

实际数据帧:

64347   2012-02-15 19:24:00 morphine
64347   2012-02-15 22:54:00 morphine
64347   2012-02-16 05:50:00 morphine
64347   2012-02-16 12:47:00 oxyCODONE
64347   2012-10-20 05:03:00 morphine
64347   2012-10-21 15:10:00 morphine
64347   2012-10-21 19:06:00 morphine
64347   2012-11-16 16:00:00 morphine
64990   2012-09-21 07:10:00 oxyCODONE
64990   2012-09-21 16:45:00 fentaNYL
64990   2012-09-21 23:21:00 oxyCODONE
64990   2012-09-22 15:40:00 oxyCODONE
66531   2012-12-18 08:31:00 acetaminophen-HYDROcodone
66531   2012-12-18 17:44:00 acetaminophen-HYDROcodone
66531   2013-02-18 22:36:00 morphine
66531   2013-02-19 05:07:00 morphine
66531   2013-02-19 10:13:00 morphine
66531   2013-02-19 15:27:00 morphine
66531   2013-02-19 17:33:00 oxyCODONE
66531   2013-02-19 19:20:00 acetaminophen-HYDROcodone
66531   2013-02-19 20:54:00 methadone
66531   2013-02-20 06:46:00 methadone
66531   2013-02-20 13:21:00 methadone
66531   2013-02-20 22:18:00 methadone

预期数据帧:

64347   2012-02-15 19:24:00 morphine
64347   2012-02-15 22:54:00 morphine
64347   2012-02-16 05:50:00 morphine
64347   2012-02-16 12:47:00 oxyCODONE
64347   2012-10-20 05:03:00 morphine
64347   2012-10-21 15:10:00 morphine
64347   2012-10-21 19:06:00 morphine
64347   2012-11-16 16:00:00 morphine
64990   2012-09-21 07:10:00 oxyCODONE
64990   2012-09-21 16:45:00 fentaNYL
64990   2012-09-21 23:21:00 oxyCODONE
64990   2012-09-22 15:40:00 oxyCODONE
66531   2012-12-18 08:31:00 acetaminophen-HYDROcodone
66531   2012-12-18 17:44:00 acetaminophen-HYDROcodone
66531   2013-02-19 17:33:00 oxyCODONE
66531   2013-02-19 19:20:00 acetaminophen-HYDROcodone
66531   2013-02-19 20:54:00 methadone
66531   2013-02-20 06:46:00 methadone
66531   2013-02-20 13:21:00 methadone
66531   2013-02-20 22:18:00 methadone

我尝试的:

我创建了一个timediff列,它获取两个记录之间的时间差。你知道吗

df['timediff'] = df.sort_values(by= 'datetime').groupby('ID')['datetime'].diff()

然后我给变量x分配了24小时

x = '24:00:00'并尝试使用以下代码删除吗啡

df = df.groupby('ID').apply(lambda x : x.loc[~(x['Medication'] == 'morphine')<x]

但是这个代码会删除吗啡记录,不管美沙酮的用药时间如何。你知道吗

如果我能得到一些帮助我会非常感激。你知道吗


Tags: 数据iddfdatetime记录时间药物小时
2条回答
>>> names = ['Id', 'Datetime', 'Drug']
>>> df = pd.read_csv('drugs.csv', names=names, parse_dates=['Datetime'])
>>> df

    Id      Datetime                Drug
0   66531   2012-12-18 08:31:00     acetaminophen-HYDROcodone
1   66531   2012-12-18 17:44:00     acetaminophen-HYDROcodone
2   66531   2013-02-18 22:36:00     morphine
3   66531   2013-02-19 05:07:00     morphine
4   66531   2013-02-19 10:13:00     morphine
5   66531   2013-02-19 15:27:00     morphine
6   66531   2013-02-19 17:33:00     oxyCODONE
7   66531   2013-02-19 19:20:00     acetaminophen-HYDROcodone
8   66531   2013-02-19 20:54:00     methadone
9   66531   2013-02-20 06:46:00     methadone
10  66531   2013-02-20 13:21:00     methadone
11  66531   2013-02-20 22:18:00     methadone

然后,遍历所有行组合。它是有效的,但不是最有效的解决办法。如果您的数据集非常大,您可能需要考虑更优雅的方法。你知道吗

drop_idx = set([])
for idx_early, row_early in df.iterrows():
    for idx_late, row_late in df.iterrows():
        if (row_early['Datetime'] + pd.DateOffset(hours=24) > row_late['Datetime'] and
            row_early['Datetime'] < row_late['Datetime'] and
            row_early['Drug'] == 'morphine' and
            row_late['Drug'] == 'methadone' and
            row_early['Id'] == row_late['Id']):
            drop_idx.add(idx_early)

现在,删除已标识的行并打印结果。你知道吗

>>> df.drop(index=drop_idx)

    Id      Datetime                Drug
0   66531   2012-12-18 08:31:00     acetaminophen-HYDROcodone
1   66531   2012-12-18 17:44:00     acetaminophen-HYDROcodone
6   66531   2013-02-19 17:33:00     oxyCODONE
7   66531   2013-02-19 19:20:00     acetaminophen-HYDROcodone
8   66531   2013-02-19 20:54:00     methadone
9   66531   2013-02-20 06:46:00     methadone
10  66531   2013-02-20 13:21:00     methadone
11  66531   2013-02-20 22:18:00     methadone

我想到的是:

import pandas
df = pandas.read_csv('test.tsv', header=None, sep='\s+',
                     parse_dates = [['date', 'time']],
                     names=['patient_id', 'date', 'time','drug'])

to_drop = set()
for _, patient in df.groupby('patient_id'):
    meth_times = patient.loc[patient['drug'] == 'methadone']['date_time']
    morph_doses = patient.loc[patient['drug'] == 'morphine']
    for i, md in morph_doses.iterrows():
        for mt in meth_times:
            days_elapsed = (mt - md['date_time']).days
            if days_elapsed < 1:
                to_drop.add(i)
                break

df = df[~df.index.isin(to_drop)]

与@ulmefors建议的答案一样,这是在吗啡/美沙酮的每一行上重复进行的,所以这有点低效,但是除非你使用一个巨大的数据集,否则它应该足够快地完成工作,这将是微不足道的。(但与另一个答案不同的是,它只是在每个患者的吗啡/美沙酮剂量内重复多次,而不是通过所有可能的行组合)

相关问题 更多 >

    热门问题