Trying to find the closest dates before and after a given date from a list of dates stored as a comma-separated string in a DataFrame

ID_string           object
indexdate           datetime64[ns]
XR_count            int64
CT_count            int64
studyid_concat      object
studydate_concat    object
modality_concat     object

   ID_string  indexdate   XR_count  CT_count  studyid_concat       studydate_concat
0  55555555   2020-09-07  10        1         ['St1', 'St5'...]    ['06/22/2019', '09/20/2020'...]
1  66666666   2020-06-07  5         0         ['St11', 'St17'...]  ['05/22/2020', '06/24/2020'...]

import pandas as pd

df = pd.read_excel(path_to_excel, sheet_name='Sheet1')

# Convert comma separated string from Excel to lists of strings
df.studyid_concat = df.studyid_concat.str.split(',')
df.studydate_concat = df.studydate_concat.str.split(',')
df.modality_concat = df.modality_concat.str.split(',')

for x in df['ID_string'].values:
    index_date = df.loc[df['ID_string'] == x, 'indexdate']
    # Had to use subscript [0] below because result of above was a list in an array
    studyid_list = df.loc[df['ID_string'] == x, 'studyid_concat'].values[0]
    date_list = df.loc[df['ID_string'] == x, 'studydate_concat'].values[0]
    modality_list = df.loc[df['ID_string'] == x, 'modality_concat'].values[0]
    # Keep only the X-ray (XR) studies
    xr_date_list = [date_list[i] for i in range(len(date_list)) if modality_list[i] == "XR"]
    xr_studyid_list = [studyid_list[i] for i in range(len(studyid_list)) if modality_list[i] == "XR"]
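For a single row, the closest date on each side of the index date can be found by parsing the whole string at once and taking the latest earlier date and the earliest later date. This is a minimal sketch, assuming dates are in the MM/DD/YYYY format shown in the sample, may have spaces after the commas, and that a study on the index day itself counts as "after" (the same convention the answer's 3-days-after window uses). The function name `nearest_before_after` and the sample string are hypothetical.

```python
import pandas as pd


def nearest_before_after(date_string, index_date, date_format="%m/%d/%Y"):
    """Hypothetical helper: (nearest date strictly before index_date, nearest date on/after it).

    date_string: comma-separated dates such as "06/22/2019, 09/20/2020".
    Either element of the result is None when no date falls on that side.
    """
    # Split on commas, drop surrounding whitespace, and parse all dates at once
    dates = pd.to_datetime([s.strip() for s in date_string.split(",")], format=date_format)
    index_date = pd.Timestamp(index_date)

    before = dates[dates < index_date]
    after = dates[dates >= index_date]
    # The nearest earlier date is the latest one; the nearest later date is the earliest one
    nearest_before = before.max() if len(before) else None
    nearest_after = after.min() if len(after) else None
    return nearest_before, nearest_after


before, after = nearest_before_after("06/22/2019, 09/06/2020, 09/20/2020", "2020-09-07")
print(before.date(), after.date())  # 2020-09-06 2020-09-20
```

To restrict this to X-rays, pass only the XR dates (for example `",".join(xr_date_list)`) instead of the full string.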

Desired output:

   ID_string  indexdate   StudyIDBefore          StudyDateBefore
0  55555555   2020-09-07  ['St33', 'St1', ...]   [2020-09-06, 2019-06-22, ...]
1  66666666   2020-06-07  ['St11', 'St2', ...]   [2020-05-22, 2020-05-01, ...]
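The target table lists, for each ID, the studies dated before the index date, nearest first. One way to get there without a Python loop is to turn each row into one row per study, filter, sort, and collect the results back into lists. This is a sketch, not the approach from the answer; it assumes pandas 1.3 or later (for multi-column `explode`), and the sample study IDs and dates are hypothetical values chosen to resemble the question's data.

```python
import pandas as pd

# Hypothetical sample data in the question's layout (comma-separated strings from Excel)
df = pd.DataFrame({
    "ID_string": ["55555555", "66666666"],
    "indexdate": pd.to_datetime(["2020-09-07", "2020-06-07"]),
    "studyid_concat": ["St1,St5,St33", "St11,St17,St2"],
    "studydate_concat": ["06/22/2019,09/20/2020,09/06/2020", "05/22/2020,06/24/2020,05/01/2020"],
})

# One row per study: split both columns, then explode them together so IDs stay aligned with dates
long = df.assign(
    studyid=df["studyid_concat"].str.split(","),
    studydate=df["studydate_concat"].str.split(","),
).explode(["studyid", "studydate"])
long["studydate"] = pd.to_datetime(long["studydate"].str.strip(), format="%m/%d/%Y")

# Keep studies strictly before the index date, nearest (latest) first within each ID
before = long[long["studydate"] < long["indexdate"]].sort_values(
    ["ID_string", "studydate"], ascending=[True, False]
)

# Collect each ID's studies back into lists (groupby keeps the sorted order within groups)
lists = before.groupby("ID_string").agg(
    StudyIDBefore=("studyid", list),
    StudyDateBefore=("studydate", list),
).reset_index()

result = df[["ID_string", "indexdate"]].merge(lists, on="ID_string", how="left")
print(result)
```

With the left merge, an ID with no earlier studies gets NaN in both list columns rather than being dropped.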

1 Answer

Anonymous user
#1 · Posted 2024-10-01 02:34:47

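I think I found my own answer after spending some time on it and reading more of the pandas datetime documentation. Basically, I realized I could use pd.to_datetime to convert my list of string dates into datetime values: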
date_list = pd.to_datetime(df.loc[df['ID_string'] == x, 'studydate_concat'].values[0]).values
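I could then subtract my index date from this list. I chose to do this in a temporary DataFrame so I could keep track of the values in the other columns (study ID, modality, etc.).
The full code is below: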
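The key step is that `pd.to_datetime(...).values` yields a NumPy `datetime64` array, and subtracting a single `datetime64` from it gives an array of `timedelta64` differences. A minimal illustration, with hypothetical dates in the question's MM/DD/YYYY format:

```python
import numpy as np
import pandas as pd

# Illustrative study dates and index date (hypothetical values)
date_list = pd.to_datetime(["06/22/2019", "09/06/2020", "09/20/2020"], format="%m/%d/%Y").values
index_date = np.datetime64("2020-09-07")

# datetime64 array minus a datetime64 scalar -> timedelta64 array
deltas = date_list - index_date

# Express the differences as (float) days; negative means the study was before the index date
days = deltas / np.timedelta64(1, "D")
print(days)
```

Here `days` holds -443, -1 and 13, so the study one day before the index date is the nearest earlier one.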
import numpy as np
import pandas as pd

# Results keyed by ID_string; these dicts must exist before the loop
XRonindex = {}
XRwi3days = {}

for x in df['ID_string'].values:
    index_date = df.loc[df['ID_string'] == x, 'indexdate'].values[0]
    date_list = pd.to_datetime(df.loc[df['ID_string'] == x, 'studydate_concat'].values[0]).values
    modality_list = df.loc[df['ID_string'] == x, 'modality_concat'].values[0]
    studyid_list = df.loc[df['ID_string'] == x, 'studyid_concat'].values[0]

    # One row per study, so study ID and modality stay aligned with each date
    tempdata = list(zip(studyid_list, date_list, modality_list))
    tempdf = pd.DataFrame(tempdata, columns=['studyid', 'studydate', 'modality'])
    tempdf['indexdate'] = index_date
    tempdf['timedelta'] = tempdf['studydate'] - tempdf['indexdate']

    # Within 3 days before (index day excluded), within 3 days after (index day included), on the index day
    tempdf['study_done_wi_3daysbefore'] = (tempdf['timedelta'] >= np.timedelta64(-3, 'D')) & (tempdf['timedelta'] < np.timedelta64(0, 'D'))
    tempdf['study_done_wi_3daysafter'] = (tempdf['timedelta'] <= np.timedelta64(3, 'D')) & (tempdf['timedelta'] >= np.timedelta64(0, 'D'))
    tempdf['study_done_onindex'] = tempdf['timedelta'] == np.timedelta64(0, 'D')

    # True if at least one XR study falls in the window
    XRonindex[x] = (tempdf['study_done_onindex'] & (tempdf['modality'] == 'XR')).any()
    XRwi3days[x] = (tempdf['study_done_wi_3daysbefore'] & (tempdf['modality'] == 'XR')).any()

# can later map these values back to my original dataframe as a new column
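The same two flags can also be computed for all IDs at once and mapped back onto the original DataFrame, which is the step the final comment in the answer leaves open. This is a sketch of an alternative (explode plus groupby), not the answerer's code; it assumes pandas 1.3 or later, and the sample data, including which studies are XR, is hypothetical.

```python
import pandas as pd

# Hypothetical sample in the question's layout (comma-separated strings)
df = pd.DataFrame({
    "ID_string": ["55555555", "66666666"],
    "indexdate": pd.to_datetime(["2020-09-07", "2020-06-07"]),
    "studyid_concat": ["St1,St5,St33", "St11,St17,St2"],
    "studydate_concat": ["06/22/2019,09/20/2020,09/06/2020", "05/22/2020,06/24/2020,06/07/2020"],
    "modality_concat": ["XR,CT,XR", "CT,XR,XR"],
})

# One row per study, keeping study ID, date and modality aligned
long = df.assign(
    studyid=df["studyid_concat"].str.split(","),
    studydate=df["studydate_concat"].str.split(","),
    modality=df["modality_concat"].str.split(","),
).explode(["studyid", "studydate", "modality"])
long["studydate"] = pd.to_datetime(long["studydate"].str.strip(), format="%m/%d/%Y")
long["modality"] = long["modality"].str.strip()

# Same windows as the loop: [-3, 0) days for "before", exactly 0 days for "on index"
days = (long["studydate"] - long["indexdate"]).dt.days
is_xr = long["modality"] == "XR"
long["xr_onindex"] = is_xr & (days == 0)
long["xr_wi3days"] = is_xr & (days >= -3) & (days < 0)

# Any matching XR study per ID, then map back onto the original rows
flags = long.groupby("ID_string")[["xr_onindex", "xr_wi3days"]].any()
df["XRonindex"] = df["ID_string"].map(flags["xr_onindex"])
df["XRwi3days"] = df["ID_string"].map(flags["xr_wi3days"])
print(df[["ID_string", "XRonindex", "XRwi3days"]])
```

The trade-off is memory: exploding creates one row per study, which is usually fine for per-patient study lists but worth keeping in mind for very large files.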
