Python: how to append terms with the same id but different values to a list?

Posted 2024-06-28 20:47:49


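I have a csv file containing general concepts and their corresponding medical terms or phrases. How can I write a loop that groups all the phrases under their respective concept? I am not very experienced with Python, so I am not sure how to write the loop.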

id   concept           phrase
--------------------------------
1    general_history   H&P
1    general_history   history and physical
1    general_history   history physical
2    clinic_history    clinic history physical
2    clinic_history    outpatient h p
3    discharge         discharge summary
3    discharge         DCS

For rows with the same concept (or the same id), how can I append the phrases to a list to get something like this:

var = [['general_history', ['history and physical', 'history physical']],
       ['clinic_history', ['clinic history physical', 'outpatient h p']],
       ['discharge', ['discharge summary', 'DCS']]]

3 Answers

Use a for loop and a defaultdict to accumulate the terms:

import csv
from collections import defaultdict
var = defaultdict(list)
records = ...  # read csv with csv.DictReader
for row in records:
    concept = row.get('concept', None)
    if concept is None: continue
    phrase = row.get('phrase', None)
    if phrase is None: continue
    var[concept].append(phrase)
print(var)
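
The loop above leaves the reading step open. A minimal end-to-end sketch follows, assuming the file is comma-separated with a header row named id, concept, phrase (the question shows the data as an aligned table, so the real delimiter may differ). The sample text and the helper name group_phrases are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for the real file (assumed comma-separated).
csv_text = """id,concept,phrase
1,general_history,H&P
1,general_history,history and physical
1,general_history,history physical
2,clinic_history,clinic history physical
2,clinic_history,outpatient h p
3,discharge,discharge summary
3,discharge,DCS
"""


def group_phrases(lines):
    """Map each concept to the list of its phrases, in file order."""
    grouped = defaultdict(list)
    for row in csv.DictReader(lines):
        concept = row.get("concept")
        phrase = row.get("phrase")
        if concept is None or phrase is None:
            continue  # skip rows missing either column
        grouped[concept].append(phrase)
    return grouped


grouped = group_phrases(io.StringIO(csv_text))

# Convert the mapping into the nested-list shape asked for in the question.
var = [[concept, phrases] for concept, phrases in grouped.items()]
print(var)
```

For a file on disk, the same helper can be given an open file object instead of the StringIO, for example one opened with open(path, newline="").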

Assuming you can already parse the csv, here is how to group the phrases by concept:

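from collections import defaultdict

concepts = defaultdict(list)

# Parse the csv here; `rows` is assumed to yield (id, concept, phrase) tuples.
# (The name `rows` avoids shadowing the csv module, `row_id` the builtin id.)

for row_id, concept, phrase in rows:
    concepts[concept].append(phrase)

# Convert the mapping into a list of [concept, phrases] pairs.
var = [[k, v] for k, v in concepts.items()]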

var will then contain:

[['general_history', ['history and physical', 'history physical']], ...]

It may be even more useful to keep the dictionary itself, keyed by concept, in which case var looks like this:

{
  "general_history": [
    "history and physical",
    "history physical",
  ],
 ...
}
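
A short sketch of why keeping the dictionary can be convenient: phrases can be looked up by concept name directly, and the nested list is still easy to derive when needed. The rows below are hypothetical pre-parsed data taken from the question's table.

```python
from collections import defaultdict

# Hypothetical pre-parsed rows of (id, concept, phrase).
rows = [
    (1, "general_history", "H&P"),
    (1, "general_history", "history and physical"),
    (1, "general_history", "history physical"),
    (2, "clinic_history", "clinic history physical"),
    (2, "clinic_history", "outpatient h p"),
    (3, "discharge", "discharge summary"),
    (3, "discharge", "DCS"),
]

concepts = defaultdict(list)
for _row_id, concept, phrase in rows:
    concepts[concept].append(phrase)

# Direct lookup by concept name, with no scanning of a nested list.
discharge_phrases = concepts["discharge"]

# A membership test does not create an empty entry as a side effect,
# whereas indexing a missing key on a defaultdict would.
has_radiology = "radiology" in concepts

# The nested-list form can still be produced whenever it is needed.
var = [[concept, phrases] for concept, phrases in concepts.items()]
```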

If you are using pandas, try filtering. It should look something like this:

new_dataframe = dataframe[dataframe['id'] == id]

Then concatenate the DataFrames:

final_df = pd.concat([new_dataframe1, new_dataframe2], axis = 0)

You can also try doing the same thing on the concept column.
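
As a rough sketch of the pandas route: the first part below filters by id as described above, and the second part uses groupby, which is a different technique from filtering and concatenating but may be a more direct way to reach the nested list from the question. The sample data is hypothetical and assumes a comma-separated file with id, concept, and phrase columns.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real file (assumed comma-separated).
csv_text = """id,concept,phrase
1,general_history,H&P
1,general_history,history and physical
1,general_history,history physical
2,clinic_history,clinic history physical
2,clinic_history,outpatient h p
3,discharge,discharge summary
3,discharge,DCS
"""

dataframe = pd.read_csv(io.StringIO(csv_text))

# Filtering: the phrases belonging to a single id.
phrases_for_id_1 = dataframe[dataframe["id"] == 1]["phrase"].tolist()

# Grouping: one list of phrases per concept; sort=False keeps the
# concepts in order of first appearance rather than alphabetical order.
grouped = dataframe.groupby("concept", sort=False)["phrase"].apply(list)
var = [[concept, phrases] for concept, phrases in grouped.items()]
```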
