Replace all occurrences of a string (given its start and end indices) with a unique ID in Python

Posted 2024-10-03 21:32:12


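I'm working on a dataset that needs preprocessing. I want to replace every occurrence of an entity (given by its start and end index) with that entity's unique ID.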

Given a text string such as:

s = "The hypotensive effect of 100 mg/kg alpha-methyldopa was also partially reversed by naloxone. Naloxone alone did not affect either blood pressure or heart rate. In brain membranes from spontaneously hypertensive rats clonidine, 10(-8) to 10(-5) M, did not influence stereoselective binding of [3H]-naloxone (8 nM), and naloxone, 10(-8) to 10(-4) M, did not influence naloxone-suppressible binding of [3H]-dihydroergocryptine (1 nM)."

and a collection of dictionaries like:

[
'D006973': [{'length': '12', 'offset': '199', 'text': ['hypertensive'], 'type': 'Disease'}]
'D008750': [{'length': '16', 'offset': '36', 'text': ['alpha-methyldopa'], 'type': 'Chemical'}]
'D007022': [{'length': '11', 'offset': '4', 'text': ['hypotensive'], 'type': 'Disease'}]
'D009270': [{'length': '8', 'offset': '84', 'text': ['naloxone'], 'type': 'Chemical'}, {'length': '8', 'offset': '94', 'text': ['Naloxone'], 'type': 'Chemical'}, {'length': '13', 'offset': '293', 'text': ['[3H]-naloxone'], 'type': 'Chemical'}]
]

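I want to replace every occurrence given by an offset with its respective ID. So for the last dictionary, I want every occurrence in its list replaced with 'D009270'.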

Example 1: For the first dictionary, with key 'D006973', I want to replace 'hypertensive' (length 12, starting at index 199) with 'D006973'.

Example 2: For the last dictionary, with key 'D009270', I want to replace the substrings at these indices, given as (start, end) tuples:

[(84, 92), (94, 102), (293, 306)]
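The (start, end) tuples in Example 2 follow directly from each entry's fields: start = offset and end = offset + length. A minimal sketch, assuming the entries look like the ones above (with `offset` and `length` stored as strings); the helper name `spans_for` is hypothetical. It also checks that each slice matches the entity's recorded text, which catches bad offsets before anything is replaced:

```python
# Hypothetical helper: convert each {'offset', 'length'} entry into a (start, end) span
def spans_for(entries):
    return [(int(e['offset']), int(e['offset']) + int(e['length'])) for e in entries]

s = "The hypotensive effect of 100 mg/kg alpha-methyldopa was also partially reversed by naloxone. Naloxone alone did not affect either blood pressure or heart rate. In brain membranes from spontaneously hypertensive rats clonidine, 10(-8) to 10(-5) M, did not influence stereoselective binding of [3H]-naloxone (8 nM), and naloxone, 10(-8) to 10(-4) M, did not influence naloxone-suppressible binding of [3H]-dihydroergocryptine (1 nM)."

occurrences = [
    {'length': '8', 'offset': '84', 'text': ['naloxone'], 'type': 'Chemical'},
    {'length': '8', 'offset': '94', 'text': ['Naloxone'], 'type': 'Chemical'},
    {'length': '13', 'offset': '293', 'text': ['[3H]-naloxone'], 'type': 'Chemical'},
]

spans = spans_for(occurrences)
print(spans)  # [(84, 92), (94, 102), (293, 306)]

# Sanity check: each slice of s should equal the text recorded for that entity
for (start, end), e in zip(spans, occurrences):
    assert s[start:end] == e['text'][0], (start, end, s[start:end])
```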
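  1. In the last sentence, naloxone also appears in "naloxone-suppressible", which I don't want to replace, so I can't simply use str.replace().

  2. I tried replacing the string from the start index to the end index with its unique ID (e.g. 199 to 211 for 'hypertensive'), but this shifts the offsets of the other entities that still have to be replaced. Padding works when the replacement text ('D006973') is shorter than the original string ('hypertensive'), but it fails when the replacement text is longer.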
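A common way around the shifting offsets (a swapped-in technique, not the one used in the answer below) is to apply the replacements from the highest start index to the lowest. Replacing a span only moves the characters after it, so every span further to the left still has valid offsets, and the replacement may be shorter or longer than the original text. A minimal sketch, assuming the spans do not overlap; `replace_spans` is a hypothetical name:

```python
def replace_spans(text, dictionary):
    """Replace every (offset, length) occurrence with its ID; assumes no overlapping spans."""
    spans = []
    for key, occurrences in dictionary.items():
        for occ in occurrences:
            start = int(occ['offset'])
            spans.append((start, start + int(occ['length']), key))
    # Work right to left: each replacement only shifts characters after it,
    # so the offsets of the spans still to be processed stay correct.
    for start, end, key in sorted(spans, reverse=True):
        text = text[:start] + key + text[end:]
    return text

s = "The hypotensive effect of 100 mg/kg alpha-methyldopa was also partially reversed by naloxone. Naloxone alone did not affect either blood pressure or heart rate. In brain membranes from spontaneously hypertensive rats clonidine, 10(-8) to 10(-5) M, did not influence stereoselective binding of [3H]-naloxone (8 nM), and naloxone, 10(-8) to 10(-4) M, did not influence naloxone-suppressible binding of [3H]-dihydroergocryptine (1 nM)."

dictionary = {
    'D006973': [{'length': '12', 'offset': '199', 'text': ['hypertensive'], 'type': 'Disease'}],
    'D008750': [{'length': '16', 'offset': '36', 'text': ['alpha-methyldopa'], 'type': 'Chemical'}],
    'D007022': [{'length': '11', 'offset': '4', 'text': ['hypotensive'], 'type': 'Disease'}],
    'D009270': [{'length': '8', 'offset': '84', 'text': ['naloxone'], 'type': 'Chemical'},
                {'length': '8', 'offset': '94', 'text': ['Naloxone'], 'type': 'Chemical'},
                {'length': '13', 'offset': '293', 'text': ['[3H]-naloxone'], 'type': 'Chemical'}],
}

result = replace_spans(s, dictionary)
print(result)
```

Only the listed spans change, so "naloxone-suppressible" and the unlisted "and naloxone," are left alone.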


1 Answer

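You can use str.format with placeholder characters. Overwrite each span in place with "{}" padded with "@" up to the span's original length, so the string never changes length and every remaining offset stays valid. Then strip the "@" padding and fill the "{}" placeholders, left to right, with the IDs: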

from operator import itemgetter

s = "The hypotensive effect of 100 mg/kg alpha-methyldopa was also partially reversed by naloxone. Naloxone alone did not affect either blood pressure or heart rate. In brain membranes from spontaneously hypertensive rats clonidine, 10(-8) to 10(-5) M, did not influence stereoselective binding of [3H]-naloxone (8 nM), and naloxone, 10(-8) to 10(-4) M, did not influence naloxone-suppressible binding of [3H]-dihydroergocryptine (1 nM)."

dictionary={
'D006973': [{'length': '12', 'offset': '199', 'text': ['hypertensive'], 'type': 'Disease'}],
'D008750': [{'length': '16', 'offset': '36', 'text': ['alpha-methyldopa'], 'type': 'Chemical'}],
'D007022': [{'length': '11', 'offset': '4', 'text': ['hypotensive'], 'type': 'Disease'}],
'D009270': [{'length': '8', 'offset': '84', 'text': ['naloxone'], 'type': 'Chemical'}, {'length': '8', 'offset': '94', 'text': ['Naloxone'], 'type': 'Chemical'}, {'length': '13', 'offset': '293', 'text': ["[3H]-naloxone"], 'type': 'Chemical'}]
}

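# Collect (start, end, id) for every occurrence
index_list = []
for key, occurrences in dictionary.items():
    for dic in occurrences:
        o = int(dic['offset'])
        index_list.append((o, o + int(dic['length']), key))

# Sort by start index so the IDs line up with the "{}" placeholders left to right
index_list.sort(key=itemgetter(0))
format_list = []
lt = list(s)
for si, ei, key in index_list:
    # Overwrite the span with "{}" padded by "@" so the list keeps its length
    # and the offsets of the remaining spans stay valid (needs spans of length >= 2)
    lt[si:ei] = list("{}") + ["@"] * ((ei - si) - 2)
    format_list.append(key)

# Assumes s itself contains no "@", "{" or "}" characters
text = "".join(lt)
text = text.replace("@", "")        # drop the padding
text = text.format(*format_list)    # fill each "{}" with its ID, in order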

Result: 'The D007022 effect of 100 mg/kg D008750 was also partially reversed by D009270. D009270 alone did not affect either blood pressure or heart rate. In brain membranes from spontaneously D006973 rats clonidine, 10(-8) to 10(-5) M, did not influence stereoselective binding of D009270 (8 nM), and naloxone, 10(-8) to 10(-4) M, did not influence naloxone-suppressible binding of [3H]-dihydroergocryptine (1 nM).'
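The placeholder trick relies on the text containing no "@", "{" or "}" characters; otherwise replace() would delete real characters or format() would misread them or raise an error. A variant that avoids that assumption builds the output in a single left-to-right pass from slices of the original string. This is a sketch, not part of the answer above; `replace_spans_ltr` is a hypothetical name, and it raises ValueError if two spans overlap:

```python
def replace_spans_ltr(text, spans):
    """spans: iterable of (start, end, replacement); returns the rewritten text."""
    pieces = []
    pos = 0  # end of the last span handled so far
    for start, end, repl in sorted(spans):
        if start < pos:
            raise ValueError(f"overlapping span starting at {start}")
        pieces.append(text[pos:start])  # untouched text before this span
        pieces.append(repl)             # the ID replacing the span
        pos = end
    pieces.append(text[pos:])           # tail after the last span
    return "".join(pieces)

# Braces and "@" in the input survive unchanged
print(replace_spans_ltr("a {b} @c d", [(3, 4, "X1"), (6, 8, "X2")]))  # a {X1} X2 d
```

Because the output is assembled from slices of the untouched original, the offsets never shift and the text may contain any characters.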
