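From a list of strings, create a new list in which each item indicates whether the corresponding item in the original list lies between two specific entries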

jay = ['Despite', 'similar', 'intensity', 'of', 'alcohol', '<Disease:D013375>', 'withdrawal', 'symptoms', '</Disease:D013375>', ',', 'ALC', '/', 'COC', 'subjects', 'received', 'less', 'oxazepam', 'to', 'treat', 'alcohol', '<Disease:D013375>', 'withdrawal', 'symptoms', '</Disease:D013375>', 'compared', 'to', 'ALC', 'subjects', '.']

wow = jay
labs = []
for i in range(0, len(wow)):
    if wow[i].startswith("<Disease"):
        labs.append('DelStrB')
    elif i>0 and i<=len(labs):
        if labs[i-1] == 'DelStrB':
            labs.append('B-COL')
            i = i + 1
            while not (wow[i].startswith("</Disease")):
                labs.append('I-COL')
                i = i + 1
            if wow[i].startswith("</Disease"):
                labs.append('DelStrE')
                i = i + 1
        elif wow[i].startswith("</Disease"):
            k=9 #do nothing
        else:
            labs.append('O')
    elif wow[i].startswith("</Disease"):
        k=9 #do nothing
    else:
        labs.append('O')
labs[:] = [x for x in labs if x != 'DelStrB']
labs[:] = [x for x in labs if x != 'DelStrE']
print(labs)
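One problem with this attempt: in Python, `for i in range(...)` rebinds `i` from the range at the start of every pass, so the `i = i + 1` steps inside the body do not skip anything. The outer loop revisits the tokens that were already labeled inside the span and can append an extra 'O' for them. A minimal sketch of the same index-walking idea using a `while` loop, where advancing `i` really does skip ahead. It assumes every `<Disease...>` token has a matching `</Disease...>`, and `label_tokens` is a hypothetical name:

```python
jay = ['Despite', 'similar', 'intensity', 'of', 'alcohol', '<Disease:D013375>', 'withdrawal', 'symptoms', '</Disease:D013375>', ',', 'ALC', '/', 'COC', 'subjects', 'received', 'less', 'oxazepam', 'to', 'treat', 'alcohol', '<Disease:D013375>', 'withdrawal', 'symptoms', '</Disease:D013375>', 'compared', 'to', 'ALC', 'subjects', '.']

def label_tokens(tokens):
    """Hypothetical helper: 'O' outside spans, 'B-COL'/'I-COL' inside; tag tokens are dropped.

    Assumes every '<Disease...>' token is eventually followed by a matching '</Disease...>'.
    """
    labels = []
    i = 0
    while i < len(tokens):
        if tokens[i].startswith('<Disease'):
            i += 1  # skip the opening tag itself
            first = True
            # Label every token up to (but not including) the closing tag.
            while i < len(tokens) and not tokens[i].startswith('</Disease'):
                labels.append('B-COL' if first else 'I-COL')
                first = False
                i += 1
            i += 1  # skip the closing tag
        else:
            labels.append('O')
            i += 1
    return labels

print(label_tokens(jay))
```

Because `i` is managed by hand, the inner loop consumes the whole span and the outer loop resumes after the closing tag, so no token is labeled twice.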

3 answers

User

Answer 1 · edited 2024-10-01 22:28:43

You can use a simple generator:

import re
jay = ['Despite', 'similar', 'intensity', 'of', 'alcohol', '<Disease:D013375>', 'withdrawal', 'symptoms', '</Disease:D013375>', ',', 'ALC', '/', 'COC', 'subjects', 'received', 'less', 'oxazepam', 'to', 'treat', 'alcohol', '<Disease:D013375>', 'withdrawal', 'symptoms', '</Disease:D013375>', 'compared', 'to', 'ALC', 'subjects', '.']
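def results(d):
    # _flag: -1 = outside a span, 1 = just after an opening tag, 0 = inside, past the first token
    _flag = -1
    for i in d:
        # Raw strings avoid the invalid '\<' / '\w' escape warnings of the plain-string patterns.
        if re.search(r'<Disease:\w+>', i):
            _flag = 1
        elif re.search(r'</Disease:\w+>', i):
            _flag = -1
        else:
            if _flag == -1:
                yield 'O'
            elif _flag == 1:
                yield 'B-COL'
                _flag = 0
            else:
                yield 'I-COL'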

print(list(results(jay)))

Output:

['O', 'O', 'O', 'O', 'O', 'B-COL', 'I-COL', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-COL', 'I-COL', 'O', 'O', 'O', 'O', 'O']
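Because the generator yields nothing for the tag tokens, its output has 25 labels while jay has 29 items. To check which word each label belongs to, one option is to drop the tag tokens with a similar pattern and zip the two sequences. A sketch that repeats the generator with raw-string patterns so the block runs on its own; `tag`, `words` and `pairs` are illustrative names:

```python
import re

jay = ['Despite', 'similar', 'intensity', 'of', 'alcohol', '<Disease:D013375>', 'withdrawal', 'symptoms', '</Disease:D013375>', ',', 'ALC', '/', 'COC', 'subjects', 'received', 'less', 'oxazepam', 'to', 'treat', 'alcohol', '<Disease:D013375>', 'withdrawal', 'symptoms', '</Disease:D013375>', 'compared', 'to', 'ALC', 'subjects', '.']

def results(d):
    # Same logic as the generator above: -1 outside, 1 first inside token, 0 later inside tokens.
    _flag = -1
    for i in d:
        if re.search(r'<Disease:\w+>', i):
            _flag = 1
        elif re.search(r'</Disease:\w+>', i):
            _flag = -1
        elif _flag == -1:
            yield 'O'
        elif _flag == 1:
            yield 'B-COL'
            _flag = 0
        else:
            yield 'I-COL'

# Drop the tag tokens so the remaining words line up one-to-one with the labels.
tag = re.compile(r'</?Disease:\w+>')
words = [w for w in jay if not tag.search(w)]
pairs = list(zip(words, results(jay)))
for word, label in pairs:
    print(f'{word}\t{label}')
```

Note that `zip` stops at the shorter input, so if the filter and the generator ever disagreed about what counts as a tag, the mismatch would be silent; here both use the same `Disease:\w+` patterns.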

User

Answer 2 · edited 2024-10-01 22:28:43

A solution using an iterative approach:

jay = ['Despite', 'similar', 'intensity', 'of', 'alcohol', '<Disease:D013375>', 'withdrawal', 'symptoms', '</Disease:D013375>', ',', 'ALC', '/', 'COC', 'subjects', 'received', 'less', 'oxazepam', 'to', 'treat', 'alcohol', '<Disease:D013375>', 'withdrawal', 'symptoms', '</Disease:D013375>', 'compared', 'to', 'ALC', 'subjects', '.']

result = []

inside = False
seen_BCOL = False

for token in jay:
    if token.startswith('<Disease'):
        inside = True
    elif token.startswith('</Disease'):
        inside = False
        seen_BCOL = False
    elif inside:
        # The first token after an opening tag gets 'B-COL', the rest get 'I-COL'.
        if not seen_BCOL:
            result.append('B-COL')
            seen_BCOL = True
        else:
            result.append('I-COL')
    else:
        result.append('O')


print(result)

Output:

['O', 'O', 'O', 'O', 'O', 'B-COL', 'I-COL', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-COL', 'I-COL', 'O', 'O', 'O', 'O', 'O']

User

Answer 3 · edited 2024-10-01 22:28:43

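You can use itertools.groupby with a key function that detects the "Disease" items to split the list into runs of non-tag items. The even-numbered runs lie outside a span and the odd-numbered runs lie inside one, so each kind can be labeled differently: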

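import re
from itertools import groupby
# Runs of non-tag items alternate outside/inside a span: even runs -> 'O', odd runs -> 'B-COL', 'I-COL', ...
runs = (list(g) for k, g in groupby(jay, key=lambda s: re.match(r'</?Disease:\w+>', s)) if not k)
[t for i, l in enumerate(runs)
 for t in (('B-COL',) + ('I-COL',) * (len(l) - 1) if i % 2 else ('O',) * len(l))]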

This returns:

['O', 'O', 'O', 'O', 'O', 'B-COL', 'I-COL', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-COL', 'I-COL', 'O', 'O', 'O', 'O', 'O']
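To see the grouping this relies on, the runs of non-tag items can be printed before any labeling. A sketch that uses a boolean key (`is_tag`, an illustrative name) instead of the match object; since the tag groups are discarded either way, the runs come out the same:

```python
import re
from itertools import groupby

jay = ['Despite', 'similar', 'intensity', 'of', 'alcohol', '<Disease:D013375>', 'withdrawal', 'symptoms', '</Disease:D013375>', ',', 'ALC', '/', 'COC', 'subjects', 'received', 'less', 'oxazepam', 'to', 'treat', 'alcohol', '<Disease:D013375>', 'withdrawal', 'symptoms', '</Disease:D013375>', 'compared', 'to', 'ALC', 'subjects', '.']

def is_tag(s):
    # True for both '<Disease:...>' and '</Disease:...>' tokens.
    return re.match(r'</?Disease:\w+>', s) is not None

# Keep only the runs of non-tag items; the tags act as separators.
runs = [list(g) for k, g in groupby(jay, key=is_tag) if not k]
for i, run in enumerate(runs):
    print(i, 'inside' if i % 2 else 'outside', run)
```

The parity rule holds only if the list starts outside a span and at least one item separates any two tags. For example, if jay began with an opening tag, run 0 would be inside a span but would still be labeled 'O'.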

Note that the expected output is incorrect, since it has 2 extra 'O's between the two 'B-COL', 'I-COL' sequences.
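If instead one label per input item is wanted, so that the result lines up index by index with jay, the tag tokens need a label of their own. A sketch that gives them 'O' by default; the function name `aligned_labels` and its `tag_label` parameter are hypothetical. With that choice the output has 29 labels, and 13 'O's fall between the two spans, 2 more than the 11 in the tag-dropping output, because the closing tag and the opening tag between the spans each contribute one:

```python
jay = ['Despite', 'similar', 'intensity', 'of', 'alcohol', '<Disease:D013375>', 'withdrawal', 'symptoms', '</Disease:D013375>', ',', 'ALC', '/', 'COC', 'subjects', 'received', 'less', 'oxazepam', 'to', 'treat', 'alcohol', '<Disease:D013375>', 'withdrawal', 'symptoms', '</Disease:D013375>', 'compared', 'to', 'ALC', 'subjects', '.']

def aligned_labels(tokens, tag_label='O'):
    """Hypothetical helper: one label per token, so labels[k] describes tokens[k].

    tag_label is the (assumed) label given to the '<Disease...>' / '</Disease...>' tokens.
    """
    labels = []
    inside = False  # currently between an opening and a closing tag?
    first = False   # is the next inside token the first of its span?
    for tok in tokens:
        if tok.startswith('<Disease'):
            inside, first = True, True
            labels.append(tag_label)
        elif tok.startswith('</Disease'):
            inside = False
            labels.append(tag_label)
        elif inside:
            labels.append('B-COL' if first else 'I-COL')
            first = False
        else:
            labels.append('O')
    return labels

labels = aligned_labels(jay)
print(len(labels) == len(jay))
print(labels)
```

Passing a distinct value such as `tag_label='TAG'` makes the tag positions easy to filter out later if needed.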
