<p>You can write a function and then apply it to the DataFrame:</p>
<pre><code>text = 'There was a need to place a clip in the oesophagus. One biopsy was taken. There is a long duodenum. The stomach had a balloon placed'
patternAnatomy = "oesophagus|stomach|duodenum"
patternEvent = "clip|RFA|balloon|biopsy"
def split_text(text, patternAnatomy, patternEvent):
    # Split the text into sentences, then each sentence into words
    s = [sentence.split() for sentence in text.split('.')]
    ana = patternAnatomy.split('|')
    eve = patternEvent.split('|')
    whitelist = ana + eve  # not used below
    l = list()
    for sentence in s:
        l_ana = list()
        l_eve = list()
        for word in sentence:
            if word in ana:
                l_ana.append(word)
            if word in eve:
                l_eve.append(word)
        # One [anatomy, event] pair per sentence
        l.append([l_ana, l_eve])
    return ['_'.join(tup[0])+':'+'_'.join(tup[1]) for tup in l]

split_text(text, patternAnatomy, patternEvent)
# Out[14]: ['oesophagus:clip', ':biopsy', 'duodenum:', 'stomach:balloon']
</code></pre>
<p>It would be better to pass s, ana, eve and whitelist in as arguments rather than recomputing them on every call.</p>
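<p>A minimal sketch of that suggestion, under some assumptions: the name <code>split_text_pre</code> and the sample <code>reports</code> list are hypothetical, and only <code>ana</code> and <code>eve</code> are precomputed here, since <code>s</code> depends on each individual text. Using sets is also an assumption on my part; it makes the membership checks cheaper than with lists.</p>

```python
def split_text_pre(text, ana, eve):
    """Return one 'anatomy:event' label per sentence, using precomputed term sets."""
    labels = []
    for sentence in text.split('.'):
        words = sentence.split()
        found_ana = [w for w in words if w in ana]
        found_eve = [w for w in words if w in eve]
        labels.append('_'.join(found_ana) + ':' + '_'.join(found_eve))
    return labels

# Build the lookup sets once, outside the function
ana = frozenset("oesophagus|stomach|duodenum".split('|'))
eve = frozenset("clip|RFA|balloon|biopsy".split('|'))

# Hypothetical sample data standing in for a text column
reports = [
    'One biopsy was taken. The stomach had a balloon placed',
    'There is a long duodenum',
]
results = [split_text_pre(r, ana, eve) for r in reports]
```

<p>With pandas, the same function could probably be applied to a text column with something like <code>df['report'].apply(lambda t: split_text_pre(t, ana, eve))</code>, where the column name <code>report</code> is a placeholder. Note that a text ending in a full stop yields a trailing empty sentence, and therefore a trailing <code>':'</code> label, just as in the original function.</p>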