<p>Assumptions:</p>
<ul>
<li>A word containing <code>-</code> will not be removed.</li>
</ul>
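<p>Some ideas:</p>
<ul>
<li>Duplicates that differ only in case: the comparison should be <b>case-insensitive</b> IMO, so compare using <code>.lower()</code>.</li>
<li>Keep the first occurrence and remove the others.</li>
<li>Words separated by "," or joined by "-": split the word on <code>-</code> if it is present, then strip <code>,</code> before comparing.</li>
</ul>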
<pre><code>import re
import itertools

sentences = [
    '3sprouts Cesto de Roupa Cisne Sprouts, 3Sprouts, Organizador',
    'Bright-Starts Mordedor Chocalho Rattle & Teethe, bright Starts, Rosa/Roxo',
    'Bright-Starts Mordedor Twist & Teethe, Starts, Multicor'
]

for s in sentences:
    s_split = s.split(' ')  # keep the original sentence split by ' '
    s_split_without_comma = [i.strip(',') for i in s_split]

    # Get the words to compare, split by '-' and ' '.
    # The three methods below are equivalent; any one of them is enough.
    # method 1: re
    compare_words = re.split(' |-', s)
    # method 2: itertools
    compare_words = list(itertools.chain.from_iterable([i.split('-') for i in s_split]))
    # method 3: DIY
    compare_words = []
    for i in s_split:
        compare_words += i.split('-')

    # strip ','
    compare_words_without_comma = [i.strip(',') for i in compare_words]

    # start to compare
    need_removed_index = []
    for word in compare_words_without_comma:
        matched_indexes = []
        for idx, w in enumerate(s_split_without_comma):
            if word.lower() in w.lower().split('-'):
                matched_indexes.append(idx)
        if len(matched_indexes) > 1:  # has duplicates
            # keep the first match, mark the rest for removal
            need_removed_index += matched_indexes[1:]
    need_removed_index = list(set(need_removed_index))

    # keep the remaining words and join them with ' '
    print(" ".join([i for idx, i in enumerate(s_split) if idx not in need_removed_index]))
</code></pre>
<p>This should print:</p>
<pre><code>3sprouts Cesto de Roupa Cisne Sprouts, Organizador
Bright-Starts Mordedor Chocalho Rattle & Teethe, Rosa/Roxo
Bright-Starts Mordedor Twist & Teethe, Multicor
</code></pre>
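<p>The same idea can also be sketched as a single pass that remembers which word parts have already been seen. This is only an illustrative variant, not a drop-in replacement: the function name <code>drop_repeated_words</code> is made up here, and it enforces the assumption above (a word containing <code>-</code> is never removed) explicitly rather than relying on word order.</p>

```python
def drop_repeated_words(sentence):
    """Keep the first occurrence of each word, comparing case-insensitively."""
    seen = set()  # lower-cased word parts encountered so far
    kept = []
    for token in sentence.split(' '):
        # Normalise for comparison only: strip ',', lower-case, split on '-'.
        parts = token.strip(',').lower().split('-')
        # Assumption from this answer: a token containing '-' is never removed.
        if '-' in token or any(p not in seen for p in parts):
            kept.append(token)
        seen.update(parts)
    return ' '.join(kept)


if __name__ == '__main__':
    print(drop_repeated_words('Bright-Starts Mordedor Twist & Teethe, Starts, Multicor'))
```

<p>For the three sample sentences this gives the same lines as the loop above; it may differ on inputs where a hyphenated word appears after its parts.</p>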
<p>This differs slightly from the expected answer, but I still don't understand why <code>Sprouts</code> is also removed from line 1 there (does '3sprouts' match 'sprouts'?).</p>
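<p>One possible explanation, and this is only a guess since the question does not state the rule: if the expected answer treats a word as a duplicate whenever it appears <i>inside</i> an earlier word, then <code>Sprouts</code> would count as a repeat of <code>3sprouts</code>. The small check below contrasts that substring test with the exact comparison used here; the variable names are illustrative.</p>

```python
first, later = '3sprouts', 'Sprouts'

# Exact, case-insensitive comparison (what the code in this answer does).
exact_match = later.lower() == first.lower()

# Substring comparison (an assumed alternative, not confirmed by the question).
substring_match = later.lower() in first.lower()

print(exact_match)      # False
print(substring_match)  # True
```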
<p>Anyway... this is just to give the general idea.</p>
<p>FYI.</p>