Answers to the question: Removing duplicates with Pandas while preserving order [python]


1 answer

Anonymous, 1 day ago


Assumption:
<ul>
<li>Words containing <code>-</code> will not be removed.</li>
</ul>
Some thoughts:
<ul>
<li>Case-sensitive duplicates: IMO the match should be case-insensitive, so compare using <code>.lower()</code>.</li>
<li>Keep the first occurrence: remove the later occurrences.</li>
<li>Words separated by "," or containing "-": if a <code>-</code> is present, split the word on it, then strip <code>,</code> before comparing.</li>
</ul>
<pre><code>import re
import itertools

sentences = [
    '3sprouts Cesto de Roupa Cisne Sprouts, 3Sprouts, Organizador',
    'Bright-Starts Mordedor Chocalho Rattle & Teethe, bright Starts, Rosa/Roxo',
    'Bright-Starts Mordedor Twist & Teethe, Starts, Multicor'
]

for s in sentences:
    # keep the original sentence, split by ' '
    s_split = s.split(' ')
    s_split_without_comma = [i.strip(',') for i in s_split]

    # get the words to compare, split by '-' and ' '.
    # The three methods below are interchangeable; only the last assignment is used.
    # method 1: re
    compare_words = re.split(' |-', s)
    # method 2: itertools
    compare_words = list(itertools.chain.from_iterable([i.split('-') for i in s_split]))
    # method 3: DIY
    compare_words = []
    for i in s_split:
        compare_words += i.split('-')

    # strip ','
    compare_words_without_comma = [i.strip(',') for i in compare_words]

    # start to compare: collect the indexes of every repeated token
    need_removed_index = []
    for word in compare_words_without_comma:
        matched_indexes = []
        for idx, w in enumerate(s_split_without_comma):
            if word.lower() in w.lower().split('-'):
                matched_indexes.append(idx)
        if len(matched_indexes) > 1:  # has duplicates: keep the first, drop the rest
            need_removed_index += matched_indexes[1:]
    need_removed_index = list(set(need_removed_index))

    # keep the remaining tokens and join them with ' '
    print(" ".join([i for idx, i in enumerate(s_split) if idx not in need_removed_index]))
</code></pre>
This should print:
<pre><code>3sprouts Cesto de Roupa Cisne Sprouts, Organizador
Bright-Starts Mordedor Chocalho Rattle & Teethe, Rosa/Roxo
Bright-Starts Mordedor Twist & Teethe, Multicor
</code></pre>
This differs slightly from the expected answer, and I still don't understand why <code>Sprouts</code> is also removed in line 1 there (does '3sprouts' count as a match for 'sprouts'?). Never mind... this is only meant to give you some ideas, for reference.
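For reference, the same idea can probably be written more compactly with a set of words already seen. This is only a minimal sketch, not part of the original answer: the function name <code>dedupe_tokens</code> is made up for illustration, and it assumes tokens are separated by single spaces as in the sample sentences. It mirrors the loop's matching rules (strip <code>,</code>, lower-case, split on <code>-</code>, keep the first occurrence).

```python
def dedupe_tokens(sentence):
    """Drop later tokens that repeat an earlier word, keeping the original order.

    Hypothetical helper for illustration; mirrors the comparison rules above.
    """
    seen = set()  # lower-cased word parts encountered so far
    kept = []
    for token in sentence.split(' '):
        # Compare without surrounding commas, case-insensitively, per hyphen part.
        parts = token.strip(',').lower().split('-')
        if not any(part in seen for part in parts):
            kept.append(token)
        # Record the parts even when the token is dropped.
        seen.update(parts)
    return ' '.join(kept)


sentences = [
    '3sprouts Cesto de Roupa Cisne Sprouts, 3Sprouts, Organizador',
    'Bright-Starts Mordedor Chocalho Rattle & Teethe, bright Starts, Rosa/Roxo',
    'Bright-Starts Mordedor Twist & Teethe, Starts, Multicor',
]

results = [dedupe_tokens(s) for s in sentences]
for line in results:
    print(line)
```

If the sentences live in a pandas column, the same function could be applied with <code>Series.apply</code>, for example <code>df['title'].apply(dedupe_tokens)</code>, where the column name <code>title</code> is an assumption. Note that, like the loop above, this drops a token when any of its hyphen-separated parts was seen earlier, so a hyphenated word that appears after its parts would be removed too.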
