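In pandas, check whether a main string contains any string from a list; if it does, remove the substring from the main string and add it to a new column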

df1=
                                     A
0      Black Prada zebra leather Large
1     green Gucci striped Canvas small
2  blue Prada Monogram calf leather XL

df2=
    color    pattern      material    size
0   black      zebra       leather   small
1   green    striped        canvas      xl
2  yellow  checkered  calf leather  medium
3  orange   monogram
4   white      plain
5          pinstripe

1 answer

User

#1 · posted 2024-06-14 10:52:49

Update

You may want to search against the columns below.

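Here, the code takes each search string and finds the highest fraction of its words that match the words in the df2 columns. If that fraction meets a required threshold, the search string is removed.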
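To make the idea concrete, here is a minimal sketch of such a match fraction, assuming it is measured over the words of the search string. The helper name `match_fraction` is hypothetical, and the sketch uses plain word sets rather than the regex-based code of the answer itself.

```python
def match_fraction(search_string, row_words):
    """Fraction of the words in search_string that also occur in row_words."""
    words = search_string.lower().split()
    if not words:
        return 0.0
    row = {w.lower() for w in row_words}
    hits = sum(1 for w in words if w in row)
    return hits / len(words)


# First row of df2 from the question
row = ["black", "zebra", "leather", "small"]

# black, zebra and leather match: 3 of 5 words
print(match_fraction("Black Prada zebra leather Large", row))
# only leather matches: 1 of 6 words
print(match_fraction("blue Prada Monogram calf leather XL", row))
```

A threshold then decides how many matching words are enough to treat a search string as covered by a row.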

I have tested the code below and it works, although you may need to refine the regular-expression matching.

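import re

import pandas


def perc_match(src, s):
    '''Return the fraction of the words in s that are found in src'''
    # http://stackoverflow.com/a/26985301/943773
    words = s.split()
    if not words:
        return 0.0
    # Match each word of s as a whole word, ignoring case
    pattern = '|'.join(r'\b{}\b'.format(re.escape(w)) for w in words)
    found = set(m.lower() for m in re.findall(pattern, src, flags=re.I))
    # Divide by the number of words in s, not the character length of src
    return len(found) / len(words)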


search = ['Black Prada zebra leather Large',
          'green Gucci striped Canvas small',
          'blue Prada Monogram calf leather XL']

d2 = {'color':['black', 'green', 'yellow', 'orange', 'white',''],
      'pattern':['zebra', 'striped', 'checkered', 'monogram', 'plain',
                 'pinstripe'],
      'material':['leather', 'canvas', 'calf leather','','',''],
      'size':['small', 'xl', 'medium','','','']}

df2 = pandas.DataFrame(d2)

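# Strip whitespace and make all lower case
search = [s.strip().lower() for s in search]
# Column-wise string methods work regardless of pandas version
# (DataFrame.applymap is deprecated in recent releases)
df2 = df2.apply(lambda col: col.str.strip().str.lower())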

# Combine all columns to single string for each row
df2['full_str'] = df2.apply(lambda row: ' '.join(row), axis=1)

# Min percent matching
min_thresh = 0.1

# Calculate the percentage match for each row of dataframe
rm_ind = set()
for i, s in enumerate(search):
    # If you want you could save these `perc_matches` for later
    perc_matches = df2['full_str'].apply(perc_match, args=(s,))
    # Mark for removal if above threshold
    if perc_matches.max() > min_thresh:
        rm_ind.add(i)

# Remove the marked entries from `search`; build a new list, because
# deleting by index in a loop shifts the remaining positions
search = [s for i, s in enumerate(search) if i not in rm_ind]
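The code above only drops search strings that match; the question title also asks for the matched substrings to be moved into new columns. One possible way to do that is sketched here, assuming a case-insensitive whole-word match against each df2 column. The helper `column_pattern` and the variable `remaining` are hypothetical names, and this is an illustration rather than the answer's own method.

```python
import re

import pandas as pd

df1 = pd.DataFrame({"A": ["Black Prada zebra leather Large",
                          "green Gucci striped Canvas small",
                          "blue Prada Monogram calf leather XL"]})
df2 = pd.DataFrame({
    "color": ["black", "green", "yellow", "orange", "white", ""],
    "pattern": ["zebra", "striped", "checkered", "monogram", "plain",
                "pinstripe"],
    "material": ["leather", "canvas", "calf leather", "", "", ""],
    "size": ["small", "xl", "medium", "", "", ""],
})


def column_pattern(terms):
    """Build one whole-word alternation from the non-empty terms of a column."""
    # Longer terms come first so a multi-word term such as "calf leather"
    # is tried before any shorter term
    terms = sorted({t.strip() for t in terms if t.strip()}, key=len, reverse=True)
    return r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b"


remaining = df1["A"]
for col in df2.columns:
    pat = column_pattern(df2[col])
    # First match (if any) goes into a new column named after the df2 column
    df1[col] = remaining.str.extract(pat, flags=re.IGNORECASE, expand=False).str.lower()
    # Remove that first match from the main string
    remaining = remaining.str.replace(pat, "", n=1, flags=re.IGNORECASE, regex=True)

# Collapse the whitespace left behind by the removals
df1["A"] = remaining.str.split().str.join(" ")
print(df1)
```

Rows with no match in a column get a missing value there, for example `size` for the first row, because "Large" is not listed in df2.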
