How to find and match specific values from different DataFrames in Python

print(df["Title"])
0                                           Others
1                                           Others
2                          Some major design flaws
3                                 My favorite buy!
4                                 Flattering shirt
5                          Not for the very petite
6                             Cagrcoal shimmer fun
7             Shimmer, surprisingly goes with lots
8                                       Flattering
9                                Such a fun dress!
10    Dress looks like it's made of cheap material
Name: Title, dtype: object

1 Answer

User

#1 · Posted on 2024-09-30 20:20:19

One option is to run the check against the original DataFrame instead of splitting it up.

import pandas as pd
import re
titles =  {'Title': [
    'Others',
    'Others',
    'Some major design flaws',
    'My favorite buy!',
    'Flattering shirt',
    'Not for the very petite',
    'Cagrcoal shimmer fun',
    'Shimmer, surprisingly goes with lots',
    'Flattering',
    'Such a fun dress!',
    'Dress looks like it\'s made of cheap material'
]}

pos_words = {"Words":[
    'favorite',
    'flattering',
    'fun',
    'like']
    }
df = pd.DataFrame(titles)

df2 = pd.DataFrame(pos_words)

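# Join the words into one alternation pattern ('favorite|flattering|fun|like')
# and collect every case-insensitive match found in each title
pos_words = list(df2["Words"])
df['positive'] = (
    df.Title.str
    .findall('|'.join(pos_words), flags=re.IGNORECASE)
    )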

This returns a DataFrame like the following:

   |                                        Title |     positive |
0  |                                       Others |           [] |
1  |                                       Others |           [] |
2  |                      Some major design flaws |           [] |
3  |                             My favorite buy! |   [favorite] |
4  |                             Flattering shirt | [Flattering] |
5  |                      Not for the very petite |           [] |
6  |                         Cagrcoal shimmer fun |        [fun] |
7  |         Shimmer, surprisingly goes with lots |           [] |
8  |                                   Flattering | [Flattering] |
9  |                            Such a fun dress! |        [fun] |
10 | Dress looks like it's made of cheap material |       [like] |
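
Note that a plain alternation also matches inside longer words (for example "fun" inside "funny"), and any word containing regex metacharacters would be read as a pattern. The following is a minimal sketch of one way to guard against both, assuming whole-word matching is what you want. The `pattern` variable and the `sample` titles are illustrative and not part of the original answer.

```python
import re
import pandas as pd

pos_words = ['favorite', 'flattering', 'fun', 'like']

# Hypothetical titles chosen to show substring matches
sample = pd.DataFrame({'Title': ['Funny but unlikely', 'Such a fun dress!']})

# Escape each word, then wrap the alternation in word boundaries
pattern = r'\b(?:' + '|'.join(re.escape(w) for w in pos_words) + r')\b'

# Plain alternation: matches "Fun" in "Funny" and "like" in "unlikely"
plain = sample.Title.str.findall('|'.join(pos_words), flags=re.IGNORECASE)

# Whole-word pattern: only standalone words are matched
whole = sample.Title.str.findall(pattern, flags=re.IGNORECASE)

print(plain.tolist())
print(whole.tolist())
```

Whether substring matches are desirable depends on the data; the plain version would also catch "likes" or "funnier", which the whole-word version skips.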

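findall() returns a list of matches for each row, which is why the values appear in brackets. If a single string contains several matches, for example "My favorite, flattering, fun shirt", that row returns [favorite, flattering, fun].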
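
As a small illustration of the list-per-row behavior, the sketch below runs the same pattern on a string with several matches and on one with none. The `demo` Series and the `counts` column are assumptions added for the example.

```python
import re
import pandas as pd

pos_words = ['favorite', 'flattering', 'fun', 'like']

# Hypothetical input: one title with several positive words, one with none
demo = pd.Series(['My favorite, flattering, fun shirt', 'Others'])

# Each element of the result is a Python list of the matched substrings
matches = demo.str.findall('|'.join(pos_words), flags=re.IGNORECASE)

# .str.len() also works on lists, giving the number of matches per row
counts = matches.str.len()

print(matches.tolist())
print(counts.tolist())
```

Keeping the lists (rather than converting them to strings) makes follow-up steps such as counting or filtering on `counts > 0` straightforward.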

If you chain astype(str) and a few replace calls, you can strip the brackets and quotes, which you may not want in the output.

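df['positive'] = (
    df.Title.str
    .findall('|'.join(pos_words), flags=re.IGNORECASE)
    # Convert each list to its string form, e.g. "['fun']"
    .astype(str)
    # Raw strings avoid invalid-escape warnings for the regex patterns
    .replace(r'\]', "", regex=True)
    .replace(r',', "", regex=True)
    .replace(r"'", "", regex=True)
    .replace(r'\[', "", regex=True)
    )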
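
An alternative to stripping characters from the stringified list is to join the list elements directly. This is a sketch of a swapped-in technique, not the approach from the answer above; it assumes a space-separated string is the desired result, and the three sample titles are made up for the example.

```python
import re
import pandas as pd

pos_words = ['favorite', 'flattering', 'fun', 'like']

# Hypothetical titles: no match, one match, two matches
df = pd.DataFrame({'Title': ['Others', 'My favorite buy!', 'A fun, flattering top']})

df['positive'] = (
    df.Title.str
    .findall('|'.join(pos_words), flags=re.IGNORECASE)
    # Join each list of matches into one string; an empty list becomes ''
    .str.join(' ')
)

print(df['positive'].tolist())
```

This avoids the risk of the replace calls removing commas, quotes, or brackets that are part of a matched word itself.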
