How to find and match specific values from different DataFrames in Python

Posted 2024-09-30 20:20:19


I ran into a problem when trying to create a new column next to this column in the same DataFrame:

print(df["Title"])
0                                           Others
1                                           Others
2                          Some major design flaws
3                                 My favorite buy!
4                                 Flattering shirt
5                          Not for the very petite
6                             Cagrcoal shimmer fun
7             Shimmer, surprisingly goes with lots
8                                       Flattering
9                                Such a fun dress!
10    Dress looks like it's made of cheap material
Name: Title, dtype: object

I am trying to pick out the positive words from each title. This is what I have so far:

title_words=df["Title"].str.strip().str.lower().replace("","").str.strip('!.,?').str.split(expand=True).stack()

#print(words)
pos_title=title_words[title_words.isin(df2["Words"])]
print(pos_title)

This is the result:

print(pos_title)
3   1      favorite
4   0    flattering
6   2           fun
8   0    flattering
9   2           fun
10  2          like
dtype: object

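The goal is to create a new column that lines up the results in pos_title with the index of the first column, "Title". If a title contains no positive word (that is, nothing ends up in pos_title for that row), the new column should hold "None".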

The expected output was attached to the original question as an image.

How should I code this? Any help is much appreciated, as I am new to Python. Thank you very much.


1 answer

Answer 1 · Posted 2024-09-30 20:20:19

One option is to search the original DataFrame directly rather than splitting the titles into words first:

import pandas as pd
import re
titles =  {'Title': [
    'Others',
    'Others',
    'Some major design flaws',
    'My favorite buy!',
    'Flattering shirt',
    'Not for the very petite',
    'Cagrcoal shimmer fun',
    'Shimmer, surprisingly goes with lots',
    'Flattering',
    'Such a fun dress!',
    'Dress looks like it\'s made of cheap material'
]}

pos_words = {"Words":[
    'favorite',
    'flattering',
    'fun',
    'like']
    }
df = pd.DataFrame(titles)

df2 = pd.DataFrame(pos_words)

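# Turn the word column into a plain list so it can be joined into a regex.
pos_words = list(df2["Words"])

# 'favorite|flattering|fun|like' matches any of the words, ignoring case.
# Note: this is substring matching, so "fun" would also match "funny".
df['positive'] = (
    df.Title.str
    .findall('|'.join(pos_words), flags=re.IGNORECASE)
)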

This returns a DataFrame that looks like this:

   | Title                                        | positive     |
0  | Others                                       | []           |
1  | Others                                       | []           |
2  | Some major design flaws                      | []           |
3  | My favorite buy!                             | [favorite]   |
4  | Flattering shirt                             | [Flattering] |
5  | Not for the very petite                      | []           |
6  | Cagrcoal shimmer fun                         | [fun]        |
7  | Shimmer, surprisingly goes with lots         | []           |
8  | Flattering                                   | [Flattering] |
9  | Such a fun dress!                            | [fun]        |
10 | Dress looks like it's made of cheap material | [like]       |

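findall() returns a list of matches for each row, which is why the values appear in brackets. If a single string contains several matches, for example "My favorite, flattering, fun shirt", that row returns [favorite, flattering, fun].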
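Because the pattern is a plain alternation, it matches substrings as well as whole words. The following is a minimal sketch, not part of the original answer, that contrasts the plain pattern with a word-bounded one. The sample titles and the column names `loose` and `strict` are made up for illustration.

```python
import re
import pandas as pd

# Hypothetical sample titles chosen to show the difference.
df = pd.DataFrame({"Title": [
    "My favorite, flattering, fun shirt",
    "Funny fabric, unlike the photo",
]})
pos_words = ["favorite", "flattering", "fun", "like"]

# Plain alternation: also matches "fun" in "Funny" and "like" in "unlike".
loose = "|".join(pos_words)

# Escape each word and wrap the alternation in \b word boundaries so that
# only whole words match.
strict = r"\b(?:" + "|".join(re.escape(w) for w in pos_words) + r")\b"

df["loose"] = df["Title"].str.findall(loose, flags=re.IGNORECASE)
df["strict"] = df["Title"].str.findall(strict, flags=re.IGNORECASE)
print(df[["loose", "strict"]])
```

With these sample titles, the second row yields two matches under the plain pattern and none under the word-bounded one. Which behavior is wanted depends on the word list.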

If you append astype(str) and a replace call, you can strip the brackets and quotes, which you probably do not want:

df['positive'] = (
    df.Title.str
    .findall('|'.join(pos_words), flags=re.IGNORECASE)
    .astype(str)
    # A raw string avoids invalid-escape warnings; the character class
    # removes brackets, commas, and quotes in one pass.
    .replace(r"[\[\],']", "", regex=True)
)
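The question also asks for "None" in rows without a positive word, which the code above leaves as an empty string. One possible way to get there, sketched under the assumption that lowercase words separated by spaces are acceptable, is to join each list with `str.join` and then replace the empty strings. The shortened title list is illustrative only.

```python
import re
import pandas as pd

# A shortened, illustrative subset of the titles from the question.
df = pd.DataFrame({"Title": [
    "Others",
    "My favorite buy!",
    "Flattering shirt",
    "Such a fun dress!",
]})
pos_words = ["favorite", "flattering", "fun", "like"]

# One list of matches per row, as in the answer above.
matches = df["Title"].str.findall("|".join(pos_words), flags=re.IGNORECASE)

# Join each list into a single string; rows without a match become "".
# Lowercasing is an assumption made to mirror the question's pos_title output.
joined = matches.str.join(" ").str.lower()

# Keep non-empty strings and put the literal text "None" everywhere else.
df["positive"] = joined.where(joined != "", "None")
print(df)
```

This avoids the round trip through `astype(str)` and keeps the result aligned with the original index, since no rows are dropped along the way.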
