How to map rows of two different DataFrames based on a condition in Pandas

Posted 2024-06-16 14:13:31


I have two DataFrames.

df1

 Names
 one two three
 Sri is a good player
 Ravi is a mentor
 Kumar is a cricketer player

df2

 values
 sri
 NaN
 sri, is
 kumar,cricketer player

For each row of df2, I am trying to get the row of df1 that contains all of the items in that df2 value.

My expected output is

 values                  Names
 sri                     Sri is a good player
 NaN
 sri, is                 Sri is a good player
 kumar,cricketer player  Kumar is a cricketer player

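I tried df1["Names"].str.contains("|".join(df2["values"].values.tolist())), and I made other attempts as well.

But I cannot get the expected output, because the values in df2 contain commas (","). Please help.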
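
To make the problem concrete, here is a minimal sketch of why the `str.contains` attempt falls short. The two frames are rebuilt by hand from the sample rows in the question (an assumption, since the original data source is not shown), and `join_failed`, `pattern` and `mask` are illustrative names only.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumed, not the asker's real data)
df1 = pd.DataFrame({"Names": ["Sri is a good player",
                              "Ravi is a mentor",
                              "Kumar is a cricketer player"]})
df2 = pd.DataFrame({"values": ["sri", np.nan, "sri, is", "kumar,cricketer player"]})

# Joining the raw column fails outright: NaN is a float, not a str
try:
    "|".join(df2["values"].values.tolist())
    join_failed = False
except TypeError:
    join_failed = True

# Even without the NaN, each value is searched as one literal phrase,
# so "kumar,cricketer player" never occurs inside "Kumar is a cricketer player"
pattern = "|".join(df2["values"].dropna())
mask = df1["Names"].str.contains(pattern, case=False)
```

Note that `mask` only flags rows of df1. It does not say which df2 value matched, so it cannot produce the expected output on its own.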


2 Answers

Use set logic with NumPy broadcasting.

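import numpy as np
import pandas as pd

# Turn each cell into a set of lowercase words (NaN becomes {''})
d1 = df1['Names'].fillna('').str.lower().str.split('[^a-z]+').apply(set).values
d2 = df2['values'].fillna('').str.lower().str.split('[^a-z]+').apply(set).values

# Compare every df2 set with every df1 set; >= on sets is the superset test
i, j = np.where(d1 >= d2[:, None])

# i indexes the df2 rows, j the df1 rows that contain all of their words
df2.assign(Names=pd.Series(df1['Names'].values[j], df2['values'].index[i]))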

                   values                        Names
0                     sri         Sri is a good player
1                     NaN                          NaN
2                 sri, is         Sri is a good player
3  kumar,cricketer player  Kumar is a cricketer player
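
If the broadcast comparison is hard to follow, the same subset test can be written as a plain loop. This is only a sketch under some assumptions: the frames are rebuilt by hand from the question's sample rows, the helpers `tokens` and `first_match` are hypothetical names that do not appear in the answer, and only the first matching df1 row is kept for each df2 value.

```python
import re

import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumed)
df1 = pd.DataFrame({"Names": ["Sri is a good player",
                              "Ravi is a mentor",
                              "Kumar is a cricketer player"]})
df2 = pd.DataFrame({"values": ["sri", np.nan, "sri, is", "kumar,cricketer player"]})


def tokens(text):
    """Return the set of lowercase alphabetic words in text (empty set for NaN)."""
    if pd.isna(text):
        return set()
    return set(re.findall(r"[a-z]+", str(text).lower()))


def first_match(value, names):
    """Return the first name whose words include every word of value, else NaN."""
    wanted = tokens(value)
    if not wanted:
        return np.nan
    for name in names:
        if wanted <= tokens(name):  # subset test, the mirror image of d1 >= d2
            return name
    return np.nan


result = df2.assign(Names=[first_match(v, df1["Names"]) for v in df2["values"]])
print(result)
```

The loop is slower than broadcasting on large frames, but it makes the handling of NaN and of multiple candidate rows explicit.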

Try this:

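import pandas as pd

# The CSV files are assumed to hold the question's data,
# with the df1 column named 'names' and the df2 column named 'values'
df1 = pd.read_csv('sample.csv')
df2 = pd.read_csv('sample_2.csv')

# Lowercase both columns so that matching is case-insensitive
df2['values'] = df2['values'].str.lower()
df1['names'] = df1['names'].str.lower()

# Replace punctuation with spaces, then collapse runs of whitespace.
# regex=True is needed because str.replace is literal by default in pandas 2.0+
df2['values'] = df2['values'].str.replace(r'[^\w\s]', ' ', regex=True)
df2['values'] = df2['values'].replace(r'\s+', ' ', regex=True)

df1['names'] = df1['names'].str.replace(r'[^\w\s]', ' ', regex=True)
df1['names'] = df1['names'].replace(r'\s+', ' ', regex=True)

# Split each string into a list of words; NaN becomes ['nan'],
# which matches nothing unless a name contains the word 'nan'
df2['list_values'] = df2['values'].apply(lambda x: str(x).split())
df1['list_names'] = df1['names'].apply(lambda x: str(x).split())

list_names = df1['list_names'].tolist()

def check_names(x, list_names):
    # Return the first name whose words include every word in x, else ''
    output = ''
    for list_name in list_names:
        if set(list_name) >= set(x):
            output = ' '.join(list_name)
            break
    return output

df2['Names'] = df2['list_values'].apply(lambda x: check_names(x, list_names))
print(df2)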

Output

                   values                        Names
0                     sri         sri is a good player
1                     NaN                             
2                  sri is         sri is a good player
3  kumar cricketer player  kumar is a cricketer player

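Explanation

This is a fuzzy matching problem. Here are the steps I applied:

  1. Remove punctuation and split to get the unique words in both DataFrames.
  2. Lowercase everything to normalize the matching.
  3. Convert each string to a list by splitting it.
  4. Finally, match through the check_names() function to get the desired output.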
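
As a rough illustration of the normalization steps above, here is what they do to the df2 values from the question. The Series is typed in by hand (an assumption), and `normalized` and `word_lists` are illustrative names. `regex=True` is passed explicitly because `Series.str.replace` treats the pattern as a literal string by default in pandas 2.0 and later.

```python
import numpy as np
import pandas as pd

# df2 values copied from the question (assumed sample data)
values = pd.Series(["sri", np.nan, "sri, is", "kumar,cricketer player"])

normalized = (
    values.str.lower()                          # lowercase for case-insensitive matching
    .str.replace(r"[^\w\s]", " ", regex=True)   # punctuation becomes a space
    .str.replace(r"\s+", " ", regex=True)       # collapse repeated whitespace
    .str.strip()
)

# Split on whitespace to get one list of words per row; missing values stay missing
word_lists = normalized.str.split()
print(word_lists)
```

After this, "sri, is" and "kumar,cricketer player" have become word lists without commas, which is what lets the set comparison in check_names() succeed.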
