我有两个不同的数据框架。如何找到匹配并合并框架数据

df1 ================================== KEYWORD TICKET Burst of bit errors 89814 sync and stand-by reload 66246 Port sub-modules modelling 70946 wires stop passing traffic 60245 Ignore Net flow 59052 df2 ========================== TEXT_DATA Burst of bit errors due to stop passing traffic

import pandas as pd Standard_Data = pd.read_excel('bOOK2.xlsx',usecols=[0,1]) print(Standard_Data) #Standard_Data ================================== KEYWORD TICKET Burst of bit errors 89814 sync and stand-by reload 66246 Port sub-modules modelling 70946 wires stop passing traffic 60245 Ignore Net flow 59052 keyword_data = Standard_Data['KEYWORD'].values.tolist() input_data = pd.read_excel('book1.xlsx',usecols=[1]) print(input_data) input_data ========================== TEXT_DATA Burst of bit errors due to stop passing traffic #simply df1 = Standard_Data , df2 = input_Data sentenced_data = input_data['Text_Data'].values.tolist() df = pd.DataFrame({'sentenced_data':sentenced_data}) print(df) df['MATCHED_KEYWORD'] = (df['sentenced_data'].apply(lambda x: [w for i in keyword_data for w in i.split(' ') if w in (x)])) df['KEYWORD'] = df['MATCHED_KEYWORD'].apply(','.join) df['KEYWORD'] = df['KEYWORD'].str.replace(',',' ') Z = Standard_Data.merge(df,on='KEYWORD',how='right') print(Z)

KEYWORD sentenced_data TICKET Burst of bit errors Burst of bit errors due to 89814 wires stop passing traffic stop passing traffic 66246

1条回答

网友

1楼 · 发布于 2024-06-25 23:16:30

请尝试以下代码：

df是第一个数据帧，df1是第二个数据帧

res = pd.DataFrame()
for text in df1.TEXT_DATA:
        res = res.append(
              df[df.apply(lambda row: row.KEYWORD in text or
                                    text in row.KEYWORD, axis=1)]
              )
print(res)

输出：

                      KEYWORD TICKET
0         Burst of bit errors  89814
3  wires stop passing traffic  60245

下面是另一种方法，可以使与预期输出完全相同：
res = pd.DataFrame(columns=['KEYWORD', 'TICKET', 'sentenced_data']) # create an empty dataframe to store the answer for text in df1.TEXT_DATA: # loop-through the second dataframe bools = df.apply(lambda row: row.KEYWORD in text or text in row.KEYWORD, axis=1) # return a boolean series if KEYWORD contains(by the "in" keyword in python) text or text contains KEYWORD if (bools.any()): # filter the df by the boolean series, append it to res, append the text to second column res = res.append(df[bools]) res.iloc[-bools.sum():, 2] = text res = res[['KEYWORD', 'sentenced_data', 'TICKET']] print(res)
输出：
KEYWORD sentenced_data TICKET 0 Burst of bit errors Burst of bit errors due to 89814 3 wires stop passing traffic stop passing traffic 60245
^如果两个string部分匹配或100%匹配，python中的{}关键字将返回True；否则返回False。

相关问题更多 >

编程相关推荐

热门问题

热门文章