我有两个不同的数据框架。如何找到匹配并合并框架数据

2024-06-25 23:16:30 发布

您现在位置:Python中文网/ 问答频道 /正文

我有两个不同的数据帧,如图所示

     df1
     ==================================                              
     KEYWORD                     TICKET
     Burst of bit errors          89814
     sync and stand-by reload     66246
     Port sub-modules modelling   70946
     wires stop passing traffic   60245
     Ignore Net flow              59052


     df2
     ==========================         
     TEXT_DATA
     Burst of bit errors due to
     stop passing traffic

部分匹配。请帮我解决这个问题。这是我开发的一段代码

import pandas as pd

Standard_Data = pd.read_excel('bOOK2.xlsx',usecols=[0,1])

print(Standard_Data)

     #Standard_Data
     ==================================                              
     KEYWORD                     TICKET
     Burst of bit errors          89814
     sync and stand-by reload     66246
     Port sub-modules modelling   70946
     wires stop passing traffic   60245
     Ignore Net flow              59052

keyword_data = Standard_Data['KEYWORD'].values.tolist()

input_data = pd.read_excel('book1.xlsx',usecols=[1])

print(input_data)

     input_data
     ==========================         
     TEXT_DATA
     Burst of bit errors due to
     stop passing traffic

#simply df1 = Standard_Data , df2 = input_Data
sentenced_data = input_data['Text_Data'].values.tolist()

df = pd.DataFrame({'sentenced_data':sentenced_data})

print(df)

df['MATCHED_KEYWORD'] = (df['sentenced_data'].apply(lambda x: [w for i in 
                                      keyword_data
                                      for w in i.split(' ')
                                      if w in (x)]))

df['KEYWORD'] = df['MATCHED_KEYWORD'].apply(','.join)

df['KEYWORD'] = df['KEYWORD'].str.replace(',',' ')

Z = Standard_Data.merge(df,on='KEYWORD',how='right')
print(Z)

我得到的结果是

KEYWORD                 TICKET              sentenced_data

Burst of bit errors       NaN        Burst of bit errors due to   
stop passing traffic      NaN        stop passing traffic   

但我想要的结果应该是这样的

KEYWORD                           sentenced_data               TICKET
Burst of bit errors               Burst of bit errors due to   89814
wires stop passing traffic        stop passing traffic         66246

请任何人帮助我解决这个问题


Tags: ofdfinputdatabitticketkeywordstandard
1条回答
网友
1楼 · 发布于 2024-06-25 23:16:30

请尝试以下代码:

df是第一个数据帧,df1是第二个数据帧

res = pd.DataFrame()
for text in df1.TEXT_DATA:
        res = res.append(
              df[df.apply(lambda row: row.KEYWORD in text or
                                    text in row.KEYWORD, axis=1)]
              )
print(res)

输出:

                      KEYWORD TICKET
0         Burst of bit errors  89814
3  wires stop passing traffic  60245


下面是另一种方法,可以使与预期输出完全相同:
res = pd.DataFrame(columns=['KEYWORD', 'TICKET', 'sentenced_data']) # create an empty dataframe to store the answer
for text in df1.TEXT_DATA: # loop-through the second dataframe
    bools = df.apply(lambda row: row.KEYWORD in text or text in row.KEYWORD, axis=1) # return a boolean series if KEYWORD contains(by the "in" keyword in python) text or text contains KEYWORD
    if (bools.any()): # filter the df by the boolean series, append it to res, append the text to second column
        res = res.append(df[bools])
        res.iloc[-bools.sum():, 2] = text
res = res[['KEYWORD', 'sentenced_data', 'TICKET']]
print(res)

输出:

                      KEYWORD              sentenced_data TICKET
0         Burst of bit errors  Burst of bit errors due to  89814
3  wires stop passing traffic        stop passing traffic  60245


^如果两个string部分匹配或100%匹配,python中的{}关键字将返回True;否则返回False

相关问题 更多 >