I have two different DataFrames, as shown below.
df1
==================================
KEYWORD                      TICKET
Burst of bit errors          89814
sync and stand-by reload     66246
Port sub-modules modelling   70946
wires stop passing traffic   60245
Ignore Net flow              59052
df2
==========================
TEXT_DATA
Burst of bit errors due to
stop passing traffic
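I need to match the two on a partial match between TEXT_DATA and KEYWORD. Please help me solve this. Here is the code I have written so far: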
import pandas as pd
Standard_Data = pd.read_excel('bOOK2.xlsx',usecols=[0,1])
print(Standard_Data)
# Standard_Data
# ==================================
# KEYWORD                      TICKET
# Burst of bit errors          89814
# sync and stand-by reload     66246
# Port sub-modules modelling   70946
# wires stop passing traffic   60245
# Ignore Net flow              59052
keyword_data = Standard_Data['KEYWORD'].values.tolist()
input_data = pd.read_excel('book1.xlsx',usecols=[1])
print(input_data)
# input_data
# ==========================
# TEXT_DATA
# Burst of bit errors due to
# stop passing traffic
# In short: df1 = Standard_Data, df2 = input_data
# The column is named TEXT_DATA in the printed frame above
sentenced_data = input_data['TEXT_DATA'].values.tolist()
df = pd.DataFrame({'sentenced_data':sentenced_data})
print(df)
# For each sentence, collect every word of every keyword that occurs in it as a substring
df['MATCHED_KEYWORD'] = df['sentenced_data'].apply(
    lambda x: [w for i in keyword_data for w in i.split(' ') if w in x])
df['KEYWORD'] = df['MATCHED_KEYWORD'].apply(','.join)
df['KEYWORD'] = df['KEYWORD'].str.replace(',',' ')
Z = Standard_Data.merge(df,on='KEYWORD',how='right')
print(Z)
The result I get is:
KEYWORD TICKET sentenced_data
Burst of bit errors NaN Burst of bit errors due to
stop passing traffic NaN stop passing traffic
But the result I want should look like this:
KEYWORD sentenced_data TICKET
Burst of bit errors Burst of bit errors due to 89814
wires stop passing traffic stop passing traffic 60245
Could anyone please help me solve this?
Please try the following code.
Here df is the first DataFrame and df1 is the second DataFrame.
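The answer's code is not shown here, so what follows is only a minimal sketch of one way to do it, not necessarily the answerer's method. It rebuilds the sample data from the question and assumes that "partial match" means one string contains the other, ignoring case. The helper name `find_keyword` is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question: df holds the keywords, df1 the text.
df = pd.DataFrame({
    "KEYWORD": [
        "Burst of bit errors",
        "sync and stand-by reload",
        "Port sub-modules modelling",
        "wires stop passing traffic",
        "Ignore Net flow",
    ],
    "TICKET": [89814, 66246, 70946, 60245, 59052],
})
df1 = pd.DataFrame({
    "TEXT_DATA": ["Burst of bit errors due to", "stop passing traffic"],
})


def find_keyword(text, keywords):
    """Return the first keyword that contains `text` or is contained in it.

    Hypothetical helper; the containment rule is an assumption.
    """
    t = text.strip().lower()
    for kw in keywords:
        k = kw.strip().lower()
        if k in t or t in k:
            return kw
    return None  # no keyword matched this text


keywords = df["KEYWORD"].tolist()
# Attach the matched keyword to each text row, then look up its ticket.
df1["KEYWORD"] = df1["TEXT_DATA"].apply(lambda s: find_keyword(s, keywords))
result = df1.merge(df, on="KEYWORD", how="left")
print(result)
```

Because the merge now runs on the full keyword string rather than on the reassembled words, "stop passing traffic" can pick up the ticket of "wires stop passing traffic". If several keywords match, only the first is kept.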
Output:
Here is another approach that produces exactly the expected output:
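As a rough sketch of such an approach (the original code is not shown here, so this is an assumption rather than the answerer's exact solution): pair every keyword with every text row, keep the pairs where one string contains the other, and arrange the columns as in the desired result. The `_key` column and the name `pairs` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "KEYWORD": [
        "Burst of bit errors",
        "sync and stand-by reload",
        "Port sub-modules modelling",
        "wires stop passing traffic",
        "Ignore Net flow",
    ],
    "TICKET": [89814, 66246, 70946, 60245, 59052],
})
df1 = pd.DataFrame({
    "TEXT_DATA": ["Burst of bit errors due to", "stop passing traffic"],
})

# Build every (keyword, text) combination with a temporary constant join key.
pairs = (
    df.assign(_key=1)
    .merge(df1.assign(_key=1), on="_key")
    .drop(columns="_key")
)

# Keep a pair when either string contains the other (case-insensitive).
mask = [
    k.lower() in t.lower() or t.lower() in k.lower()
    for k, t in zip(pairs["KEYWORD"], pairs["TEXT_DATA"])
]

out = (
    pairs.loc[mask]
    .rename(columns={"TEXT_DATA": "sentenced_data"})
    [["KEYWORD", "sentenced_data", "TICKET"]]
    .reset_index(drop=True)
)
print(out)
```

The full pairing costs len(df) * len(df1) rows, which is fine for small lookup tables like these but may need a different strategy for large ones.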
Output:
^ In Python, this returns True if the two strings match partially or 100%; otherwise it returns False.
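The expression being described is presumably Python's `in` operator on strings, though that is an assumption. A small sketch of such a check follows; the function name `is_partial_match` and the case-insensitive comparison are illustrative choices.

```python
def is_partial_match(a: str, b: str) -> bool:
    """True if one string contains the other (a full match counts too)."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        # An empty string is "in" every string, so treat it as no match.
        return False
    return a in b or b in a


print(is_partial_match("stop passing traffic", "wires stop passing traffic"))
print(is_partial_match("Ignore Net flow", "stop passing traffic"))
```

Checking containment in both directions matters here: one text in the question is longer than its keyword, the other is shorter.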