如何比较两个数据帧的列以添加映射

2024-07-05 09:11:45 发布

您现在位置:Python中文网/ 问答频道 /正文

我有两个dataframes,如下所示,我试图在dataframe1中基于dataframe2的ITEM/CODE将value设置为CLASSIFICATION。如果DESC与ITEM/CODE有任何匹配的词,那么我必须从dataframe2获取类型

为了实现这一点,我正在拆分DESC字符串,并尝试将列表条目与dataframe2的ITEM/CODE进行比较。有什么办法吗

dataframe1
PN      DESC                                        CLASSIFICATION
C23890  Resistor 2.21K elec
C23891  Powerswitch
C23892  Resistor 7.5K
C23893  Resistor .1K
C23894  FET elec
C23895  ELE SD Card adapter
C23896  Crystal 16Mhz
C23897  Capacitor 100uF
C23898  ELECTRONICS Resistor 10K
C23899  M3x5 Socket Cap Bolt MECH
C23900  M3x6 Socket Cap Bolt Mech
C23901  Mehcanical Assemble Kapton Tape 120mm
C23902  MK7 Filament Drive Block Front
C23903  Pulley 5mm shaft

dataframe2
ITEM/CODE      TYPE
ELE         ELECTRONIC
ELECTRONICS ELECTRONIC
Capacitor   ELECTRONIC
Resistor    ELECTRONIC
Washer      MECHANICAL
MECH        MECHANICAL

这是我到目前为止编的代码


import pandas as pd

fn = 'D:\PartsExport.xlsx'
dfInput = pd.read_excel(fn, 'Sheet1')

fn_type = 'D:\TypeMaster.xlsx'
dfType = pd.read_excel(fn_type, 'Sheet1')

dfInput['DESC_SPLIT'] = dfInput["DESC"].str.split(" ", n=-1, expand = False)

Result

PN      DESC                      CLASSIFICATION
C23890  Resistor 2.21K elec         ELECTRONIC
C23891  Powerswitch                 ELECTRONIC
C23892  Resistor 7.5K               ELECTRONIC
C23893  Resistor .1K                ELECTRONIC
C23899  M3x5 Socket Cap Bolt MECH   MECHANICAL


Tags: codesocketitemdescfnpdelectroniccap
2条回答

^{}与由dataframe2创建的序列循环一起使用,flags=re.I参数用于非大小写匹配:

import re

for k, v in dataframe2.set_index('ITEM/CODE')['TYPE'].items():
    #if necessary word boundaries
    pat = r"\b{}\b".format(k)
    #if not
    #pat = k
    dataframe1.loc[dataframe1['DESC'].str.contains(pat, flags=re.I), 'CLASSIFICATION'] = v

print (dataframe1)
        PN                                   DESC CLASSIFICATION
0   C23890                    Resistor 2.21K elec     ELECTRONIC
1   C23891                            Powerswitch            NaN
2   C23892                          Resistor 7.5K     ELECTRONIC
3   C23893                           Resistor .1K     ELECTRONIC
4   C23894                               FET elec            NaN
5   C23895                    ELE SD Card adapter     ELECTRONIC
6   C23896                          Crystal 16Mhz            NaN
7   C23897                        Capacitor 100uF     ELECTRONIC
8   C23898               ELECTRONICS Resistor 10K     ELECTRONIC
9   C23899              M3x5 Socket Cap Bolt MECH     MECHANICAL
10  C23900              M3x6 Socket Cap Bolt Mech     MECHANICAL
11  C23901  Mehcanical Assemble Kapton Tape 120mm            NaN
12  C23902         MK7 Filament Drive Block Front            NaN
13  C23903                       Pulley 5mm shaft            NaN

如果只想匹配第一个单词,请使用^{},但首先将这两个值转换为小写^{}

dataframe2['ITEM/CODE'] = dataframe2['ITEM/CODE'].str.lower()
s = dataframe2.set_index('ITEM/CODE')['TYPE']

dataframe1['CLASSIFICATION'] = dataframe1['DESC'].str.split().str[0].str.lower().map(s)
print (dataframe1)
        PN                                   DESC CLASSIFICATION
0   C23890                    Resistor 2.21K elec     ELECTRONIC
1   C23891                            Powerswitch            NaN
2   C23892                          Resistor 7.5K     ELECTRONIC
3   C23893                           Resistor .1K     ELECTRONIC
4   C23894                               FET elec            NaN
5   C23895                    ELE SD Card adapter     ELECTRONIC
6   C23896                          Crystal 16Mhz            NaN
7   C23897                        Capacitor 100uF     ELECTRONIC
8   C23898               ELECTRONICS Resistor 10K     ELECTRONIC
9   C23899              M3x5 Socket Cap Bolt MECH            NaN
10  C23900              M3x6 Socket Cap Bolt Mech            NaN
11  C23901  Mehcanical Assemble Kapton Tape 120mm            NaN
12  C23902         MK7 Filament Drive Block Front            NaN
13  C23903                       Pulley 5mm shaft            NaN

不是那么花哨,但应该做这项工作:

import pandas as pd

#convert dfType dataframe to dictionary
type_dict = dfType.set_index('ITEM/CODE').T.to_dict()

#function that takes in DESC column value and outputs corresponding value from type_dict
def map_type(in_str):
  out_str = np.NaN
  for val in in_str.split():
      if val in type_dict.keys():
         out_str = type_dict[val]['TYPE']
  return out_str

#apply above function to DESC column
dfInput['CLASSIFICATION'] = dfInput['DESC'].apply(map_type)

相关问题 更多 >