pd.merge“TypeError:字符串索引必须是整数”

2024-09-28 18:48:18 发布

您现在位置:Python中文网/ 问答频道 /正文

我有3个文件,我的代码基本上是一系列合并,将数据从文件"lookup""NonPO"填充到文件"supplier"中,并创建一个名为"final2"的新df。代码运行得非常好,并生成我期望的输出,直到最后一次合并

当基于"supplier"(供应商编号+供应商站点代码)上名为"Unique"的新列(文件"NonPO"中具有相同名称的列)完成最后一次合并时,就会出现此问题。与此合并唯一不同的是,它基于通过连接创建的列(以前的合并使用文件中已有的列)。串联连接可能包含字母和/或数字的列,例如"260549" + "EXPENSE" = "260549EXPENSE"

我得到的错误是:

    runfile('//eu.ad.hertz.com/userdocs/irac920/Desktop/My Files/Python/Supplier cat testing/file.py', wdir='//eu.ad.hertz.com/userdocs/irac920/Desktop/My Files/Python/Supplier cat testing')
Traceback (most recent call last):

  File "\\eu.ad.hertz.com\userdocs\irac920\Desktop\My Files\Python\Supplier cat testing\file.py", line 33, in <module>
    final2 = pd.merge(final2, NonPO[['Unique','Category']], on='Unique', how='left')

TypeError: string indices must be integers

我的文件:

  1. "supplier"-({a1})
  2. "lookup"-({a2})
  3. "NonPO"-({a3})

我们将非常感谢您对解决此问题的任何帮助。谢谢大家!

我的代码:

import pandas as pd
import numpy as np
pd.set_option('display.expand_frame_repr', False)


supplier = r'//eu.ad.hertz.com/userdocs/irac920/Desktop/My Files/Python/Supplier cat testing/Suppliers.xlsx'
lookup = r'//eu.ad.hertz.com/userdocs/irac920/Desktop/My Files/Python/Supplier cat testing/Lookup.xlsx'
NonPO = r'//eu.ad.hertz.com/userdocs/irac920/Desktop/My Files/Python/Supplier cat testing/Non-PO Suppliers.xlsx'

sr = pd.read_excel(supplier)
lp_type = pd.read_excel(lookup, sheet_name=0)
lp_paygroup = pd.read_excel(lookup, sheet_name=1)
NonPO_Suppliers = pd.read_excel(NonPO)

results_type = pd.merge(sr, lp_type[['Type','L1']], on='Type', how='left')
results_type.sort_values(by='Supplier', inplace=True)

results_paygroup = pd.merge(results_type, lp_paygroup[['Paygroup','L2']], on='Paygroup', how='left')
results_paygroup.sort_values(by='Supplier', inplace=True)

type_from_paygroup = results_paygroup.copy()
type_from_paygroup['L1'] = results_paygroup.merge(lp_paygroup, on='Paygroup', how='left').apply(lambda r: r.L1_x if (r.L1_y is np.nan or r.L2_y == 'Vendor Level') else r.L1_y, axis=1)
type_from_paygroup.sort_values(by='Supplier', inplace=True)

paygroup_from_type = type_from_paygroup.copy()
paygroup_from_type['L2'] = type_from_paygroup.merge(lp_type, on='Type', how='left').apply(lambda r: r.L2_x if (r.L2_y is np.nan or r.L2_y == 'Vendor Level') else r.L2_y, axis=1)
paygroup_from_type.sort_values(by='Supplier', inplace=True)
final = paygroup_from_type.replace(np.nan,'Missing')


final['Unique']=final['Vendor Number'].astype(str) + final['Vendor Site Code'].astype(str)
final2 = final.copy()
final2 = pd.merge(final2, NonPO[['Unique','Category']], on='Unique', how='left')
print(final2)

Tags: fromcomtyperesultsadpduniquesupplier
2条回答

您试图访问NonPO作为数据帧,但实际上这是包含该文件名的变量,该文件名是一个字符串。这里很清楚

NonPO_Suppliers = pd.read_excel(NonPO)

只要把NonPO改成NonPO_Suppliers,你就没事了

final2 = pd.merge(final2, NonPO_Suppliers[['Unique','Category']], on='Unique', how='left')

考虑如下:

NonPO = r'//eu.ad.hertz.com/userdocs/irac920/Desktop/My Files/Python/Supplier cat testing/Non-PO Suppliers.xlsx'
NonPO_Suppliers = pd.read_excel(NonPO) # this is the name of the DataFrame, not NonPO.

因此,您需要将代码更改为:

final2 = pd.merge(final2, NonPO[['Unique','Category']], on='Unique', how='left')
final2 = pd.merge(final2, NonPO_Suppliers[['Unique','Category']], on='Unique', how='left')

希望这能奏效

相关问题 更多 >