Compare Excel files using a primary key and show the differences in a new DataFrame

Dataset1:
ID    Amt   Orders  Name
AB_1  33.4  10      TBC
CD_2  56.5  20      TBC1

Dataset2:
ID    Amt   Orders  Name
AB_1  50    11      TBC
CD_2  60    211     TBC1

Results:
ID    Amt_1  Amt_2  Diff
AB_1  50     33.4   16.6
CD_2  60     56.5   3.5

1 answer

User

#1 · Posted 2024-09-27 07:23:28

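If the format is always the same, you can compute the differences yourself. It is a bit odd to call the Amt from dataset 2 Amt_1, but I kept the naming you described.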

import os
import pandas as pd

pathFolder_to_file = '*yourpath*'

# define the types of the columns
# (the np.float alias was removed from NumPy; the built-in float is equivalent)
df_dtypes = {'ID': str, 'Amt': float, 'Orders': float, 'Name': str}

df1 = pd.read_excel(io=os.path.join(pathFolder_to_file, 'dataset1.xlsx'), dtype=df_dtypes, index_col="ID")
df2 = pd.read_excel(io=os.path.join(pathFolder_to_file, 'dataset2.xlsx'), dtype=df_dtypes, index_col="ID")

# inner join on the ID index: only IDs present in both files are kept
# (DataFrame.join defaults to a left join, so how="inner" is set explicitly)
# columns from df1 get the suffix _2 and columns from df2 get _1, as in the question
df = df1.join(df2, how="inner", lsuffix='_2', rsuffix='_1')
df["Diff"] = df["Amt_1"] - df["Amt_2"]

# filter out all records where there is no difference and print
diff_df = df[df["Diff"] != 0]
print(diff_df[["Amt_1", "Amt_2", "Diff"]])
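To try the same join without any Excel files, here is a minimal sketch that builds the two datasets from the question in memory. The helper name compare_amounts and the rounding to two decimals are my own assumptions, not part of the question; the rounding is only there to keep floating-point noise out of the Diff column.

```python
import pandas as pd

# Sample rows copied from the question; with real files these frames
# would come from pd.read_excel(..., index_col="ID") instead.
df1 = pd.DataFrame(
    {"ID": ["AB_1", "CD_2"], "Amt": [33.4, 56.5],
     "Orders": [10, 20], "Name": ["TBC", "TBC1"]}
).set_index("ID")
df2 = pd.DataFrame(
    {"ID": ["AB_1", "CD_2"], "Amt": [50.0, 60.0],
     "Orders": [11, 211], "Name": ["TBC", "TBC1"]}
).set_index("ID")


def compare_amounts(first, second):
    """Hypothetical helper: join on the ID index and return rows whose Amt differs."""
    # Columns of `first` get the suffix _2 and columns of `second` get _1,
    # matching the naming used in the question.
    joined = first.join(second, how="inner", lsuffix="_2", rsuffix="_1")
    # Rounding is an assumption to hide floating-point noise.
    joined["Diff"] = (joined["Amt_1"] - joined["Amt_2"]).round(2)
    return joined.loc[joined["Diff"] != 0, ["Amt_1", "Amt_2", "Diff"]]


result = compare_amounts(df1, df2)
print(result)
```

Because the join is an inner join, an ID that appears in only one of the two datasets is silently dropped from the result.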
