I have been able to pull data from two separate xlsx files and combine them into a single xlsx sheet using pandas.
I now have a table that looks like this:
Home  Start Date           Gross Earning  Tax    Gross Rental  Commission  Net Rental
3157  2020-03-26 00:00:00  -268.8         -28.8  -383.8        -36         -338.66
3157  2020-03-26 00:00:00  268.8          28.8   153.8         36          108.66
3157  2020-03-24 00:00:00  264.32         28.32  149.32        35.4        104.93
3157  2020-03-13 00:00:00  625.46         67.01  510.46        83.7675     405.4225
3157  2020-03-13 00:00:00  558.45         0      443.45        83.7675     342.9325
3157  2020-03-11 00:00:00  142.5          0      27.5          21.375      1.855
3157  2020-03-11 00:00:00  159.6          17.1   44.6          21.375      17.805
3157  2020-03-03 00:00:00  349.52         0      234.52        52.428      171.612
3157  2020-03-03 00:00:00  391.46         41.94  276.46        52.428      210.722
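If you look at the first two rows, the name in the Home column is the same (in this case 3157 Tocoa), and it is also the same in the next several rows. In the Start Date column, however, only the first two entries are the same (in this case 3/26/2020 12:00:00 AM), so I need to do the following.
If the date is the same and the Home is also the same, then I need the sum of all the remaining columns. (In this case I need the sum of -268.8 and 268.8, the sum of -28.8 and 28.8, and so on.) It is also worth mentioning that in some cases there are more than two matching start dates in total.
Below is the code I am using now. I should say that I am fairly new to Python, so I am sure there is a very simple way to do this that I am just not familiar with. I am also new to Stack Overflow, so if I have left something out or added something I shouldn't have, please forgive me.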
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
import numpy as np
import matplotlib.pyplot as plt
import os
# class airbnb:
#Gets the location path for the reports that come raw from the channel
airbnb_excel_file = (r'C:\Users\Christopher\PycharmProjects\Reporting with python\Data_to_read\Bnb_feb_report.xlsx')
empty_excel_file = (r'C:\Users\Christopher\PycharmProjects\Reporting with python\Data_to_read\empty.xlsx')
#Defines the data frame
df_airbnb = pd.read_excel(airbnb_excel_file)
df_empty = pd.read_excel(empty_excel_file)
gross_earnings = df_airbnb['Gross Earnings']
tax_amount = df_airbnb['Gross Earnings'] * 0.06
gross_rental = df_airbnb['Gross Earnings'] - df_airbnb['Cleaning Fee']
com = ((gross_rental - tax_amount) + df_airbnb['Cleaning Fee']) * 0.15
net_rental = (gross_rental - (com + df_airbnb['Host Fee']))
house = df_airbnb['Listing']
start_date = df_airbnb['Start Date']
# df = pd.DataFrame(df_empty)
# df_empty.replace('nan', '')
#
# print(net_rental)
df_report = pd.DataFrame(
{'Home': house, 'Start Date': start_date, 'Gross Earning': gross_earnings, 'Tax': tax_amount,
'Gross Rental': gross_rental, 'Commission': com, 'Net Rental': net_rental})
df_report.loc[(df_report.Home == 'New house, Minutes from Disney & Attraction'), 'Home'] = '3161 Tocoa'
df_report.loc[(df_report.Home == 'Brand-New House, located minutes from Disney 5151'), 'Home'] = '5151 Adelaide'
df_report.loc[(df_report.Home == 'Luxury House, Located Minutes from Disney-World 57'), 'Home'] = '3157 Tocoa'
df_report.loc[(df_report.Home == 'Big house, Located Minutes from Disney-World 55'), 'Home'] = '3155 Tocoa'
df_report.sort_values(by=['Home'], inplace=True)
# writer = ExcelWriter('Final_Report.xlsx')
# df_report.to_excel(writer, 'sheet1', index=False)
# writer.save()
# class homeaway:
homeaway_excel_file = (r'C:\Users\Christopher\PycharmProjects\Reporting with python\Data_to_read\PayoutSummaryReport2020-03-01_2020-03-29.xlsx')
df_homeaway = pd.read_excel(homeaway_excel_file)
cleaning = int(115)
house = df_homeaway['Address']
start_date = df_homeaway['Check-in']
gross_earnings = df_homeaway['Gross booking amount']
taxed_amount = df_homeaway['Lodging Tax Owner Remits']
gross_rental = (gross_earnings - cleaning)
com = ((gross_rental-taxed_amount) + cleaning) * 0.15
net_rental = (gross_rental - (com + df_homeaway['Deductions']))
df_report2 = pd.DataFrame(
{'Home': house, 'Start Date': start_date, 'Gross Earning': gross_earnings, 'Tax': taxed_amount,
'Gross Rental': gross_rental, 'Commission': com, 'Net Rental': net_rental})
# writer = ExcelWriter('Final_Report2.xlsx')
# df_report2.to_excel(writer, 'sheet1', index=False)
# writer.save()
df_combined = pd.concat([df_report, df_report2])
# Write the combined frame (not just df_report2); the context manager saves the file on exit
with ExcelWriter('Final_Report_combined.xlsx') as writer:
    df_combined.to_excel(writer, sheet_name='sheet1', index=False)
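One possible approach is to group by Home and Start Date and then compute the sum of the rows in each group.
Fortunately, all of the "other" columns are numeric, so no column specification is needed.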
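The grouping described above can be sketched as follows. This is a minimal example, not necessarily the exact code from the original answer: the frame name `df_combined` follows the question's script, the sample rows are copied from the question's table, and `totals` is a name made up for illustration.

```python
import pandas as pd

# Three rows copied from the table in the question.
df_combined = pd.DataFrame({
    'Home': ['3157', '3157', '3157'],
    'Start Date': pd.to_datetime(['2020-03-26', '2020-03-26', '2020-03-24']),
    'Gross Earning': [-268.8, 268.8, 264.32],
    'Tax': [-28.8, 28.8, 28.32],
    'Gross Rental': [-383.8, 153.8, 149.32],
    'Commission': [-36.0, 36.0, 35.4],
    'Net Rental': [-338.66, 108.66, 104.93],
})

# One output row per (Home, Start Date) combination;
# every remaining (numeric) column is summed within the group.
totals = df_combined.groupby(['Home', 'Start Date'], as_index=False).sum()
print(totals)
```

With `as_index=False` the grouping keys stay ordinary columns, so the result can be passed straight to `to_excel(..., index=False)` as in the question's script.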
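But if more than two rows share the same Home and Start Date and you want to sum them in separate sub-groups rather than all together, you should apply a "two-tier" grouping: first group by Home and Start Date as before, then split each of those groups again, and compute the sum for each second-level group.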
The code for this case needs one additional operation: dropping the last level of the index (reset_index).
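One way to sketch the two-tier grouping is shown below. The original answer's code is not preserved here, so this is an assumption-laden illustration: it assumes the second tier splits each Home / Start Date group into consecutive pairs of rows, and the names `pair_no`, `value_cols` and `paired`, as well as the sample data, are hypothetical.

```python
import pandas as pd

# Hypothetical data: Home 'A' has three rows on the same date, Home 'B' has one.
df = pd.DataFrame({
    'Home': ['A', 'A', 'A', 'B'],
    'Start Date': pd.to_datetime(['2020-03-26'] * 3 + ['2020-03-24']),
    'Gross Earning': [10.0, 20.0, 5.0, 7.0],
    'Tax': [1.0, 2.0, 0.5, 0.7],
})
value_cols = ['Gross Earning', 'Tax']

# Second tier (assumed): position of each row inside its (Home, Start Date)
# group, integer-divided by 2, so rows 0-1 form pair 0, rows 2-3 pair 1, etc.
pair_no = df.groupby(['Home', 'Start Date']).cumcount() // 2

paired = (
    df.groupby(['Home', 'Start Date', pair_no])[value_cols]
      .sum()
      .reset_index(level=-1, drop=True)  # drop the pair number (last index level)
      .reset_index()                     # turn Home / Start Date back into columns
)
print(paired)
```

Passing the pair number as an extra grouping key keeps everything in a single `groupby` call; a nested `groupby` inside `apply` would be another way to express the same idea.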
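To test this approach, add one more row to the DataFrame so that the 1234 Bogus Street / 2020-03-26 00:00:00 group contains three rows, then run the code above again and look at the result.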
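Note the last row: it is what remains of the group after the first two rows have been summed.
The row before it contains the sums of only the first two rows with the respective Home / Start Date.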
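The test described above can be reproduced with a small sketch that contrasts the two approaches. It uses the two 2020-03-26 rows from the question's table (with the Home label as it appears there) plus a third row that is made up purely for this test; the pairing rule and the names `all_together`, `in_pairs` and `pair_no` are assumptions, not part of the original answer.

```python
import pandas as pd

cols = ['Home', 'Start Date', 'Gross Earning', 'Tax',
        'Gross Rental', 'Commission', 'Net Rental']
rows = [
    # The two 2020-03-26 rows from the question's table.
    ['3157', '2020-03-26', -268.8, -28.8, -383.8, -36.0, -338.66],
    ['3157', '2020-03-26', 268.8, 28.8, 153.8, 36.0, 108.66],
    # Hypothetical third row, invented for this test only.
    ['3157', '2020-03-26', 20.0, 2.0, 15.0, 3.0, 10.0],
]
df = pd.DataFrame(rows, columns=cols)
df['Start Date'] = pd.to_datetime(df['Start Date'])
keys = ['Home', 'Start Date']

# Plain grouping: all three rows collapse into a single total.
all_together = df.groupby(keys, as_index=False).sum()

# Two-tier grouping (assumed pairing): rows 0-1 are summed, row 2 stays alone.
pair_no = df.groupby(keys).cumcount() // 2
in_pairs = (
    df.groupby(keys + [pair_no])
      .sum()
      .reset_index(level=-1, drop=True)
      .reset_index()
)

print(all_together)
print(in_pairs)
```

The plain grouping yields one row for the date, while the paired variant yields two rows with the same Home / Start Date: the sum of the first two rows, followed by the added row on its own.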