Merge two complete rows if certain conditions are met

Home  Start Date           Gross Earning  Tax    Gross Rental  Commission  Net Rental
3157  2020-03-26 00:00:00  -268.8         -28.8  -383.8        -36         -338.66
3157  2020-03-26 00:00:00  268.8          28.8   153.8         36          108.66
3157  2020-03-24 00:00:00  264.32         28.32  149.32        35.4        104.93
3157  2020-03-13 00:00:00  625.46         67.01  510.46        83.7675     405.4225
3157  2020-03-13 00:00:00  558.45         0      443.45        83.7675     342.9325
3157  2020-03-11 00:00:00  142.5          0      27.5          21.375      1.855
3157  2020-03-11 00:00:00  159.6          17.1   44.6          21.375      17.805
3157  2020-03-03 00:00:00  349.52         0      234.52        52.428      171.612
3157  2020-03-03 00:00:00  391.46         41.94  276.46        52.428      210.722

import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
import numpy as np
import matplotlib.pyplot as plt
import os

# class airbnb:

# Gets the location path for the reports that come raw from the channel
airbnb_excel_file = (r'C:\Users\Christopher\PycharmProjects\Reporting with python\Data_to_read\Bnb_feb_report.xlsx')
empty_excel_file = (r'C:\Users\Christopher\PycharmProjects\Reporting with python\Data_to_read\empty.xlsx')

# Defines the data frame
df_airbnb = pd.read_excel(airbnb_excel_file)
df_empty = pd.read_excel(empty_excel_file)

gross_earnings = df_airbnb['Gross Earnings']
tax_amount = df_airbnb['Gross Earnings'] * 0.06
gross_rental = df_airbnb['Gross Earnings'] - df_airbnb['Cleaning Fee']
com = ((gross_rental - tax_amount) + df_airbnb['Cleaning Fee']) * 0.15
net_rental = (gross_rental - (com + df_airbnb['Host Fee']))
house = df_airbnb['Listing']
start_date = df_airbnb['Start Date']

# df = pd.DataFrame(df_empty)
# df_empty.replace('nan', '')
#
# print(net_rental)

df_report = pd.DataFrame(
    {'Home': house, 'Start Date': start_date, 'Gross Earning': gross_earnings, 'Tax': tax_amount,
     'Gross Rental': gross_rental, 'Commission': com, 'Net Rental': net_rental})

# Replace the long listing titles with short house names
df_report.loc[(df_report.Home == 'New house, Minutes from Disney & Attraction'), 'Home'] = '3161 Tocoa'
df_report.loc[(df_report.Home == 'Brand-New House, located minutes from Disney 5151'), 'Home'] = '5151 Adelaide'
df_report.loc[(df_report.Home == 'Luxury House, Located Minutes from Disney-World 57'), 'Home'] = '3157 Tocoa'
df_report.loc[(df_report.Home == 'Big house, Located Minutes from Disney-World 55'), 'Home'] = '3155 Tocoa'

df_report.sort_values(by=['Home'], inplace=True)

# writer = ExcelWriter('Final_Report.xlsx')
# df_report.to_excel(writer, 'sheet1', index=False)
# writer.save()

# class homeaway:

homeaway_excel_file = (r'C:\Users\Christopher\PycharmProjects\Reporting with python\Data_to_read\PayoutSummaryReport2020-03-01_2020-03-29.xlsx')
df_homeaway = pd.read_excel(homeaway_excel_file)

cleaning = int(115)
house = df_homeaway['Address']
start_date = df_homeaway['Check-in']
gross_earnings = df_homeaway['Gross booking amount']
taxed_amount = df_homeaway['Lodging Tax Owner Remits']
gross_rental = (gross_earnings - cleaning)
com = ((gross_rental - taxed_amount) + cleaning) * 0.15
net_rental = (gross_rental - (com + df_homeaway['Deductions']))

df_report2 = pd.DataFrame(
    {'Home': house, 'Start Date': start_date, 'Gross Earning': gross_earnings, 'Tax': taxed_amount,
     'Gross Rental': gross_rental, 'Commission': com, 'Net Rental': net_rental})

# writer = ExcelWriter('Final_Report2.xlsx')
# df_report2.to_excel(writer, 'sheet1', index=False)
# writer.save()

df_combined = pd.concat([df_report, df_report2])

# Write the combined report; the context manager saves and closes the file
with ExcelWriter('Final_Report_combined.xlsx') as writer:
    df_combined.to_excel(writer, sheet_name='sheet1', index=False)

1 Answer

User

#1 · Posted 2024-09-25 00:31:35

One possible approach is to group by Home and Start Date and then compute the sum of the rows in each group:

df.groupby(['Home', 'Start Date']).sum()

Fortunately, all the "other" columns are numeric, so there is no need to specify which columns to sum.
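As a minimal sketch of this step, the snippet below rebuilds a few rows of the sample table from the question and sums them per Home / Start Date. The name `merged`, the use of a string for Home, and `as_index=False` (which keeps Home and Start Date as ordinary columns) are choices made for this illustration, not part of the original code.

```python
import pandas as pd

# A few rows copied from the sample table in the question
# (Home is stored as a string here, which is an assumption).
df = pd.DataFrame(
    {
        "Home": ["3157"] * 5,
        "Start Date": pd.to_datetime(
            ["2020-03-26", "2020-03-26", "2020-03-24", "2020-03-03", "2020-03-03"]
        ),
        "Gross Earning": [-268.8, 268.8, 264.32, 349.52, 391.46],
        "Tax": [-28.8, 28.8, 28.32, 0.0, 41.94],
        "Gross Rental": [-383.8, 153.8, 149.32, 234.52, 276.46],
        "Commission": [-36.0, 36.0, 35.4, 52.428, 52.428],
        "Net Rental": [-338.66, 108.66, 104.93, 171.612, 210.722],
    }
)

# One output row per (Home, Start Date); every remaining column is summed.
merged = df.groupby(["Home", "Start Date"], as_index=False).sum()
print(merged)
```

The two 2020-03-26 rows collapse into one row, and so do the two 2020-03-03 rows, while the single 2020-03-24 row passes through unchanged.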

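However, if there are more than 2 rows with the same Home and Start Date, and you want to:

split them into pairs of consecutive rows,
then compute their sum (separately for each pair),

you should apply a "two-tier" grouping:

first tier: group by Home and Start Date (as before),
second tier: group into pairs,

and compute the sum for each second-tier group.

In this case the code should be: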

df.groupby(['Home', 'Start Date']).apply(
    lambda grp: grp.groupby(np.arange(len(grp.index)) // 2).sum())\
    .reset_index(level=-1, drop=True)

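The additional operation needed here is dropping the last level of the index (reset_index), which holds the pair number within each group.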
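The same pairing can also be written without apply, which may be easier to follow. This is only a sketch of an alternative, not the code from the answer: it numbers the rows inside each group with cumcount, integer-divides by 2 to get a pair id, and uses that id as a third grouping key. The data below is made up purely for illustration, and the names `keys`, `pair_id` and `result` are hypothetical.

```python
import pandas as pd

# Illustrative data only: five rows for home "A" on one date, one row for "B".
df = pd.DataFrame(
    {
        "Home": ["A", "A", "A", "A", "A", "B"],
        "Start Date": pd.to_datetime(["2020-03-26"] * 6),
        "Gross Earning": [10, 20, 30, 40, 50, 7],
    }
)

keys = ["Home", "Start Date"]

# Position of each row inside its Home / Start Date group (0, 1, 2, ...),
# integer-divided by 2 so that consecutive rows share a pair id (0, 0, 1, 1, 2).
pair_id = df.groupby(keys).cumcount() // 2

result = (
    df.groupby(keys + [pair_id])   # second tier: the pair id within each group
    .sum(numeric_only=True)
    .droplevel(-1)                 # drop the pair id level, as reset_index does above
)
print(result)
```

For home "A" this yields three rows (10 + 20, 30 + 40, and the unpaired 50), and home "B" keeps its single row.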

To test this approach, add, for example, the following row to the DataFrame:

1234 Bogus Street,2020-03-26 00:00:00,20.0,2.0,15.0,3,10.0

As a result, the 1234 Bogus Street / 2020-03-26 00:00:00 group now contains three rows.

When you run the code above, you will get:

                                       Gross Earning    Tax  Gross Rental  Commission  Net Rental
Home              Start Date                                                                     
1234 Bogus Street 2020-03-03 00:00:00         740.98  41.94        510.98     104.856     382.334
                  2020-03-11 00:00:00         302.10  17.10         72.10      42.750      19.660
                  2020-03-13 00:00:00        1183.91  67.01        953.91     167.535     748.355
                  2020-03-24 00:00:00         264.32  28.32        149.32      35.400     104.930
                  2020-03-26 00:00:00           0.00   0.00       -230.00       0.000    -230.000
                  2020-03-26 00:00:00          20.00   2.00         15.00       3.000      10.000

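Note the last row. It contains:

the repeated Start Date (the same as in the previous row),
the values from the added row.

The last-but-one row contains only the sums of the first two rows for the respective Home / Start Date.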
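The test described above can be reproduced roughly as follows. This sketch assumes the sample rows carry the Home value shown in the output (1234 Bogus Street), uses only the 2020-03-26 rows, and selects the numeric columns explicitly before apply so that the lambda does not depend on how a given pandas version treats the grouping columns. The names `cols`, `extra`, `value_cols` and `result` are hypothetical.

```python
import numpy as np
import pandas as pd

cols = ["Home", "Start Date", "Gross Earning", "Tax",
        "Gross Rental", "Commission", "Net Rental"]

# The two 2020-03-26 rows from the sample data.
df = pd.DataFrame(
    [
        ["1234 Bogus Street", "2020-03-26", -268.8, -28.8, -383.8, -36.0, -338.66],
        ["1234 Bogus Street", "2020-03-26", 268.8, 28.8, 153.8, 36.0, 108.66],
    ],
    columns=cols,
)

# The extra test row, appended so the group holds three rows.
extra = pd.DataFrame(
    [["1234 Bogus Street", "2020-03-26", 20.0, 2.0, 15.0, 3.0, 10.0]],
    columns=cols,
)
df = pd.concat([df, extra], ignore_index=True)
df["Start Date"] = pd.to_datetime(df["Start Date"])

value_cols = cols[2:]  # every column except the two grouping keys

result = (
    df.groupby(["Home", "Start Date"])[value_cols]
    # Second tier: rows 0-1 form pair 0, row 2 forms pair 1.
    .apply(lambda grp: grp.groupby(np.arange(len(grp)) // 2).sum())
    .reset_index(level=-1, drop=True)
)
print(result)
```

The first result row holds the sums of the original pair, and the second holds the added row on its own, matching the last two rows of the output shown above.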
