Add rows to a dataframe when quarterly data is missing

```python
import pandas as pd

# Sample holdings; named `data` instead of `dict` so the builtin is not shadowed
data = {
    "Period Date": pd.to_datetime([
        "2017-12-31", "2018-03-31", "2018-03-31", "2018-06-30", "2018-06-30",
        "2018-09-30", "2018-12-31", "2019-06-30", "2018-06-30", "2018-06-30",
        "2018-09-30", "2019-12-31", "2020-03-31",
    ]),
    "Investor": ["Dave"] * 8 + ["Faye"] * 3 + ["Mike"] * 2,
    "Symbol": ["CC", "AABA", "CONN", "AABA", "CONN", "AABA", "CC", "CC",
               "AABA", "CC", "CC", "AMZN", "AMZN"],
    "CUSIP": [163851108, 21346101, 208242107, 21346101, 208242107, 21346101,
              163851108, 163851108, 21346101, 163851108, 163851108,
              45636363, 45636363],
    "Company Name": ["Chemours", "Yahoo", "Conn", "Yahoo", "Conn", "Altaba",
                     "Chemours", "Chemours", "Altaba", "Chemours", "Chemours",
                     "Amazon", "Amazon"],
    "Shares": [2, 9, 4, 9, 5, 7, 3, 7, 5, 7, 1, 2, 3],
    "Value": [20, 12, 15, 12, 9, 20, 22, 12, 6, 6, 9, 10, 20],
    "% Change": ["New", "New", "New", "0.0", "0.25", "-0.2222", "New", "New",
                 "New", "New", "-0.8571", "New", ".5"],
}
df = pd.DataFrame.from_dict(data)
```

```
  Period Date Investor Symbol      CUSIP Company Name  Shares  Value % Change
0  2018-12-31     Dave   AABA   21346101       Altaba     NaN    NaN     Sold
1  2018-03-31     Dave     CC  163851108     Chemours     NaN    NaN     Sold
2  2018-09-30     Dave   CONN  208242107         Conn     NaN    NaN     Sold
3  2018-09-30     Faye   AABA   21346101       Altaba     NaN    NaN     Sold
4  2018-12-31     Faye     CC  163851108     Chemours     NaN    NaN     Sold
```

```python
# Create a dataframe of all the dates.
all_dates = pd.DataFrame({"Period Date": df["Period Date"].unique()})
all_dates["one"] = 1

# Create a df of all the possible records (all combinations of id-name-date)
name_ids = df[["Investor", "Symbol"]].drop_duplicates().copy()
name_ids["one"] = 1
all_possible_records = pd.merge(all_dates, name_ids, on="one")
all_possible_records = pd.merge(all_possible_records, df, how="left")
all_possible_records.drop("one", axis="columns", inplace=True)
all_possible_records.sort_values(["Investor", "Symbol", "Period Date"], inplace=True)

# For every record, shift 1 to see if it had any value in the previous quarter
all_possible_records["prev_q"] = (
    all_possible_records.groupby(["Investor", "Symbol"])["Shares"].shift(1)
)

# Records in which the change is NaN but there was a value in the previous quarter are 'Sold'
sold = all_possible_records["% Change"].isna() & all_possible_records["prev_q"].notna()
all_possible_records.loc[sold, "% Change"] = "Sold"

# Drop redundant records
res = all_possible_records.dropna(axis="rows", subset=["% Change"]).copy()
res = res.drop(columns="prev_q").reset_index(drop=True)
```
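A variant of the approach above, sketched under stated assumptions: `merge(how="cross")` (pandas 1.2 and later) replaces the helper `"one"` column, and the candidate dates come from a full quarter-end calendar built with `pd.date_range(..., freq=pd.offsets.QuarterEnd())` rather than from `df["Period Date"].unique()`. That way a quarter that nobody reported (such as 2019-03-31 in this data) is still checked. It assumes every date is a calendar quarter end and only generates quarters between the earliest and latest date in the data, so a position still held at the last date gets no "Sold" row. `mark_sold` is a hypothetical helper name.

```python
import pandas as pd


def mark_sold(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return the original rows plus one 'Sold' row for
    every quarter in which a previously held (Investor, Symbol) pair is missing.
    Assumes all 'Period Date' values are calendar quarter ends."""
    # Every quarter end between the first and the last date in the data
    quarters = pd.DataFrame({
        "Period Date": pd.date_range(df["Period Date"].min(),
                                     df["Period Date"].max(),
                                     freq=pd.offsets.QuarterEnd())
    })
    pairs = df[["Investor", "Symbol"]].drop_duplicates()

    # All (quarter, investor, symbol) combinations, with the real rows attached
    grid = quarters.merge(pairs, how="cross")
    grid = grid.merge(df, on=["Period Date", "Investor", "Symbol"], how="left")
    grid = grid.sort_values(["Investor", "Symbol", "Period Date"])

    # Carry the last known CUSIP / company name forward within each pair
    grid[["CUSIP", "Company Name"]] = (
        grid.groupby(["Investor", "Symbol"])[["CUSIP", "Company Name"]].ffill()
    )

    # A missing quarter right after a held quarter means the position was sold
    prev_shares = grid.groupby(["Investor", "Symbol"])["Shares"].shift(1)
    sold = grid["Shares"].isna() & prev_shares.notna()
    grid.loc[sold, "% Change"] = "Sold"

    # Keep the original rows and the new 'Sold' rows only
    return grid[grid["Shares"].notna() | sold].reset_index(drop=True)
```

Because the calendar is complete, a holding that skips two quarters (Dave's CC between 2017-12-31 and 2018-12-31) produces a single "Sold" row in the first empty quarter; the second empty quarter is not marked because its previous quarter has no shares either.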

1 answer

User
#1 · Posted on 2024-09-30 20:36:16

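After sorting the data, you can select the rows that need a "Sold" row after them: the last row of each group, meaning a row whose next row has a different "Investor", "Symbol" or "CUSIP", and any row whose next date in the "Period Date" column is more than one quarter away (I used a gap of more than 95 days). Once you have these rows, add 0.5 to their index so that you can later `concat` them with the original data and sort by index (each new row then lands right after its source row), and change the values in the required columns.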
```python
import numpy as np
import pandas as pd

# sort to use diff after
df = df.sort_values(["Investor", "Symbol", "CUSIP", "Period Date"]).reset_index(drop=True)

# select the last rows per group, or rows followed by a gap of more than one quarter
df_toadd = df[(df[["Investor", "Symbol", "CUSIP"]]
               .ne(df[["Investor", "Symbol", "CUSIP"]].shift(-1)).any(axis=1))
              | df["Period Date"].diff(-1).dt.days.abs().gt(95)].copy()

# add 0.5 to index so each new row sorts right after its source row, and set other values
df_toadd.index += 0.5
df_toadd["Period Date"] = (df_toadd["Period Date"]
                           + pd.DateOffset(days=1)
                           + pd.tseries.offsets.QuarterEnd())
df_toadd[["Shares", "Value"]] = np.nan
df_toadd["% Change"] = "Sold"

# concat original df with the new rows and sort_index
df_ = pd.concat([df, df_toadd]).sort_index().reset_index(drop=True)
```
This gives:

```
print(df_)
   Period Date Investor Symbol      CUSIP Company Name  Shares  Value % Change
0   2018-03-31     Dave   AABA   21346101        Yahoo     9.0   12.0      New
1   2018-06-30     Dave   AABA   21346101        Yahoo     9.0   12.0      0.0
2   2018-09-30     Dave   AABA   21346101       Altaba     7.0   20.0  -0.2222
3   2018-12-31     Dave   AABA   21346101       Altaba     NaN    NaN     Sold
4   2017-12-31     Dave     CC  163851108     Chemours     2.0   20.0      New
5   2018-03-31     Dave     CC  163851108     Chemours     NaN    NaN     Sold
6   2018-12-31     Dave     CC  163851108     Chemours     3.0   22.0      New
7   2019-03-31     Dave     CC  163851108     Chemours     NaN    NaN     Sold
8   2019-06-30     Dave     CC  163851108     Chemours     7.0   12.0      New
9   2019-09-30     Dave     CC  163851108     Chemours     NaN    NaN     Sold
10  2018-03-31     Dave   CONN  208242107         Conn     4.0   15.0      New
11  2018-06-30     Dave   CONN  208242107         Conn     5.0    9.0     0.25
12  2018-09-30     Dave   CONN  208242107         Conn     NaN    NaN     Sold
13  2018-06-30     Faye   AABA   21346101       Altaba     5.0    6.0      New
14  2018-09-30     Faye   AABA   21346101       Altaba     NaN    NaN     Sold
15  2018-06-30     Faye     CC  163851108     Chemours     7.0    6.0      New
16  2018-09-30     Faye     CC  163851108     Chemours     1.0    9.0  -0.8571
17  2018-12-31     Faye     CC  163851108     Chemours     NaN    NaN     Sold
```
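The 95-day threshold works because consecutive quarter-end dates are 90 to 92 days apart, while skipping a quarter gives a gap of at least 181 days. A sketch that avoids the magic number compares quarterly periods instead, via `Series.dt.to_period("Q")`: a row gets a "Sold" row after it when the next row belongs to a different group or is not exactly the following quarter. It keeps the answer's 0.5-index trick; `add_sold_rows` is a hypothetical name, and the sketch assumes one row per group per quarter.

```python
import numpy as np
import pandas as pd


def add_sold_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical variant of the answer: period comparison instead of 95 days."""
    keys = ["Investor", "Symbol", "CUSIP"]
    df = df.sort_values(keys + ["Period Date"]).reset_index(drop=True)
    quarter = df["Period Date"].dt.to_period("Q")

    # Last row of each group: the next row has different keys
    group_ends = df[keys].ne(df[keys].shift(-1)).any(axis=1)
    # Gap: the next row is not the immediately following quarter
    gap = quarter.shift(-1) != quarter + 1
    mask = group_ends | gap

    to_add = df[mask].copy()
    # Quarter-end date of the following quarter (computed before the index shift)
    to_add["Period Date"] = (quarter[mask] + 1).dt.end_time.dt.normalize()
    to_add[["Shares", "Value"]] = np.nan
    to_add["% Change"] = "Sold"

    # Index + 0.5 places each new row right after its source row
    to_add.index = to_add.index + 0.5
    return pd.concat([df, to_add]).sort_index().reset_index(drop=True)
```

Unlike `+ pd.DateOffset(days=1) + QuarterEnd()`, taking the end of the next period also gives the following quarter end when the input date falls mid-quarter.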
