I have a DataFrame holding a quarterly stock portfolio:
import pandas as pd
from pandas import Timestamp
data = {'Period Date': {0: Timestamp('2017-12-31 00:00:00'), 1: Timestamp('2018-03-31 00:00:00'), 2: Timestamp('2018-03-31 00:00:00'), 3: Timestamp('2018-06-30 00:00:00'), 4: Timestamp('2018-06-30 00:00:00'), 5: Timestamp('2018-09-30 00:00:00'), 6: Timestamp('2018-12-31 00:00:00'), 7: Timestamp('2019-06-30 00:00:00'), 8: Timestamp('2018-06-30 00:00:00'), 9: Timestamp('2018-06-30 00:00:00'), 10: Timestamp('2018-09-30 00:00:00'), 11: Timestamp('2019-12-31 00:00:00'), 12: Timestamp('2020-03-31 00:00:00')}, 'Investor': {0: 'Dave', 1: 'Dave', 2: 'Dave', 3: 'Dave', 4: 'Dave', 5: 'Dave', 6: 'Dave', 7: 'Dave', 8: 'Faye', 9: 'Faye', 10: 'Faye', 11: 'Mike', 12: 'Mike'}, 'Symbol': {0: 'CC', 1: 'AABA', 2: 'CONN', 3: 'AABA', 4: 'CONN', 5: 'AABA', 6: 'CC', 7: 'CC', 8: 'AABA', 9: 'CC', 10: 'CC', 11: 'AMZN', 12: 'AMZN'}, 'CUSIP': {0: 163851108, 1: 21346101, 2: 208242107, 3: 21346101, 4: 208242107, 5: 21346101, 6: 163851108, 7: 163851108, 8: 21346101, 9: 163851108, 10: 163851108, 11: 45636363, 12: 45636363}, 'Company Name': {0: 'Chemours', 1: 'Yahoo', 2: 'Conn', 3: 'Yahoo', 4: 'Conn', 5: 'Altaba', 6: 'Chemours', 7: 'Chemours', 8: 'Altaba', 9: 'Chemours', 10: 'Chemours', 11: 'Amazon', 12: 'Amazon'}, 'Shares': {0: 2, 1: 9, 2: 4, 3: 9, 4: 5, 5: 7, 6: 3, 7: 7, 8: 5, 9: 7, 10: 1, 11: 2, 12: 3}, 'Value': {0: 20, 1: 12, 2: 15, 3: 12, 4: 9, 5: 20, 6: 22, 7: 12, 8: 6, 9: 6, 10: 9, 11: 10, 12: 20}, '% Change': {0: 'New', 1: 'New', 2: 'New', 3: '0.0', 4: '0.25', 5: '-0.2222', 6: 'New', 7: 'New', 8: 'New', 9: 'New', 10: '-0.8571', 11: 'New', 12: '.5'}}
df = pd.DataFrame.from_dict(data)
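I want to add a row indicating when a stock was sold.
That is, if a stock (identified by CUSIP and Symbol) appears in a given Investor & Period Date group but does not appear in the next Period Date for that Investor, I want to add a row to the original DataFrame showing that the stock was sold.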
So the new rows to be added to the original DataFrame would look like this:
Period Date Investor Symbol CUSIP Company Name Shares Value % Change
0 2018-12-31 Dave AABA 21346101 Altaba NaN NaN Sold
1 2018-03-31 Dave CC 163851108 Chemours NaN NaN Sold
2 2018-09-30 Dave CONN 208242107 Conn NaN NaN Sold
3 2018-09-30 Faye AABA 21346101 Altaba NaN NaN Sold
4 2018-12-31 Faye CC 163851108 Chemours NaN NaN Sold
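A minimal sketch of the rule as literally stated, assuming that "the next Period Date" means the next date that appears anywhere in the data (not only among that investor's own dates). It moves every holding forward to the next period date and uses a left merge with indicator=True to keep only the positions that are absent at that date. Under this reading it produces the five rows in the expected table above plus one more, Dave's CC at 2019-12-31 (Dave held CC at 2019-06-30 and the data contains a later date), so the intended meaning of "next" may be narrower. The names keys, next_date, cand and held are illustrative.

```python
import pandas as pd

# The question's data, written column-wise (same values as the dict above)
df = pd.DataFrame({
    "Period Date": pd.to_datetime([
        "2017-12-31", "2018-03-31", "2018-03-31", "2018-06-30", "2018-06-30",
        "2018-09-30", "2018-12-31", "2019-06-30", "2018-06-30", "2018-06-30",
        "2018-09-30", "2019-12-31", "2020-03-31"]),
    "Investor": ["Dave"] * 8 + ["Faye"] * 3 + ["Mike"] * 2,
    "Symbol": ["CC", "AABA", "CONN", "AABA", "CONN", "AABA", "CC", "CC",
               "AABA", "CC", "CC", "AMZN", "AMZN"],
    "CUSIP": [163851108, 21346101, 208242107, 21346101, 208242107, 21346101,
              163851108, 163851108, 21346101, 163851108, 163851108,
              45636363, 45636363],
    "Company Name": ["Chemours", "Yahoo", "Conn", "Yahoo", "Conn", "Altaba",
                     "Chemours", "Chemours", "Altaba", "Chemours", "Chemours",
                     "Amazon", "Amazon"],
    "Shares": [2, 9, 4, 9, 5, 7, 3, 7, 5, 7, 1, 2, 3],
    "Value": [20, 12, 15, 12, 9, 20, 22, 12, 6, 6, 9, 10, 20],
    "% Change": ["New", "New", "New", "0.0", "0.25", "-0.2222", "New", "New",
                 "New", "New", "-0.8571", "New", ".5"],
})

keys = ["Investor", "Symbol", "CUSIP"]

# Map every period date to the next period date that occurs anywhere in the data
dates = df["Period Date"].drop_duplicates().sort_values()
next_date = pd.Series(dates.shift(-1).to_numpy(), index=dates.to_numpy())

# Candidate "Sold" rows: each holding moved forward to the next period date
cand = df[keys + ["Company Name", "Period Date"]].copy()
cand["Period Date"] = cand["Period Date"].map(next_date)
cand = cand.dropna(subset=["Period Date"])  # the latest date has no successor

# Keep only the candidates whose position is absent at that next date
held = df[keys + ["Period Date"]]
merged = cand.merge(held, on=keys + ["Period Date"], how="left", indicator=True)
sold = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
sold = sold.assign(Shares=float("nan"), Value=float("nan"), **{"% Change": "Sold"})

# Append the new rows to the original frame
out = pd.concat([df, sold], ignore_index=True)
out = out.sort_values(["Investor", "Period Date", "Symbol"], ignore_index=True)
print(sold)
```

Because the Company Name is copied from the last holding, the sold AABA rows carry "Altaba" rather than "Yahoo".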
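I tried the approach below, but it gives an incomplete result (the CUSIP and Company Name are missing from the new rows), and I suspect there is a simpler solution: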
# Create a dataframe of all the dates.
all_dates = pd.DataFrame({"Period Date": df["Period Date"].unique()})
all_dates["one"] = 1
# Create a df of all possible records (all combinations of investor, symbol and date)
name_ids = df[["Investor", "Symbol"]].drop_duplicates()
name_ids["one"] = 1
all_possible_records = pd.merge(all_dates, name_ids, on="one")
all_possible_records = pd.merge(all_possible_records, df, how="left")
all_possible_records.drop("one", axis="columns", inplace=True)
all_possible_records.sort_values(["Investor", "Symbol", "Period Date"], inplace=True)
# For every record, shift by 1 to see whether it had a value in the previous quarter
all_possible_records["prev_q"] = all_possible_records.groupby(["Investor", "Symbol"]).shift(1)["Shares"]
# Records where % Change is NaN but there was a value in the previous quarter are 'Sold'
all_possible_records.loc[all_possible_records["% Change"].isna() & all_possible_records.prev_q.notna(), "% Change"]="Sold"
# Drop redundant records
res = all_possible_records.dropna(axis="rows", subset=["% Change"])
del res['prev_q']
res.reset_index(drop=True, inplace=True)
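The missing CUSIP and Company Name come from the left merge: the new "Sold" rows have no match in df, so every column taken from df is NaN there. A minimal fix, sketched under the assumption that both values stay the same for a given (Investor, Symbol) pair between its last holding and the sale, is to forward-fill them within each group after sorting by date. The sketch keeps the structure of the attempt above but replaces the helper "one" column with merge(how="cross"), which requires pandas 1.2 or later; apr and fill_cols are illustrative names.

```python
import pandas as pd

# The question's data, written column-wise (same values as the dict above)
df = pd.DataFrame({
    "Period Date": pd.to_datetime([
        "2017-12-31", "2018-03-31", "2018-03-31", "2018-06-30", "2018-06-30",
        "2018-09-30", "2018-12-31", "2019-06-30", "2018-06-30", "2018-06-30",
        "2018-09-30", "2019-12-31", "2020-03-31"]),
    "Investor": ["Dave"] * 8 + ["Faye"] * 3 + ["Mike"] * 2,
    "Symbol": ["CC", "AABA", "CONN", "AABA", "CONN", "AABA", "CC", "CC",
               "AABA", "CC", "CC", "AMZN", "AMZN"],
    "CUSIP": [163851108, 21346101, 208242107, 21346101, 208242107, 21346101,
              163851108, 163851108, 21346101, 163851108, 163851108,
              45636363, 45636363],
    "Company Name": ["Chemours", "Yahoo", "Conn", "Yahoo", "Conn", "Altaba",
                     "Chemours", "Chemours", "Altaba", "Chemours", "Chemours",
                     "Amazon", "Amazon"],
    "Shares": [2, 9, 4, 9, 5, 7, 3, 7, 5, 7, 1, 2, 3],
    "Value": [20, 12, 15, 12, 9, 20, 22, 12, 6, 6, 9, 10, 20],
    "% Change": ["New", "New", "New", "0.0", "0.25", "-0.2222", "New", "New",
                 "New", "New", "-0.8571", "New", ".5"],
})

# Every (date, investor, symbol) combination, joined back to the real holdings
all_dates = df[["Period Date"]].drop_duplicates()
name_ids = df[["Investor", "Symbol"]].drop_duplicates()
apr = all_dates.merge(name_ids, how="cross").merge(
    df, how="left", on=["Period Date", "Investor", "Symbol"])
apr = apr.sort_values(["Investor", "Symbol", "Period Date"])

# Missing now but held in the previous period -> "Sold"
apr["prev_q"] = apr.groupby(["Investor", "Symbol"])["Shares"].shift(1)
apr.loc[apr["% Change"].isna() & apr["prev_q"].notna(), "% Change"] = "Sold"

# Carry the last known CUSIP / Company Name forward within each group,
# so the "Sold" rows are no longer missing them
fill_cols = ["CUSIP", "Company Name"]
apr[fill_cols] = apr.groupby(["Investor", "Symbol"])[fill_cols].ffill()

res = apr.dropna(subset=["% Change"]).drop(columns="prev_q").reset_index(drop=True)
res["CUSIP"] = res["CUSIP"].astype("Int64")  # the merge's NaNs had made it float
print(res)
```

On this data the result has 19 rows: the 13 original ones plus 6 "Sold" rows, the same six as the literal reading of the rule (including Dave's CC at 2019-12-31).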
So how can I achieve this?
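After sorting the data, you can select the last row of each group, i.e. a row whose next row has a different "Investor", "Symbol" or "CUSIP", or whose next "Period Date" is more than one quarter later (I used more than 95 days). Once you have these rows, you can add 0.5 to their index so that you can later concat them with the original data, then change the values in the required columns to get the result.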
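A sketch of that approach follows. The answer does not say how the date of each "Sold" row is computed; this version assumes quarter-end dates and uses pd.offsets.QuarterEnd(1) to place it one quarter after the last holding. Because the final holding of every group also counts as the end of a run, it emits rows that are not in the expected table: Dave's CC at 2019-03-31 and 2019-09-30, and Mike's AMZN at 2020-06-30. Filter those afterwards if they are not wanted. The names keys, nxt and is_last are illustrative.

```python
import pandas as pd

# The question's data, written column-wise (same values as the dict above)
df = pd.DataFrame({
    "Period Date": pd.to_datetime([
        "2017-12-31", "2018-03-31", "2018-03-31", "2018-06-30", "2018-06-30",
        "2018-09-30", "2018-12-31", "2019-06-30", "2018-06-30", "2018-06-30",
        "2018-09-30", "2019-12-31", "2020-03-31"]),
    "Investor": ["Dave"] * 8 + ["Faye"] * 3 + ["Mike"] * 2,
    "Symbol": ["CC", "AABA", "CONN", "AABA", "CONN", "AABA", "CC", "CC",
               "AABA", "CC", "CC", "AMZN", "AMZN"],
    "CUSIP": [163851108, 21346101, 208242107, 21346101, 208242107, 21346101,
              163851108, 163851108, 21346101, 163851108, 163851108,
              45636363, 45636363],
    "Company Name": ["Chemours", "Yahoo", "Conn", "Yahoo", "Conn", "Altaba",
                     "Chemours", "Chemours", "Altaba", "Chemours", "Chemours",
                     "Amazon", "Amazon"],
    "Shares": [2, 9, 4, 9, 5, 7, 3, 7, 5, 7, 1, 2, 3],
    "Value": [20, 12, 15, 12, 9, 20, 22, 12, 6, 6, 9, 10, 20],
    "% Change": ["New", "New", "New", "0.0", "0.25", "-0.2222", "New", "New",
                 "New", "New", "-0.8571", "New", ".5"],
})

keys = ["Investor", "Symbol", "CUSIP"]
df = df.sort_values(keys + ["Period Date"]).reset_index(drop=True)

# A row ends a run if the next row is another position or more than 95 days later
nxt = df.shift(-1)
group_changes = (df[keys] != nxt[keys]).any(axis=1)
big_gap = (nxt["Period Date"] - df["Period Date"]) > pd.Timedelta(days=95)
is_last = group_changes | big_gap

# Copy those rows, give them index + 0.5 so they sort right after their source row
sold = df.loc[is_last].copy()
sold.index = sold.index + 0.5
sold["Period Date"] = sold["Period Date"] + pd.offsets.QuarterEnd(1)
sold["Shares"] = float("nan")
sold["Value"] = float("nan")
sold["% Change"] = "Sold"

out = pd.concat([df, sold]).sort_index().reset_index(drop=True)
print(out)
```

The 0.5 offset keeps the new labels distinct from the existing integer index, so a plain sort_index() interleaves each "Sold" row directly after the holding it closes.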