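Long question: I have two CSV files. One, named SF1, has quarterly data (only 4 times a year) and a datekey column; the other, named DAILY, provides data every day. This is financial data, so there is a ticker column.
I need to take the quarterly data from SF1 and write it into the daily csv file for all the dates in between, until we get the next quarterly data.
For example, SF1 released quarterly data on 2010-01-01, and its next earnings report will be released on 2010-03-04. I then need every row in the daily file with ticker AAPL between the dates 2010-01-01 and 2010-03-04 to have the same information as the row for that date in the SF1 file.
So far I have built a Python dictionary by iterating over the SF1 file and appending each date to a list, which is the value for the ticker key in the dictionary. I thought about dropping the previous string and just referencing the string in the dictionary, then searching for the data to write to the daily file.
Some of the columns that need to be transferred from the SF1 file to the daily file are: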
['accoci', 'assets', 'assetsavg', 'assetsc', 'assetsnc', 'assetturnover', 'bvps', 'capex', 'cashneq', 'cashnequsd', 'cor', 'consolinc', 'currentratio', 'de', 'debt', 'debtc', 'debtnc', 'debtusd', 'deferredrev', 'depamor', 'deposits', 'divyield', 'dps', 'ebit']
Code so far:
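company_date = {}  # maps each ticker to the list of its datekey values
for ind, row in sf1.iterrows():
    sf1_date = row['datekey']
    sf1_ticker = row['ticker']
    # create the ticker's list on first sight, then append the date
    company_date.setdefault(sf1_ticker, []).append(sf1_date)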
What is the best way to approach this problem?
SF1 csv:
ticker,dimension,calendardate,datekey,reportperiod,lastupdated,accoci,assets,assetsavg,assetsc,assetsnc,assetturnover,bvps,capex,cashneq,cashnequsd,cor,consolinc,currentratio,de,debt,debtc,debtnc,debtusd,deferredrev,depamor,deposits,divyield,dps,ebit,ebitda,ebitdamargin,ebitdausd,ebitusd,ebt,eps,epsdil,epsusd,equity,equityavg,equityusd,ev,evebit,evebitda,fcf,fcfps,fxusd,gp,grossmargin,intangibles,intexp,invcap,invcapavg,inventory,investments,investmentsc,investmentsnc,liabilities,liabilitiesc,liabilitiesnc,marketcap,ncf,ncfbus,ncfcommon,ncfdebt,ncfdiv,ncff,ncfi,ncfinv,ncfo,ncfx,netinc,netinccmn,netinccmnusd,netincdis,netincnci,netmargin,opex,opinc,payables,payoutratio,pb,pe,pe1,ppnenet,prefdivis,price,ps,ps1,receivables,retearn,revenue,revenueusd,rnd,roa,roe,roic,ros,sbcomp,sgna,sharefactor,sharesbas,shareswa,shareswadil,sps,tangibles,taxassets,taxexp,taxliabilities,tbvps,workingcapital
A,ARQ,2020-09-14,2020-09-14,2020-09-14,2020-09-14,53000000,7107000000,,4982000000,2125000000,,10.219,-30000000,1368000000,1368000000,1160000000,131000000,2.41,0.584,665000000,111000000,554000000,665000000,281000000,96000000,0,0.0,0.0,202000000,298000000,0.133,298000000,202000000,202000000,0.3,0.3,0.3,4486000000,,4486000000,50960600000,,,354000000,0.806,1.0,1086000000,0.484,0,0,4337000000,,1567000000,42000000,42000000,0,2621000000,2067000000,554000000,51663600000,1368000000,-160000000,2068000000,111000000,0,1192000000,-208000000,-42000000,384000000,0,131000000,131000000,131000000,0,0,0.058,915000000,171000000,635000000,0.0,11.517,,,1408000000,0,114.3,,,1445000000,131000000,2246000000,2246000000,290000000,,,,,0,625000000,1.0,452000000,439000000,440000000,5.116,7107000000,0,71000000,113000000,16.189,2915000000
Daily csv:
ticker,date,lastupdated,ev,evebit,evebitda,marketcap,pb,pe,ps
A,2020-09-14,2020-09-14,31617.1,36.3,26.8,30652.1,6.2,44.4,5.9
Ideal csv after the code runs (with all the numbers filled in under assets and the other columns):
ticker,date,lastupdated,ev,evebit,evebitda,marketcap,pb,pe,ps,accoci,assets,assetsavg,assetsc,assetsnc,assetturnover,bvps,capex,cashneq,cashnequsd,cor,consolinc,currentratio,de,debt,debtc,debtnc,debtusd,deferredrev,depamor,deposits,divyield,dps,ebit,ebitda,ebitdamargin,ebitdausd,ebitusd,ebt,eps,epsdil,epsusd,equity,equityavg,equityusd,ev,evebit,evebitda,fcf,fcfps,fxusd,gp,grossmargin,intangibles,intexp,invcap,invcapavg,inventory,investments,investmentsc,investmentsnc,liabilities,liabilitiesc,liabilitiesnc,marketcap,ncf,ncfbus,ncfcommon,ncfdebt,ncfdiv,ncff,ncfi,ncfinv,ncfo,ncfx,netinc,netinccmn,netinccmnusd,netincdis,netincnci,netmargin,opex,opinc,payables,payoutratio,pb,pe,pe1,ppnenet,prefdivis,price,ps,ps1,receivables,retearn,revenue,revenueusd,rnd,roa,roe,roic,ros,sbcomp,sgna,sharefactor,sharesbas,shareswa,shareswadil,sps,tangibles,taxassets,taxexp,taxliabilities,tbvps,workingcapital
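The solution is merge_asof. It merges each row on a date column with the row of the second dataframe whose date comes immediately before or after it.
Since the question does not make it clear, I will assume here that daily.date and sf1.datekey are both true date columns, meaning that their dtype is datetime64[ns]. merge_asof cannot use string columns of dtype object.
I also assume that you do not want the ev, evebit, evebitda, marketcap, pb, pe and ps columns from the sf1 dataframe, because their names conflict with columns in daily (more on that later).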
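A minimal sketch of such a merge, under the assumptions stated above. The two small CSV snippets are made-up stand-ins for the real files (only `assets` and `ev` are kept, and the values are illustrative), and `conflicts` is a helper name introduced here. Both frames are sorted on their date key because `merge_asof` requires it, and the default `direction="backward"` gives each daily row the most recent report at or before its date.

```python
import io
import pandas as pd

# Made-up miniature versions of the two files (illustrative values only).
sf1_csv = io.StringIO(
    "ticker,datekey,assets,ev\n"
    "A,2020-06-01,100,1.0\n"
    "A,2020-09-14,110,2.0\n"
    "B,2020-09-10,50,3.0\n"
)
daily_csv = io.StringIO(
    "ticker,date,ev\n"
    "A,2020-09-13,31000.0\n"
    "A,2020-09-14,31617.1\n"
    "B,2020-09-15,900.0\n"
    "A,2020-09-16,31700.0\n"
)

# parse_dates turns the date columns into real datetime columns,
# not object-dtype strings, which merge_asof cannot use.
sf1 = pd.read_csv(sf1_csv, parse_dates=["datekey"])
daily = pd.read_csv(daily_csv, parse_dates=["date"])

# Columns that exist in both frames (other than the ticker key).
conflicts = [c for c in sf1.columns if c in daily.columns and c != "ticker"]

# For each daily row, take the latest sf1 row of the same ticker
# whose datekey is on or before the daily date.
result = pd.merge_asof(
    daily.sort_values("date"),
    sf1.drop(columns=conflicts).sort_values("datekey"),
    left_on="date",
    right_on="datekey",
    by="ticker",
)
print(result)
```

Daily rows dated before a ticker's first report have no earlier match, so their sf1 columns come out as NaN.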
Merging under those assumptions gives the following list of columns: ticker, date, lastupdated, ev, evebit, evebitda, marketcap, pb, pe, ps, datekey, followed by the sf1 data columns from accoci through workingcapital (without the seven conflicting columns), together with their associated values.
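If you want to keep the conflicting columns from both dataframes, you have to rename them, for example by adding _d to the column names from daily. The list of columns is then: ticker, date, lastupdated, ev_d, evebit_d, evebitda_d, marketcap_d, pb_d, pe_d, ps_d, datekey, accoci, ...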
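One possible way to do the renaming, sketched on made-up data (the frames, their values, and the `conflicts` helper are assumptions for illustration, not the real files). Only the columns that clash with sf1 receive the `_d` suffix, so the sf1 versions keep their original names after the merge.

```python
import pandas as pd

# Made-up miniature frames; ev exists in both and would otherwise clash.
sf1 = pd.DataFrame({
    "ticker": ["A", "A", "B"],
    "datekey": pd.to_datetime(["2020-06-01", "2020-09-14", "2020-09-10"]),
    "assets": [100, 110, 50],
    "ev": [1.0, 2.0, 3.0],
})
daily = pd.DataFrame({
    "ticker": ["A", "A", "B", "A"],
    "date": pd.to_datetime(["2020-09-13", "2020-09-14", "2020-09-15", "2020-09-16"]),
    "ev": [31000.0, 31617.1, 900.0, 31700.0],
})

# Columns present in both frames, excluding the ticker key.
conflicts = [c for c in daily.columns if c in sf1.columns and c != "ticker"]

# Add the _d suffix to the clashing daily columns only.
daily_renamed = daily.rename(columns={c: f"{c}_d" for c in conflicts})

# Same as-of merge as before, now keeping sf1's own ev column.
result = pd.merge_asof(
    daily_renamed.sort_values("date"),
    sf1.sort_values("datekey"),
    left_on="date",
    right_on="datekey",
    by="ticker",
)
print(result.columns.tolist())
```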