How do I join six lists of tuples into one data frame, using the first value of each tuple as the key?


I am testing a service with an API from which parsed 10-K company data can be pulled. For each metric I extract (EBIT, cash, total assets, etc.), I store the quarter date and the metric in a tuple, and store each tuple in a list. The result is 6 lists of 43-80 tuples each. I want a data frame with columns for the company ticker, the date, and the metrics. How do I convert what I have (lists of tuples) into that?

The code below pulls the data (this is the sample endpoint, so it is free):

<pre><code>import numpy as np
import json
import pandas as pd
import requests  # needed for requests.get; missing from the original snippet

content = requests.get(r'https://eodhistoricaldata.com/api/fundamentals/AAPL.US?api_token=OeAFFmMliFG5orCUuwAKQ8l4WWFQ67YX')

ebit_list = []
date_list = []
totalassets_list = []
cash_list = []
totalCurrentAssets_list = []
totalCurrentLiabilities_list = []

for i in content.json()['Financials']['Income_Statement']['quarterly']:
    try:
        ebit_list.append((i, float(content.json()['Financials']['Income_Statement']['quarterly'][i]['ebit'])))
    except:
        pass
    try:
        date_list.append(i)
    except:
        pass
    try:
        totalassets_list.append((i, float(content.json()['Financials']['Balance_Sheet']['quarterly'][i]['totalAssets'])))
    except:
        pass

for i in content.json()['Financials']['Balance_Sheet']['quarterly']:
    #print(i, float(content.json()['Financials']['Balance_Sheet']['quarterly']['2019-12-28']['totalCurrentLiabilities']))
    try:
        cash_list.append((i, float(content.json()['Financials']['Balance_Sheet']['quarterly'][i]['cash'])))
    except:
        pass
    try:
        totalCurrentAssets_list.append((i, float(content.json()['Financials']['Balance_Sheet']['quarterly'][i]['totalCurrentAssets'])))
    except:
        pass
    try:
        totalCurrentLiabilities_list.append((i, float(content.json()['Financials']['Balance_Sheet']['quarterly'][i]['totalCurrentLiabilities'])))
    except:
        pass
</code></pre>

I want a data frame that contains all of the dates (meaning that if a metric is missing, a zero is filled in) and the following columns: <code>date</code>, <code>ebit</code>, <code>totalassets</code>, <code>cash</code>, <code>totalCurrentAssets</code>, <code>totalCurrentLiabilities</code>.

I don't know how to pull out the tuples and the values inside each tuple.


1 answer

Anonymous, 1 day ago

We can actually simplify this code considerably to get the result you want (and make it easier to adjust in the future!).

The finished code is here, with a more detailed explanation below:

<pre><code>import numpy as np
import json
import pandas as pd
import requests

content = requests.get(r'https://eodhistoricaldata.com/api/fundamentals/AAPL.US?api_token=OeAFFmMliFG5orCUuwAKQ8l4WWFQ67YX')

# Income statement: one row per quarter, keep only the columns of interest
income_data = content.json()['Financials']['Income_Statement']['quarterly']
income = pd.DataFrame.from_dict(income_data).transpose().set_index("date")
income = income[['ebit']]

# Balance sheet: same pattern
balance_data = content.json()['Financials']['Balance_Sheet']['quarterly']
balance = pd.DataFrame.from_dict(balance_data).transpose().set_index("date")
balance = balance[['totalAssets', 'cash', 'totalCurrentAssets', 'totalCurrentLiabilities']]

# Join on the date index and replace missing values with zero.
# Note: merge defaults to an inner join; pass how='outer' to keep dates
# that appear in only one of the two tables.
financials = income.merge(balance, left_index = True, right_index = True).fillna(0)
</code></pre>

The <code>financials</code> data frame looks like this (only the 2005-2009 rows are shown):

<pre><code>| date       | ebit      | totalAssets | cash       | totalCurrentAssets | totalCurrentLiabilities |
|:-----------|----------:|------------:|-----------:|-------------------:|------------------------:|
| 2009-12-26 | 4.758e+09 | 5.3926e+10  | 7.609e+09  | 3.3332e+10         | 1.3097e+10              |
| 2009-09-26 | 0         | 4.7501e+10  | 5.263e+09  | 3.1555e+10         | 1.1506e+10              |
| 2009-06-27 | 1.732e+09 | 4.814e+10   | 5.605e+09  | 3.517e+10          | 1.6661e+10              |
| 2009-03-31 | 0         | 4.3237e+10  | 4.466e+09  | 0                  | 1.3751e+10              |
| 2008-12-31 | 0         | 4.2787e+10  | 7.236e+09  | 0                  | 1.4757e+10              |
| 2008-09-30 | 0         | 3.9572e+10  | 1.1875e+10 | 0                  | 1.4092e+10              |
| 2008-06-30 | 0         | 3.1709e+10  | 9.373e+09  | 0                  | 9.218e+09               |
| 2008-03-31 | 0         | 3.0471e+10  | 9.07e+09   | 0                  | 9.634e+09               |
| 2007-12-31 | 0         | 3.0039e+10  | 9.162e+09  | 0                  | 1.0535e+10              |
| 2007-09-30 | 0         | 2.5347e+10  | 9.352e+09  | 0                  | 9.299e+09               |
| 2007-06-30 | 0         | 2.1647e+10  | 7.118e+09  | 0                  | 6.992e+09               |
| 2007-03-31 | 0         | 1.8711e+10  | 7.095e+09  | 0                  | 5.485e+09               |
| 2006-12-31 | 0         | 1.9461e+10  | 7.159e+09  | 0                  | 7.337e+09               |
| 2006-09-30 | 0         | 1.7205e+10  | 6.392e+09  | 0                  | 6.471e+09               |
| 2006-06-30 | 0         | 1.5114e+10  | 0          | 0                  | 5.023e+09               |
| 2006-03-31 | 0         | 1.3911e+10  | 0          | 0                  | 4.456e+09               |
| 2005-12-31 | 0         | 1.4181e+10  | 0          | 0                  | 5.06e+09                |
| 2005-09-30 | 0         | 1.1551e+10  | 3.491e+09  | 0                  | 3.484e+09               |
| 2005-06-30 | 0         | 1.0488e+10  | 0          | 0                  | 3.123e+09               |
| 2005-03-31 | 0         | 1.0111e+10  | 0          | 0                  | 3.352e+09               |
</code></pre>

<hr/>

The result of <code>content.json()['Financials']['Income_Statement']['quarterly']</code> is a dictionary in which each key is a date and each value is a second dictionary containing the column data:

<pre><code>{'2005-03-31': {'date': '2005-03-31', 'filing_date': None, 'currency_symbol': 'USD', 'researchDevelopment': '120000000.00', ...},
 '2005-06-30': {...},
 ...}
</code></pre>

Because of this, you can build the table directly with <code>pd.DataFrame.from_dict(income_data).transpose().set_index("date")</code>.

The transpose is necessary because of how the JSON is structured. Pandas expects a dictionary of the form <code>{'column name': data}</code>. Since the keys here are dates, you first get a data frame whose rows are labeled "totalAssets", "cash", and so on, and whose columns are the dates. <code>transpose()</code> flips rows and columns into the layout you need. The final <code>.set_index("date")</code> uses the "date" field inside each record rather than the original key dates, both for consistency and to give the index a name. It is entirely optional.

At this point the data frame contains every column in the JSON, but you are only interested in a few of them. The line <code>income = income[['ebit']]</code> selects just the relevant columns.

Because you are pulling data from two different sources, you do need to create two separate tables. This has the added benefit of making it clearer which columns come from the income statement and which come from the balance sheet.

The last line, <code>financials = income.merge(balance, left_index = True, right_index = True).fillna(0)</code>, merges the two tables on their index (in this case the "date" column). <code>fillna(0)</code> replaces any missing data with zero, as you requested.

If you eventually need to add another table, for example "cash flow", you can create it and select the relevant columns with the same lines of code, then add a second merge line:

<pre><code># This example reads the 'Balance_Sheet' section, as in the original answer;
# substitute the key of the section you actually want.
cashflow_data = content.json()['Financials']['Balance_Sheet']['quarterly']
cashflow = pd.DataFrame.from_dict(cashflow_data).transpose().set_index("date")
cashflow = cashflow[['accountsPayable', 'liabilitiesAndStockholdersEquity']]
...
# merge returns a new data frame, so assign the result
financials = financials.merge(cashflow, left_index = True, right_index = True).fillna(0)
</code></pre>

<hr/>

As a bonus tip, there is quite a lot of data in the source JSON! To see which columns are available in any given table, run <code>cashflow.columns.sort_values()</code> (before narrowing the table to the selected columns) to get an alphabetized list of the columns you can use:

<pre><code>['accountsPayable', 'accumulatedAmortization', 'accumulatedDepreciation',
 'accumulatedOtherComprehensiveIncome', 'additionalPaidInCapital', 'capitalLeaseObligations',
 'capitalSurpluse', 'cash', 'cashAndShortTermInvestments', 'commonStock',
 'commonStockSharesOutstanding', 'commonStockTotalEquity', 'currency_symbol',
 'deferredLongTermAssetCharges', 'deferredLongTermLiab', 'filing_date', 'goodWill',
 'intangibleAssets', 'inventory', 'liabilitiesAndStockholdersEquity', 'longTermDebt',
 'longTermDebtTotal', 'longTermInvestments', 'negativeGoodwill', 'netReceivables',
 'netTangibleAssets', 'nonCurrentAssetsTotal', 'nonCurrentLiabilitiesOther',
 'nonCurrentLiabilitiesTotal', 'nonCurrrentAssetsOther',
 'noncontrollingInterestInConsolidatedEntity', 'otherAssets', 'otherCurrentAssets',
 'otherCurrentLiab', 'otherLiab', 'otherStockholderEquity', 'preferredStockRedeemable',
 'preferredStockTotalEquity', 'propertyPlantAndEquipmentGross', 'propertyPlantEquipment',
 'retainedEarnings', 'retainedEarningsTotalEquity', 'shortLongTermDebt', 'shortTermDebt',
 'shortTermInvestments', 'temporaryEquityRedeemableNoncontrollingInterests', 'totalAssets',
 'totalCurrentAssets', 'totalCurrentLiabilities', 'totalLiab', 'totalPermanentEquity',
 'totalStockholderEquity', 'treasuryStock', 'warrants']
</code></pre>

This is also very handy when the data contains spelling errors, as with 'capitalSurpluse' above.
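If you would rather keep the lists of <code>(date, value)</code> tuples from the question, the join asked about in the title can also be done directly. The following is only a minimal sketch under stated assumptions: the three short lists are illustrative stand-ins (values copied from the table above) for the six real lists, and the <code>metric_lists</code> mapping is a hypothetical helper name, not part of the original code. It relies on <code>dict()</code> turning each list of pairs into a <code>{date: value}</code> mapping, and on <code>pd.concat(..., axis=1)</code> aligning the resulting Series on the union of their dates.

```python
import pandas as pd

# Illustrative stand-ins for the question's lists of (date, value) tuples.
# Note that ebit_list has no entry for 2009-09-26.
ebit_list = [("2009-12-26", 4.758e9), ("2009-06-27", 1.732e9)]
cash_list = [("2009-12-26", 7.609e9), ("2009-09-26", 5.263e9), ("2009-06-27", 5.605e9)]
totalassets_list = [("2009-12-26", 5.3926e10), ("2009-09-26", 4.7501e10), ("2009-06-27", 4.814e10)]

# Hypothetical mapping from output column name to its list of tuples.
metric_lists = {
    "ebit": ebit_list,
    "cash": cash_list,
    "totalassets": totalassets_list,
}

# dict(pairs) uses the first tuple value (the date) as the key,
# so each Series ends up indexed by date.
series = {
    name: pd.Series(dict(pairs), dtype="float64")
    for name, pairs in metric_lists.items()
}

# axis=1 places each Series in its own column and aligns rows on the
# union of all dates; dates missing from a list become NaN, then 0.
financials = pd.concat(series, axis=1).fillna(0.0)
financials.index.name = "date"
financials = financials.sort_index(ascending=False).reset_index()

print(financials)
```

Because the alignment is on the union of dates, every date that appears in any list is kept, which matches the "all dates, zero when missing" requirement in the question. Adding the remaining lists is a matter of adding entries to the mapping.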
