<p>EDIT: at the request of @Life is complex, edited to add date headings.</p>
<p>Try this using lxml:</p>
<pre><code>import requests
from lxml import html

url = 'https://finance.yahoo.com/quote/AAPL/balance-sheet?p=AAPL'
url2 = 'https://finance.yahoo.com/quote/AAPL/financials?p=AAPL'

# Download both pages and parse them into lxml trees
page = requests.get(url)
page2 = requests.get(url2)
tree = html.fromstring(page.content)
tree2 = html.fromstring(page2.content)

total_assets = []
Total_Current_Liabilities = []
Operating_Income_or_Loss = []
heads = []

# XPath expressions for the table rows, the row values and the date headings
path = '//div[@class="rw-expnded"][@data-test="fin-row"][@data-reactid]'
data_path = '../../div/span/text()'
heads_path = '//div[contains(@class,"D(ib) Fw(b) Ta(end)")]/span/text()'

dats = [tree.xpath(path), tree2.xpath(path)]

for entry in dats:
    heads.append(entry[0].xpath(heads_path))
    for d in entry[0]:
        # '//div[@title]' is an absolute XPath, so it searches the whole page on every pass
        for s in d.xpath('//div[@title]'):
            if s.attrib['title'] == 'Total Assets':
                total_assets.append(s.xpath(data_path))
            if s.attrib['title'] == 'Total Current Liabilities':
                Total_Current_Liabilities.append(s.xpath(data_path))
            if s.attrib['title'] == 'Operating Income or Loss':
                Operating_Income_or_Loss.append(s.xpath(data_path))

# Drop the first collected entry of each list
del total_assets[0]
del Total_Current_Liabilities[0]
del Operating_Income_or_Loss[0]

print('Date Total Assets Total_Current_Liabilities:')
for date, asset, current in zip(heads[0], total_assets[0], Total_Current_Liabilities[0]):
    print(date, asset, current)
print('Operating Income or Loss:')
for head, income in zip(heads[1], Operating_Income_or_Loss[0]):
    print(head, income)
</code></pre>
<p>Output:</p>
^{pr2}$
<p>Of course, if needed, this can easily be merged into a pandas DataFrame.</p>
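<p>A minimal sketch of that merge, assuming the scraped lists hold strings with thousands separators (such as <code>'1,234'</code>). The helper name <code>to_frame</code> and the sample values are hypothetical placeholders, not real Yahoo Finance figures; in the script above you would pass <code>heads[0]</code>, <code>total_assets[0]</code> and <code>Total_Current_Liabilities[0]</code> instead.</p>

```python
import pandas as pd


def to_frame(dates, columns):
    """Build a date-indexed DataFrame from parallel lists of strings.

    `to_frame` is a hypothetical helper for illustration. `columns` maps a
    column name to its list of scraped string values.
    """
    # Like zip(), truncate everything to the shortest list
    n = min([len(dates)] + [len(values) for values in columns.values()])
    df = pd.DataFrame(
        {name: list(values)[:n] for name, values in columns.items()},
        index=pd.Index(list(dates)[:n], name='Date'),
    )
    # Strip thousands separators and convert to numbers; unparsable cells become NaN
    return df.apply(
        lambda col: pd.to_numeric(col.str.replace(',', '', regex=False), errors='coerce')
    )


# Placeholder values for illustration only (not real data)
dates = ['period 1', 'period 2']
df = to_frame(dates, {
    'Total Assets': ['1,000', '2,000'],
    'Total Current Liabilities': ['300', '400'],
})
print(df)
```

<p>Indexing by date keeps the balance-sheet rows aligned, and a frame built the same way from <code>heads[1]</code> and <code>Operating_Income_or_Loss[0]</code> could be joined to it if the dates match.</p>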