Python scraping with multiple variables

Posted 2024-10-16 22:24:53


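I want to scrape a few data points from two websites for a list of stock tickers. For example:

For the array: AAPL, FB, AMZN

1) From https://ycharts.com/companies/AAPL/dividend_yield, extract the "Dividend Yield Range, Past 5 Years - Average" value (with AAPL as the variable)

2) From https://ycharts.com/companies/AAPL/pe_ratio, extract the "PE Ratio Range, Past 5 Years - Average" value (with AAPL as the variable)

3) From https://www.finviz.com/quote.ashx?t=AAPL, extract the 'Book/sh' and 'LT Debt/Eq' values (with AAPL as the variable)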
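Since only the ticker changes between these URLs, they can be generated from the ticker list instead of being typed out one by one. A minimal sketch, assuming the URL patterns stay exactly as quoted in the three steps; the helper name `build_urls` and the dictionary keys are made up for illustration:

```python
# URL patterns copied from the three steps; only the ticker varies.
YCHARTS_URL = 'https://ycharts.com/companies/{ticker}/{metric}'
FINVIZ_URL = 'https://www.finviz.com/quote.ashx?t={ticker}'


def build_urls(ticker):
    """Return the three page URLs for one ticker (hypothetical helper)."""
    return {
        'dividend_yield': YCHARTS_URL.format(ticker=ticker, metric='dividend_yield'),
        'pe_ratio': YCHARTS_URL.format(ticker=ticker, metric='pe_ratio'),
        'finviz': FINVIZ_URL.format(ticker=ticker),
    }


tickers = ['AAPL', 'FB', 'AMZN']
# One dictionary of URLs per ticker, keyed by ticker symbol.
urls_by_ticker = {ticker: build_urls(ticker) for ticker in tickers}
```

Looping over `urls_by_ticker.items()` then gives each ticker together with the pages to fetch for it.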

Output to a CSV in the following format:

Value1, Value2, Value3 ... as column headers

AAPL, T, MMM ... as row headers
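One way to get this layout with the standard `csv` module is a `DictWriter` with one row per ticker. This is only a sketch: the column names in `COLUMNS` and the function `write_table` are hypothetical, and the scraped values are assumed to arrive as a dictionary keyed by ticker.

```python
import csv

# Hypothetical column names, one per requested data point.
COLUMNS = [
    'ticker',
    'div_yield_5y_avg',
    'pe_ratio_5y_avg',
    'book_per_share',
    'lt_debt_to_equity',
]


def write_table(rows, out):
    """Write a header line, then one line per ticker.

    rows maps a ticker to a dict of column name -> value.
    out is any writable text file object.
    """
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator='\n')
    writer.writeheader()
    for ticker, values in rows.items():
        # Columns missing from `values` are written as empty cells.
        writer.writerow({'ticker': ticker, **values})
```

Passing a file opened with `open('index.csv', 'w', newline='')` as `out` would write the table to disk.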

I have made a start on step 1) here:

import urllib2
from bs4 import BeautifulSoup
import csv
from datetime import datetime

quote_page = ['https://ycharts.com/companies/AAPL/dividend_yield', 'https://ycharts.com/companies/T/dividend_yield', 'https://ycharts.com/companies/MMM/dividend_yield']

data = []
for pg in quote_page:
    page = urllib2.urlopen(pg)

soup = BeautifulSoup(page, 'html.parser')
divyield_box = soup.find('td', attrs={'class': 'col2'})
divyield = divyield_box.text.strip()
data.append((divyield))

with open('index.csv', 'a') as csv_file:
    writer = csv.writer(csv_file)
    for divyield in data:
        writer.writerow([divyield, datetime.now()])

It works, but it only extracts the last item of the array.

Thanks a lot.


1 answer
User
#1 · Posted 2024-10-16 22:24:53

Try a simplified_scrapy solution:

from simplified_scrapy.request import req
from simplified_scrapy.simplified_doc import SimplifiedDoc
quote_page = ['https://ycharts.com/companies/AAPL/dividend_yield', 'https://ycharts.com/companies/T/dividend_yield', 'https://ycharts.com/companies/MMM/dividend_yield']

data = []
for pg in quote_page:
  page = req.get(pg)
  doc = SimplifiedDoc(page)
  divyield = doc.getElement('td',attr='class',value='col2').text
  # divyield = doc.select('td.col2>text()')
  data.append((divyield))
print (data)

Result:

[u'0.95%', u'5.34%', u'3.18%']
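These values are strings that end in a percent sign. If numbers are wanted in the CSV, they could be converted first. A small sketch; the helper name `percent_to_float` is made up for illustration:

```python
def percent_to_float(text):
    """Convert a string such as '0.95%' to the float 0.95."""
    return float(text.strip().rstrip('%'))


data = ['0.95%', '5.34%', '3.18%']  # strings copied from the result shown here
yields = [percent_to_float(item) for item in data]
```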

More examples are available for simplified_scrapy.
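For comparison, the question's code can also be made to return every item with its original libraries. As that code is indented, only the `urlopen` call sits inside the `for` loop, so the parsing runs once, after the loop has finished, on the last page only. A sketch for Python 3, where `urllib.request` takes the place of `urllib2`; the function names are illustrative, and the `td` / `col2` lookup is taken from the question without being checked against the live pages:

```python
from urllib.request import urlopen  # Python 3 counterpart of urllib2.urlopen


def fetch_html(url):
    """Download one page and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


def parse_divyield(html):
    """Return the text of the first <td class="col2"> cell, or None if absent."""
    # Third-party import kept local so the loop logic can be used without bs4.
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')
    box = soup.find('td', attrs={'class': 'col2'})
    return box.text.strip() if box is not None else None


def scrape_all(urls, fetch=fetch_html, parse=parse_divyield):
    """Fetch and parse every URL, returning one value per URL."""
    data = []
    for url in urls:
        # Both steps are inside the loop body, so each page is parsed
        # before the next one is fetched.
        html = fetch(url)
        data.append(parse(html))
    return data
```

Calling `scrape_all(quote_page)` with the list from the question should then yield one entry per URL, provided the pages still contain that cell.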
