Parsing a breakdown list

Posted 2024-09-28 01:29:21


I need to scrape data (strings) from Yahoo Finance. However, the information I need is "hidden" under the Breakdown table.

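As you can see, I can access other data, such as Total Revenue and Cost of Revenue. The problem appears when I try to access data hidden inside the Breakdown table: Current Assets and Inventory (nested under the Total Assets and Current Assets sections).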

Python raises AttributeError: 'NoneType' object has no attribute 'find_next', which I don't find very informative.
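The message points at `find_next`, but the failing call is the one before it: BeautifulSoup's `find()` returns `None` when no tag matches, and `.find_next` is then looked up on `None`. A minimal sketch of a guard that fails with a readable message instead, assuming `soup` is a BeautifulSoup object (or anything with the same `find`/`find_next` methods); the helper name `value_after_label` is hypothetical, not part of BeautifulSoup:

```python
def value_after_label(soup, label, tag="span"):
    """Return the text of the first <tag> that follows the <tag> whose text is `label`.

    Hypothetical helper: `soup` is assumed to be a BeautifulSoup object or any
    object exposing the same find()/find_next() methods. Raises LookupError with
    a readable message instead of letting None reach an attribute lookup.
    """
    label_el = soup.find(tag, string=label)
    if label_el is None:
        raise LookupError(
            f"no <{tag}> with text {label!r} in the parsed HTML; "
            "the row may be collapsed or rendered by JavaScript"
        )
    value_el = label_el.find_next(tag)
    if value_el is None:
        raise LookupError(f"found {label!r} but no <{tag}> after it")
    return value_el.text
```

Usage would be, for example, `inventory_element = value_after_label(soup_balance, 'Inventory')`; the lookup still fails on the static page, but the error now names the missing label.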

Also, by commenting out the lines one at a time, I found that the problem lies in these elements:

import urllib.request as url
from bs4 import BeautifulSoup

company = input('enter company abbreviation: ')
income_page = 'https://finance.yahoo.com/quote/' + company + '/financials/'
balance_page = 'https://finance.yahoo.com/quote/' + company + '/balance-sheet/'
set_income_page = url.urlopen(income_page).read()
set_balance_page = url.urlopen(balance_page).read()
soup_income = BeautifulSoup(set_income_page, 'html.parser')
soup_balance = BeautifulSoup(set_balance_page, 'html.parser')

revenue_element = soup_income.find('span', string='Total Revenue').find_next('span').text
cogs_element = soup_income.find('span', string='Cost of Revenue').find_next('span').text
ebit_element = soup_income.find('span', string='Operating Income').find_next('span').text
net_element = soup_income.find('span', string='Pretax Income').find_next('span').text
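# the two lines below raise AttributeError: find() returns None because no
# <span> with this text exists in the HTML that urlopen() downloaded
short_assets_element = soup_balance.find('span', string='Current Assets').find_next('span').text
inventory_element = soup_balance.find('span', string='Inventory').find_next('span').text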
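Before switching to a browser, it can help to check whether a label occurs in the downloaded HTML at all. If it does not, no BeautifulSoup selector will find it, and the row is most likely inserted or expanded client-side (or labelled differently). A minimal sketch using only the standard library's `html.parser`; the names `SpanTextCollector` and `missing_labels` are hypothetical, and it assumes the labels are the direct text of `<span>` elements, as the code above does:

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Record the stripped direct text of every <span> in a document."""

    def __init__(self):
        super().__init__()
        self._stack = []  # one list of text chunks per currently open <span>
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._stack.append([])

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            self.texts.append("".join(self._stack.pop()).strip())

    def handle_data(self, data):
        if self._stack:
            self._stack[-1].append(data)


def missing_labels(html, labels):
    """Return the labels that are not the text of any <span> in `html`."""
    parser = SpanTextCollector()
    parser.feed(html)
    parser.close()
    found = set(parser.texts)
    return [label for label in labels if label not in found]
```

Since `urlopen(...).read()` returns bytes, the page has to be decoded first, e.g. `missing_labels(set_balance_page.decode('utf-8'), ['Current Assets', 'Inventory'])` (UTF-8 is an assumption about the page encoding).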

1 answer
User reply
Answer #1 · Posted 2024-09-28 01:29:21

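Below is an example that parses this page with Selenium. Selenium simulates a user: it waits for the page to load, closes the pop-up, expands the tree nodes by clicking them, and then extracts the values from the expanded rows.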
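Each waiting step relies on `WebDriverWait(driver, timeout).until(condition)`, which repeatedly calls `condition(driver)` until it returns a truthy value (then returned by `until`) or raises `TimeoutException` once the timeout expires; by default it polls every 0.5 s and ignores `NoSuchElementException`. The standard-library sketch below mimics that loop only to show the mechanism; it is not Selenium's implementation, and `wait_until` is a hypothetical name:

```python
import time


def wait_until(condition, timeout=30.0, poll=0.5, ignored=(LookupError,)):
    """Call condition() until it returns something truthy, then return that value.

    Exceptions listed in `ignored` count as "not ready yet", similar to how
    WebDriverWait ignores NoSuchElementException by default.
    Raises TimeoutError if nothing truthy is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            pass  # treat as "element not there yet" and keep polling
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)
```

This is also why the pop-up check in the answer costs up to `DELAY` seconds when no pop-up appears: the loop keeps polling until the deadline before giving up.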

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup

company = input('enter company abbreviation: ')

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
# Selenium 4 takes the driver path via a Service object
wd = webdriver.Chrome(service=Service('<<PATH_TO_CHROMEDRIVER>>'), options=chrome_options)

# delay (how long selenium waits for element to be loaded)
DELAY = 30

# maximize browser window
wd.maximize_window()

# load page via selenium
wd.get('https://finance.yahoo.com/quote/' + company + '/financials/')

# check for the consent popup and close it if it appears
try:
    btn = WebDriverWait(wd, DELAY).until(EC.presence_of_element_located((By.XPATH, '//button[text()="I agree"]')))
    wd.execute_script("arguments[0].scrollIntoView();", btn)
    wd.execute_script("arguments[0].click();", btn)
except TimeoutException:
    # no popup appeared within DELAY seconds; carry on
    pass

# wait for page to load
results = WebDriverWait(wd, DELAY).until(EC.presence_of_element_located((By.ID, 'Col1-1-Financials-Proxy')))

# parse content
soup_income = BeautifulSoup(results.get_attribute('innerHTML'), 'html.parser')

# extract values
revenue_element = soup_income.find('span', string='Total Revenue').find_next('span').text
cogs_element = soup_income.find('span', string='Cost of Revenue').find_next('span').text
ebit_element = soup_income.find('span', string='Operating Income').find_next('span').text
net_element = soup_income.find('span', string='Pretax Income').find_next('span').text

# load page via selenium
wd.get('https://finance.yahoo.com/quote/' + company + '/balance-sheet/')

# wait for page to load
results = WebDriverWait(wd, DELAY).until(EC.presence_of_element_located((By.ID, 'Col1-1-Financials-Proxy')))

# expand Total Assets (reveals Current Assets)
btn = WebDriverWait(wd, DELAY).until(EC.element_to_be_clickable((By.XPATH, '//span[text()="Total Assets"]/preceding-sibling::button')))
wd.execute_script("arguments[0].scrollIntoView();", btn)
wd.execute_script("arguments[0].click();", btn)

# expand Current Assets (reveals Inventory)
btn = WebDriverWait(wd, DELAY).until(EC.element_to_be_clickable((By.XPATH, '//span[text()="Current Assets"]/preceding-sibling::button')))
wd.execute_script("arguments[0].scrollIntoView();", btn)
wd.execute_script("arguments[0].click();", btn)

# parse content (innerHTML is read now, after the rows have been expanded)
soup_balance = BeautifulSoup(results.get_attribute('innerHTML'), 'html.parser')

# extract values
short_assets_element = soup_balance.find('span', string='Current Assets').find_next('span').text
inventory_element = soup_balance.find('span', string='Inventory').find_next('span').text

# close webdriver
wd.quit()

print(revenue_element)
print(cogs_element)
print(ebit_element)
print(net_element)
print(short_assets_element)
print(inventory_element)
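The extracted values are display strings, not numbers. Assuming the format Yahoo used at the time (comma thousands separators, a leading `-` for negatives, and a lone `-` for a missing value), a small hypothetical helper `parse_amount` can convert them before any arithmetic. It does not apply any unit or scale factor the page may state elsewhere:

```python
def parse_amount(text):
    """Convert a display string such as '274,515,000' or '-1,234' to a number.

    Returns None for an empty cell or a lone '-' (assumed to mean "no data").
    Integers stay int; values with a decimal point become float.
    """
    cleaned = text.strip().replace(",", "")
    if cleaned in ("", "-"):
        return None
    try:
        return int(cleaned)
    except ValueError:
        return float(cleaned)
```

For example, `inventory = parse_amount(inventory_element)` yields an int (or None) that can be compared or summed directly.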
