{{link}} returned when scraping URLs from a web page with Python

Posted 2024-05-19 13:59:45


I am scraping URLs from a web page as follows:

from bs4 import BeautifulSoup
import requests

url = "https://www.investing.com/search/?q=Axon&tab=news"
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(response.content, "html.parser")

# walk each article item, then its text container, then the title link
for s in soup.find_all('div', {'class': 'articleItem'}):
    for a in s.find_all('div', {'class': 'textDiv'}):
        for b in a.find_all('a', {'class': 'title'}):
            print(b.get('href'))

The output looks like this:

/news/stock-market-news/axovant-updates-on-parkinsons-candidate-axolentipd-1713474
/news/stock-market-news/digital-alley-up-24-on-axon-withdrawal-from-patent-challenge-1728115
/news/stock-market-news/axovant-sciences-misses-by-009-763209
/analysis/microns-mu-shares-gain-on-q3-earnings-beat-upbeat-guidance-200529289
/analysis/axon,-espr,-momo,-zyne-200182141
/analysis/factors-likely-to-impact-axon-enterprises-aaxn-q4-earnings-200391393
{{link}}
{{link}}

The problems are:

  1. Not all of the URLs are extracted
  2. The last two entries come out as {{link}}. Why does this happen?

Is there a solution to these two problems?


Tags: in, from, import, url, for, on, stock, analysis
2 Answers

One way to solve this is to use Selenium:

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

Once Selenium has scrolled down to the bottom of the page, you read the page source, close Selenium, and parse the page source with BeautifulSoup. You can also do the parsing with Selenium itself.

First, Selenium plus bs4:

from selenium import webdriver
from bs4 import BeautifulSoup
import time

PAUSE_TIME = 1  # seconds to wait after each scroll for new results to load

driver = webdriver.Firefox(executable_path='c:/program/geckodriver.exe')
driver.get('https://www.investing.com/search/?q=Axon&tab=news')
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # scroll to the bottom and give the page time to append more articles
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(PAUSE_TIME)

    # stop once the page height no longer grows
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

page_source = driver.page_source
driver.close()

soup = BeautifulSoup(page_source, "html.parser")

for s in soup.find_all('div', {'class': 'articleItem'}):
    for a in s.find_all('div', {'class': 'textDiv'}):
        for b in a.find_all('a', {'class': 'title'}):
            print(b.get('href'))
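
Note that the hrefs extracted this way are relative paths (as in the output above). If you need absolute URLs, you can join them against the site root with urllib.parse.urljoin. A small sketch; the BASE constant is my addition:

from urllib.parse import urljoin

BASE = 'https://www.investing.com'  # assumed site root for the relative hrefs
href = '/news/stock-market-news/axovant-sciences-misses-by-009-763209'
print(urljoin(BASE, href))
# https://www.investing.com/news/stock-market-news/axovant-sciences-misses-by-009-763209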

And a pure Selenium version:

from selenium import webdriver
import time

PAUSE_TIME = 1  # seconds to wait after each scroll for new results to load

driver = webdriver.Firefox(executable_path='c:/program/geckodriver.exe')
driver.get('https://www.investing.com/search/?q=Axon&tab=news')
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # scroll to the bottom and give the page time to append more articles
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(PAUSE_TIME)

    # stop once the page height no longer grows
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

# parse with Selenium directly; get_attribute('href') resolves to an absolute URL
for s in driver.find_elements_by_css_selector('div.articleItem'):
    for a in s.find_elements_by_css_selector('div.textDiv'):
        for b in a.find_elements_by_css_selector('a.title'):
            print(b.get_attribute('href'))

driver.close()

Note: you have to install selenium and download geckodriver to run this. If your geckodriver lives in a path other than c:/program, you have to change:

driver = webdriver.Firefox(executable_path='c:/program/geckodriver.exe')

to point to your geckodriver path.
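
Side note: on current Selenium 4.x releases the executable_path argument has been removed, and the driver path is passed through a Service object instead. A minimal sketch of the equivalent setup, assuming Selenium 4 and the same geckodriver location:

from selenium import webdriver
from selenium.webdriver.firefox.service import Service

# Selenium 4 style: the driver path goes into a Service object
driver = webdriver.Firefox(service=Service('c:/program/geckodriver.exe'))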

This happens because you are making a plain HTTP request, while the site renders the remaining results with JavaScript. To be able to parse JS content, you have to use a library that can make the request and then render the JavaScript. Try the requests_html module: pypi.org/project/requests-html
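
A minimal sketch of that approach, assuming requests-html is installed (pip install requests-html; render() downloads a Chromium build on first use):

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://www.investing.com/search/?q=Axon&tab=news',
                headers={'User-Agent': 'Mozilla/5.0'})

# execute the page's JavaScript; scrolldown mimics scrolling so more results load
r.html.render(scrolldown=10, sleep=1)

for a in r.html.find('div.articleItem div.textDiv a.title'):
    print(a.attrs.get('href'))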
