Cannot parse an address containing ".html#/something" with bs4 in Python 3

Posted 2024-09-24 02:17:17

User

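My goal is to parse the images on the second page. I am using bs4 and Python 3. Consider these two pages:

1) A page with images in all 4 colours (I can parse this page).

2) A page that contains images in only 1 colour (chrome in this example). This is the page I need to parse.

In a browser I can see that the second page differs from the first. With bs4, however, I get the same result for both pages, because Python does not take the "#/kolor-chrom" part after ".html" in the second address into account.

First page address: "https://azzardo.com.pl/lampy-techniczne/2111-bross-1-tuba-lampa-techniczna-azzardo.html"

Second page address: "https://azzardo.com.pl/lampy-techniczne/2111-bross-1-tuba-lampa-techniczna-azzardo.html#/kolor-chrom"

Code to reproduce: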

from bs4 import BeautifulSoup
import requests

adres1 = "https://azzardo.com.pl/lampy-techniczne/2111-bross-1-tuba-lampa-techniczna-azzardo.html"
adres2 = "https://azzardo.com.pl/lampy-techniczne/2111-bross-1-tuba-lampa-techniczna-azzardo.html#/kolor-chrom"

def parse_one_page(adres):
    """Parse one page and get all the img src from adres"""
    # Send a browser-like User-Agent so the request does not look like a script
    headers = {'User-Agent': 'Mozilla/5.0'}
    # Get page
    page = requests.get(adres, headers=headers)  # read_timeout=5
    # Get all of the html code
    soup = BeautifulSoup(page.content, 'html.parser')
    # Find div
    divclear = soup.find_all("div", class_="clearfix")
    divclear = divclear[9]
    # Find img tag
    imgtag = [i.find_all("img") for i in divclear][0]
    # Find src
    src = [i["src"] for i in imgtag]
    # Print how many images were found
    print(len(src))
    # return list with img src
    return src


print(parse_one_page(adres1))
print(parse_one_page(adres2))

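After running this code you will see that the output for both addresses is the same: 24 images each. For the first page, 24 images is correct. The second page, however, should yield only 2 images, not 24.

I hope someone can help me parse the second page correctly with bs4 in Python 3.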

1 Answer

User

#1 · Posted 2024-09-24 02:17:17

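Yes, it looks like this page cannot be parsed with bs4 alone. Everything after "#" in a URL is a fragment, which is handled by the browser and is not sent to the server, so both addresses return the same HTML to requests. The single-colour view seen in the browser is presumably produced by JavaScript running on the page, which requests and bs4 do not execute.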
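
To illustrate why both calls in the question return the same 24 images, here is a minimal sketch that uses only the standard library's urllib.parse.urldefrag to separate the fragment from the rest of each address. The helper name split_fragment is hypothetical and not part of the original post; the two addresses are the ones from the question. It makes no network request, so it only shows how the URLs are structured, not what the site does with the fragment.

```python
from urllib.parse import urldefrag

adres1 = "https://azzardo.com.pl/lampy-techniczne/2111-bross-1-tuba-lampa-techniczna-azzardo.html"
adres2 = "https://azzardo.com.pl/lampy-techniczne/2111-bross-1-tuba-lampa-techniczna-azzardo.html#/kolor-chrom"


def split_fragment(url):
    """Return (url_without_fragment, fragment) for the given address.

    Hypothetical helper for illustration only.
    """
    result = urldefrag(url)
    return result.url, result.fragment


base1, frag1 = split_fragment(adres1)
base2, frag2 = split_fragment(adres2)

# The part before "#" is what identifies the resource on the server;
# it is identical for both addresses.
print(base1 == base2)

# The fragment stays on the client side and is empty for the first address.
print(repr(frag1), repr(frag2))
```

Since the part before "#" is the same for both addresses, the server has no way to tell the two requests apart. Getting only the chrome images would therefore likely require either a tool that executes the page's JavaScript or filtering the 24 image URLs yourself; which of these fits depends on how the site is built, which the post does not show.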
