Error: None returned when trying to scrape data with BeautifulSoup

Posted 2024-10-03 21:29:54


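I am new to web scraping. I am trying to scrape the title (QCY T5 Wireless Bluetooth Earphones V5.0 Touch Control Stereo HD talking with 380mAh battery) using BeautifulSoup, but the output shows None. Here is the code I tried: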

from bs4 import BeautifulSoup
import requests

page=requests.get('https://www.daraz.pk/products/qcy-t5-wireless-bluetooth-earphones-v50-touch-control-stereo-hd-talking-with-380mah-battery-i143388262-s1304364361.html?spm=a2a0e.searchlist.list.1.5b7c4a71Jr4QZb&search=1')
soup = BeautifulSoup(page.content, 'html.parser')
print(page.status_code)

heading = soup.find(class_='pdp-mod-product-badge-title')
print(heading)

HTML from the website:

<div class="pdp-mod-product-badge-wrapper"><span class="pdp-mod-product-badge-title" data-spm-anchor-id="a2a0e.pdp.0.i0.4f257123ixGMNY">QCY T5 Wireless Bluetooth Earphones V5.0 Touch Control Stereo HD talking with 380mAh battery</span></div>

2 Answers

There is no "pdp-mod-product-badge-title" class in page.content. The correct class is "breadcrumb_item_anchor_last", which you can find by opening View Source in your browser.


Code:

from bs4 import BeautifulSoup
import requests

page=requests.get('https://www.daraz.pk/products/qcy-t5-wireless-bluetooth-earphones-v50-touch-control-stereo-hd-talking-with-380mah-battery-i143388262-s1304364361.html?spm=a2a0e.searchlist.list.1.5b7c4a71Jr4QZb&search=1')
soup = BeautifulSoup(page.content, 'html.parser')
print(page.status_code)

heading = soup.find(class_='breadcrumb_item_anchor_last')

print(heading.text.strip())  # Thanks to @bigbounty
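
Note that soup.find returns None when nothing matches, so calling .text on the result raises AttributeError if the class is missing. A minimal sketch of a None-safe lookup follows; the helper name extract_text_by_class and the sample_html string are hypothetical and only illustrate the idea, they are not part of the original answer:

```python
from bs4 import BeautifulSoup


def extract_text_by_class(html, class_name):
    """Return the stripped text of the first element with class_name, or None."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(class_=class_name)
    if element is None:
        # Nothing matched: the class is not in the HTML that was parsed.
        return None
    return element.get_text(strip=True)


# Hypothetical stand-in for page.content, so the sketch runs without a network call.
sample_html = '<div><span class="breadcrumb_item_anchor_last"> QCY T5 </span></div>'

print(extract_text_by_class(sample_html, "breadcrumb_item_anchor_last"))
print(extract_text_by_class(sample_html, "pdp-mod-product-badge-title"))
```

With a real response you would pass page.content (or page.text) instead of sample_html and check for None before using the result.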

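You cannot get the data because the page source of the website does not contain the class you mentioned.

A basic mistake beginners make is to look up elements in the browser's Inspect tab and pick the classes to scrape from there. Never do this.

To make sure the data is reliably present, always press Ctrl+U to view the page source and look for your content there. In most cases the content is rendered dynamically by JS files and API calls, which you can find in the browser's Network tab.

For the question above, the information you are looking for is also loaded dynamically and is not available in the page source.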
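
The Ctrl+U check can also be done in code, by testing which classes exist in the raw HTML before any JavaScript has run. The following is only a sketch under assumptions: the helper name classes_present and the static_html sample are hypothetical, and the sample merely imitates a page whose product title is filled in later by scripts.

```python
from bs4 import BeautifulSoup


def classes_present(html, class_names):
    """Map each class name to True if some element in html carries it."""
    soup = BeautifulSoup(html, "html.parser")
    return {name: soup.find(class_=name) is not None for name in class_names}


# Hypothetical raw page source: the breadcrumb is static, the title is not there yet.
static_html = (
    '<html><body>'
    '<a class="breadcrumb_item_anchor_last">Item</a>'
    '<div id="root"></div>'
    '</body></html>'
)

report = classes_present(
    static_html,
    ["breadcrumb_item_anchor_last", "pdp-mod-product-badge-title"],
)
for name, found in report.items():
    print(name, "found" if found else "missing (possibly rendered by JavaScript)")
```

Passing page.text from a requests response instead of static_html gives the same view as the browser's page source; a class reported as missing there cannot be reached with soup.find on that response.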
