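<p>The idea is simple. The links on the page are loaded only as you scroll down, so you have to use <code>selenium</code> to scroll to the end of the page. Once you are there, get the page's HTML with <code>driver.page_source</code> and parse it with <code>BeautifulSoup</code> to extract all the links. Here is how you do it:</p>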
<pre><code>from bs4 import BeautifulSoup
from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.get('https://www.alibaba.com/consumer-electronics/action-sports-camera/p44_p201340102?spm=a2700.8293689.HomeLeftCategory.d201340102.2f9a67afhxyQdZ')

# Scroll to the bottom of the page and return the current page height.
scroll_script = "window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;"
lenOfPage = driver.execute_script(scroll_script)

# Keep scrolling until the page height stops growing, i.e. nothing new is loaded.
match = False
while not match:
    lastCount = lenOfPage
    time.sleep(1)
    lenOfPage = driver.execute_script(scroll_script)
    if lastCount == lenOfPage:
        match = True

time.sleep(3)  # give the last batch of items time to render
html = driver.page_source
driver.quit()  # end the browser session

# Parse the rendered HTML and take the first link inside each grid item.
soup = BeautifulSoup(html, 'html5lib')
div_tags = soup.find_all('div', class_="grid-col-item")
links = []
for div in div_tags:
    links.append(div.div.a['href'])
print(links)
</code></pre>
<p>Output:</p>
<pre><code>['//www.alibaba.com/product-detail/2020-Full-HD-4k-1080P-go_62556989288.html', '//www.alibaba.com/product-detail/Followsun-50-in-1-Accessories-for_62065838705.html', '//www.alibaba.com/product-detail/Factory-lowest-Price-720p-action-camera_60828536337.html', '//www.alibaba.com/product-detail/New-Product-2-0-Inch-Ltps_62394746927.html', '//www.alibaba.com/product-detail/Waterproof-full-hd-1080p-720p-sport_1600084796811.html', '//www.alibaba.com/product-detail/2020-Full-HD-1080P-Go-pro_62555774741.html', '//www.alibaba.com/product-detail/A7-Action-Camera-4k-HD720P-Sports_62255736516.html', '//www.alibaba.com/product-detail/Sports-Camera-4K-Action-Camera-Ultra_62504138600.html', '//www.alibaba.com/product-detail/2016-Hot-sale-Xiaomi-Yi-Action_60434045578.html' ... '//www.alibaba.com/product-detail/Promotion-item-wide-angle-action-camera_60819668707.html']
</code></pre>
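<p>The links in the output above are protocol-relative (they start with <code>//</code>), so they cannot be passed to <code>requests</code> or <code>driver.get</code> as they are. A minimal sketch of one way to make them absolute, using the standard library's <code>urljoin</code>; the helper name <code>to_absolute</code> and the <code>BASE_URL</code> constant are illustrative assumptions, not part of the original code:</p>

```python
from urllib.parse import urljoin

# Assumed base: the scheme and host of the page the links were scraped from.
BASE_URL = "https://www.alibaba.com/"


def to_absolute(links, base_url=BASE_URL):
    """Resolve protocol-relative or relative hrefs against base_url."""
    return [urljoin(base_url, link) for link in links]


# One link taken from the output above, used here as sample input.
sample = ['//www.alibaba.com/product-detail/2020-Full-HD-4k-1080P-go_62556989288.html']
absolute_links = to_absolute(sample)
print(absolute_links)
```

<p>In the scraper itself you would call <code>to_absolute(links)</code> right before <code>print(links)</code>. Links that are already absolute pass through <code>urljoin</code> unchanged.</p>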
<p><strong>Edit:</strong></p>
<p>Here is the code for the actual website you want to scrape:</p>
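<pre><code>from bs4 import BeautifulSoup
import requests

# The links are already in the HTML returned by a plain GET, so no browser is needed here.
r = requests.get('https://video.xortec.de/search?sSearch=hikvision&p=1&o=1&n=24%22').text
soup = BeautifulSoup(r, 'html5lib')

# Each product title is an anchor whose href points to the product page.
a_tags = soup.find_all('a', class_="product title")
links = []
for a in a_tags:
    links.append(a['href'])
print(links)
</code></pre>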
<p>Output:</p>
<pre><code>['https://video.xortec.de/hikvision-ds-2df4220-dx-w/316l', 'https://video.xortec.de/hikvision-ds-2td2137-35/py', 'https://video.xortec.de/hikvision-ds-2td2137-25/py', 'https://video.xortec.de/hikvision-ds-2td2137-15/py', 'https://video.xortec.de/hikvision-ds-2td2137-10/py', 'https://video.xortec.de/hikvision-ds-2td2137-7/py', 'https://video.xortec.de/hikvision-ds-2td2137-4/py', 'https://video.xortec.de/hikvision-ds-2td2137-4/v1', 'https://video.xortec.de/hikvision-ds-2df8c842ixs-ael-t2', 'https://video.xortec.de/hikvision-ds-2df8a442ixs-af/sp-t2', 'https://video.xortec.de/hikvision-ds-2de5432iw-ae-e', 'https://video.xortec.de/hikvision-ds-2de5425w-ae-e', 'https://video.xortec.de/hikvision-ds-2de5425iw-ae-e', 'https://video.xortec.de/hikvision-ds-2de5330w-ae-e', 'https://video.xortec.de/hikvision-ds-2de5232w-ae-e', 'https://video.xortec.de/hikvision-ds-2de5232iw-ae-e', 'https://video.xortec.de/hikvision-ds-2de5225w-ae-e', 'https://video.xortec.de/hikvision-ds-2de5225iw-ae-e', 'https://video.xortec.de/hikvision-ds-2de4425w-de-e', 'https://video.xortec.de/hikvision-ds-2de4225w-de-e', 'https://video.xortec.de/hikvision-ds-2de4215w-de-e']
</code></pre>
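<p>The request above fetches a single results page. If you need more pages, one option is to build the search URL from its query parameters. The sketch below only constructs URLs and assumes, without confirmation from the site, that <code>p</code> is the page number, <code>n</code> the number of results per page and <code>o</code> a sort order that can stay at 1; <code>build_search_url</code> is a hypothetical helper name:</p>

```python
from urllib.parse import urlencode

SEARCH_URL = "https://video.xortec.de/search"


def build_search_url(term, page=1, per_page=24):
    """Build a search URL for one results page.

    Assumption: p is the page number, n the page size, o the sort order.
    """
    params = {"sSearch": term, "p": page, "o": 1, "n": per_page}
    return f"{SEARCH_URL}?{urlencode(params)}"


# URLs for the first three result pages (the page count is chosen arbitrarily).
page_urls = [build_search_url("hikvision", page) for page in range(1, 4)]
for url in page_urls:
    print(url)
```

<p>Each URL can then go through the same <code>requests.get</code> and <code>find_all</code> steps as above. Stopping when a page yields no links is a reasonable way to detect the last page.</p>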