So I have this code. I managed to extract the name of every product on the page:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq

page_url = "https://www.tenniswarehouse-europe.com/catpage-WILSONRACS-EN.html"

# Download the category page and parse it
uClient = uReq(page_url)
page_soup = soup(uClient.read(), "html.parser")
uClient.close()

# Each product sits in its own <div class="product_wrapper cf rac">
containers = page_soup.findAll("div", {"class": "product_wrapper cf rac"})
for container in containers:
    # The product name is stored in the alt attribute of the thumbnail image
    name = container.div.img["alt"]
    print(name)
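I am trying to extract the price from the HTML below. I tried the same approach as above, but got an "index out of range" error. I also tried targeting the pricing div and even the span, but that didn't work either.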
<div class="product_wrapper cf rac">
<div class="image_wrap">
<a href="https://www.tenniswarehouse-europe.com/Wilson_Pro_Staff_RF_97_V130_Racket/descpageRCWILSON-97V13R-EN.html">
<img class="cell_rac_img" src="https://img.tenniswarehouse-europe.com/cache/56/97V13R-thumb.jpg" srcset="https://img.tenniswarehouse-europe.com/cache/112/97V13R-thumb.jpg 2x" alt="Wilson Pro Staff RF 97 V13.0 Racket" />
</a>
</div>
<div class="text_wrap">
<a class="name " href="https://www.tenniswarehouse-europe.com/Wilson_Pro_Staff_RF_97_V130_Racket/descpageRCWILSON-97V13R-EN.html">Wilson Pro Staff RF 97 V13.0 Racket</a>
<div class="pricing">
<span class="price"><span class="convert_price">264,89 €</span></span>
<span class="msrp">SRP <span class="convert_price">300,00 €</span></span>
</div>
<div class="pricebreaks">
<span class="pricebreak">Price for 2: <span class="convert_price">242,90 €</span> each</span>
</div>
<div>
<p>Wilson updates the cosmetic of Federer's RF97 but keeps the perfect spec profile and sublime feel that has come to define this iconic racket. Headsize: 626cm². String Pattern: 16x19. Standard Length</p>
<div class="cf">
<div class="feature_links cf">
<a class="review ga_event" href="/Reviews/97V13R/97V13Rreview.html" data-trackcategory="Product Info" data-trackaction="TWE Product Review" data-tracklabel="97V13R - Wilson Pro Staff RF 97 V13.0 Racket">TW Reviews</a>
<a class="feedback ga_event" href="/feedback.html?pcode=97V13R" data-trackcategory="Product Info" data-trackaction="TWE Customer Review" data-tracklabel="97V13R - productName">Customer Reviews</a>
<a class="video_popup ga_event" href="/productvideo.html?pcode=97V13R" data-trackcategory="Video" data-trackaction="Cat - Product Review" data-tracklabel="Wilson_Pro_Staff_RF_97_V130_Racket">Video</a>
</div>
</div>
</div>
</div>
</div>
</td>
<td class="cat_border_cell">
<div class="product_wrapper cf rac">
I think this should work for you:
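A minimal sketch, assuming the current price lives in `<span class="price">` as in the question's snippet. To keep it self-contained, the snippet is parsed from a string (the `html` variable is a trimmed, hypothetical copy of it) instead of being downloaded:

```python
from bs4 import BeautifulSoup

# Hypothetical input: the relevant part of the product HTML from the question
html = """
<div class="product_wrapper cf rac">
  <div class="text_wrap">
    <a class="name " href="#">Wilson Pro Staff RF 97 V13.0 Racket</a>
    <div class="pricing">
      <span class="price"><span class="convert_price">264,89 €</span></span>
      <span class="msrp">SRP <span class="convert_price">300,00 €</span></span>
    </div>
  </div>
</div>
"""

page_soup = BeautifulSoup(html, "html.parser")

# findAll returns a list of every <span class="price"> element on the page;
# the SRP span has class "msrp", so it is not included
prices = page_soup.findAll("span", {"class": "price"})

print(prices[0])       # the whole tag, including the nested convert_price span
print(prices[0].text)  # only the text inside the tag
```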
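You will then have a list containing all the prices on the page, and you can access the individual prices with
prices[0] ... prices[len(prices)-1]
If you want to strip the HTML tags from a price, use prices[0].text.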
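To get each racket's name together with its price, one option is to search inside each product container instead of across the whole page. The sketch below assumes the same markup as the question's snippet, and the `extract_products` helper is a hypothetical name. Using `find`/`select_one`, which return `None` when nothing matches, sidesteps the `IndexError` ("list index out of range") you get from indexing an empty result list with `[0]`.

```python
from bs4 import BeautifulSoup


def extract_products(html):
    """Return (name, price_text) pairs; either value is None if it is missing."""
    page_soup = BeautifulSoup(html, "html.parser")
    products = []
    for container in page_soup.findAll("div", {"class": "product_wrapper cf rac"}):
        # Name: alt text of the thumbnail image, as in the question's code
        img = container.find("img")
        name = img["alt"] if img is not None else None
        # Price: the current price, not the SRP inside <span class="msrp">
        price_span = container.select_one("span.price span.convert_price")
        price = price_span.get_text(strip=True) if price_span is not None else None
        products.append((name, price))
    return products


sample = """
<div class="product_wrapper cf rac">
  <div class="image_wrap"><img alt="Wilson Pro Staff RF 97 V13.0 Racket" /></div>
  <div class="text_wrap">
    <div class="pricing">
      <span class="price"><span class="convert_price">264,89 €</span></span>
      <span class="msrp">SRP <span class="convert_price">300,00 €</span></span>
    </div>
  </div>
</div>
"""

for name, price in extract_products(sample):
    print(name, "->", price)
```

On a page where the prices are not in the HTML, the pairs come back with `None` as the price instead of crashing, which also makes that situation easy to spot.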
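But where exactly does this HTML come from? The prices are not on the page linked in your code, so you shouldn't find any prices in that soup.
The code above works for the HTML you posted.
Edit: screenshot for the comments below.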
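Solution:
One way to solve this is to use Selenium WebDriver together with BeautifulSoup. I couldn't find any other (easier) way.
First, install Selenium with
pip install selenium
then download the driver for your browser here.
What we do is click the "settings options" button that pops up when you open the site, and then grab the prices, which are now loaded on the page.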
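A minimal sketch of that approach, assuming Selenium 4 with Chrome installed (Selenium 4.6 and later can locate the browser driver on its own). `SETTINGS_BUTTON_CSS` is a hypothetical placeholder: inspect the pop-up in your browser and substitute the button's real selector. The rendered HTML is then handed to BeautifulSoup as before.

```python
from bs4 import BeautifulSoup

PAGE_URL = "https://www.tenniswarehouse-europe.com/catpage-WILSONRACS-EN.html"
# Hypothetical selector: replace it with the real one for the pop-up's button
SETTINGS_BUTTON_CSS = "button.settings-submit"


def fetch_rendered_html(url, timeout=10):
    """Open the page in Chrome, confirm the settings pop-up, return the rendered HTML."""
    # Imported here so extract_prices below can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # Click the button in the pop-up that appears when the site opens
        wait.until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, SETTINGS_BUTTON_CSS))
        ).click()
        # Wait until at least one price has been rendered
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "span.price")))
        return driver.page_source
    finally:
        driver.quit()


def extract_prices(html):
    """Return the text of every current price (<span class="price">) in the HTML."""
    page_soup = BeautifulSoup(html, "html.parser")
    return [span.get_text(strip=True) for span in page_soup.select("span.price")]
```

Usage: `extract_prices(fetch_rendered_html(PAGE_URL))`. Keeping the browser step separate from the parsing step means the parsing can be tested on saved HTML without opening a browser.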
Does this help: