Extracting information from a web page

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("http://www.kart123.com/mobiles/pr?p%5B%5D=sort%3Dfeatured&sid=tyy%2C4io&ref=68c7d088-ae7f-4310-aa4c-a7ee176d168d")

# The class attribute in the page has leading and trailing spaces, so match with contains() instead of an exact @class comparison.
elem = driver.find_element(By.XPATH, "//div[contains(@class, 'browse-product')]")
# A leading "." keeps each search inside the element found in the previous step.
elem1 = elem.find_element(By.XPATH, ".//div[@class='pu-details lastUnit']")
elem2 = elem1.find_element(By.XPATH, ".//div[@class='pu-title fk-font-13']")
print(elem2.find_element(By.XPATH, ".//a[@class='fk-display-block']").text)
driver.close()

<div class=' product-unit unit-4 browse-product ' data-pid="MOBDVHC6XKKPZ3GZ" data-tracking-products=";MOBDVHC6XKKPZ3GZ;1;6999;;eVar22=Mobile" data-size="store-grid-new-4">
  <div class='pu-visual-section'>
    <a data-tracking-id="prd_img" class='pu-image fk-product-thumb ' href="/moto-e/p/itmdvuwsybgnbtha?pid=MOBDVHC6XKKPZ3GZ&srno=b_1&ref=83c37824-b74d-4121-8be0-27731ddccde2">
      <img alt="Moto E: Mobile" data-error-url="http://img1a.flixcart.com/mob/thumb/mobile.jpg" onload="img_onload(this);" onerror="img_onerror(this);" src="http://img5a.flixcart.com/image/mobile/3/g/z/motorola-xt1022-125x125-imadvvfknshcywk5.jpeg"></img>
    </a>
  </div>
  <div class="pu-details lastUnit">
    <div class="pu-title fk-font-13">
      <a class="fk-display-block" data-tracking-id="prd_title" href="/moto-e/p/itmdvuwsybgnbtha?pid=MOBDVHC6XKKPZ3GZ&srno=b_1&ref=83c37824-b74d-4121-8be0-27731ddccde2" title="Moto E (Black)">
        Moto E (Black)
      </a>
    </div>
    <div class='pu-variants fk-font-11'>
      and <a href="/moto-e/p/itmdvuwsybgnbtha?pid=MOBDVHC6XKKPZ3GZ&srno=b_1&ref=83c37824-b74d-4121-8be0-27731ddccde2">1 more variant</a>
    </div>
    <div class="pu-extra fk-font-11">
    </div>
    <div class="pu-rating" data-ratingfor="ITMDVUWSYBGNBTHA#MOBDVHC6XKKPZ3GZ#moto-e">
      <div class='fk-stars-small' title ='4.7 stars'>
        <div class='rating' style='width:94%;'>
        </div>
      </div>
      (852 ratings)<span class="ugc-summary-icon"></span>
    </div>
    <div class="pu-price">
      <div class="pu-border-top">
        <div class="pu-final">
          <span class="fk-font-17 fk-bold">Rs. 6999</span>
        </div>
        <div class="pu-emi fk-font-12">EMI from Rs. 626</div>
        <div class="pu-personal">
        </div>
        <ul class="pu-offers">
        </ul>
      </div>
    </div>
    <div class="pu-border-top">
      <ul class="pu-usp">
        <li><span class="text">Android v4.4 OS</span></li>
        <li><span class="text">4.3-inch Touchscreen</span></li>
        <li><span class="text">1 GB RAM</span></li>
        <li><span class="text">Dual SIM (GSM + GSM)</span></li>
      </ul>
    </div>
    <div class="pu-compare pu-border-top">
      <input type="checkbox" class="compare-checkbox" data-uniqid="83c37824-b74d-4121-8be0-27731ddccde2" id="MOBDVHC6XKKPZ3GZ" display_vertical='Mobiles' vertical="mobile" vertical_url_map='/mobiles'><label for="MOBDVHC6XKKPZ3GZ" class="compare-label">Add to compare</label>
    </div>
  </div>
</div>
</div>
<div class="gd-col gu3">

1 answer

User

#1 · Posted 2024-10-02 20:40:17

There are tools that can help with what you are trying to do.

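Scrapy (http://doc.scrapy.org) is a good tool for writing web crawlers and keeping scraped data up to date. You can use XPath notation to get at the data (for example, //div[@class='pu-final']/span/text() will give you "Rs. 6999").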

If you don't need all of Scrapy's features or its performance (for a one-off import script, say), there is also the very simple BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/bs4/doc/).
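For comparison, a one-off script with BeautifulSoup might look something like the sketch below. It parses a trimmed copy of the markup from the question with the built-in html.parser backend; the helper name extract_product and the choice of CSS selectors are assumptions for illustration, and it assumes the bs4 package is installed.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the title and price markup from the question.
HTML = """
<div class="pu-details lastUnit">
  <div class="pu-title fk-font-13">
    <a class="fk-display-block" title="Moto E (Black)"> Moto E (Black) </a>
  </div>
  <div class="pu-price">
    <div class="pu-border-top">
      <div class="pu-final">
        <span class="fk-font-17 fk-bold">Rs. 6999</span>
      </div>
    </div>
  </div>
</div>
"""


def extract_product(html):
    """Return the product title and price as a dict (None for missing fields).

    extract_product is a hypothetical helper name, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    # CSS selectors match on individual class names, so extra classes or
    # stray whitespace in the class attribute do not matter.
    title_link = soup.select_one("div.pu-title a.fk-display-block")
    price_span = soup.select_one("div.pu-final span")
    return {
        "title": title_link.get_text(strip=True) if title_link else None,
        "price": price_span.get_text(strip=True) if price_span else None,
    }


if __name__ == "__main__":
    print(extract_product(HTML))
```

Note that BeautifulSoup only parses HTML you already have; if the page builds its content with JavaScript, you would still need something like Selenium to obtain the rendered source first.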

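These are just two of the many tools you could use, but both are well documented. I'm sure plenty of people here will recommend other great tools; pick one based on your needs.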

Good luck.
