Answers to the question: Scraping HTML for img src in Python bs4

Scraping HTML for img src in Python bs4


<p>I am trying to parse the following HTML with BeautifulSoup (bs4) in Python:</p> <pre><code> <div class="product w-100" data-pid="BBOMNLV1-36183" data-sid="BBOMNLWB"> <div class="product-tile w-100">  <div class="image-container"> <a href="/pd/omn1s-low/BBOMNLV1-36183.html?dwvar_BBOMNLV1-36183_style=BBOMNLWB"> <picture> <source type="image/jpeg" data-srcset="https://nb.scene7.com/is/image/NB/bbomnlwb_nb_02_i?$pdpflexf2$&amp;wid=440&amp;hei=440 1x, https://nb.scene7.com/is/image/NB/bbomnlwb_nb_02_i?$pdpflexf2$&amp;wid=880&amp;hei=880 2x" srcset="https://nb.scene7.com/is/image/NB/bbomnlwb_nb_02_i?$pdpflexf2$&amp;wid=440&amp;hei=440 1x, https://nb.scene7.com/is/image/NB/bbomnlwb_nb_02_i?$pdpflexf2$&amp;wid=880&amp;hei=880 2x"> <img class="tile-image ls-is-cached lazyloaded" src="https://nb.scene7.com/is/image/NB/bbomnlwb_nb_02_i?$pdpflexf2$&amp;wid=440&amp;hei=440" data-src="https://nb.scene7.com/is/image/NB/bbomnlwb_nb_02_i?$pdpflexf2$&amp;wid=440&amp;hei=440" data-srcset="https://nb.scene7.com/is/image/NB/bbomnlwb_nb_02_i?$pdpflexf2SM$&amp;wid=440&amp;hei=440 1x, https://nb.scene7.com/is/image/NB/bbomnlwb_nb_02_i?$pdpflexf2SM$&amp;wid=880&amp;hei=880 2x" alt="OMN1S Low" title="OMN1S Low, BBOMNLWB" itemprop="image" srcset="https://nb.scene7.com/is/image/NB/bbomnlwb_nb_02_i?$pdpflexf2SM$&amp;wid=440&amp;hei=440 1x, https://nb.scene7.com/is/image/NB/bbomnlwb_nb_02_i?$pdpflexf2SM$&amp;wid=880&amp;hei=880 2x"> </picture> </a> <div class="product-id d-none">BBOMNLV1-36183</div> <div class="wishlist-url d-none">/on/demandware.store/Sites-NBUS-Site/en_US/Wishlist-WishlistItemExists</div> <span class="wishListToggle"> <a class="wishlistTile add-to-wish-list" href="/on/demandware.store/Sites-NBUS-Site/en_US/Wishlist-AddProduct" title="Wish list"> <span class="wishlist-inactive active"> <svg role="img" class="icon svg-icon " width="24" height="24" aria-label="title"> <title> </title> <desc> </desc> <use xlink:href="#wishlist-inactive"></use> </svg></span> </a> <a class="wishlistTile remove-from-wishlist" 
href="/on/demandware.store/Sites-NBUS-Site/en_US/Wishlist-RemoveProduct" title="Wish list"> <span class="wishlist-active "> <svg role="img" class="icon svg-icon " width="24" height="24" aria-label="title"> <title> </title> <desc> </desc> <use xlink:href="#wishlist-active"></use> </svg></span> </a> </span> </div> <div class="tile-body"> <div class="row pgp-grid pb-2 pr-2"> <div class="col-12 col-lg-7 pl-2 fw-search"> <div class="pdp-link"> <a class="link font-weight-bold pname text-underline no-underline-lg" href="/pd/omn1s-low/BBOMNLV1-36183.html?dwvar_BBOMNLV1-36183_style=BBOMNLWB">OMN1S Low</a> <span class="category-name font-body w-100 d-block pt-2"> Men's Basketball </span> </div> </div> <div class="col-12 col-lg-5 pl-2 fw-search justify-content-lg-end text-right d-flex p-0 search-tile"> <div class="price"> <span class="price-value"> <span class="sales font-body-large "> $139.99 </span> </span> </div> </div> </div> <div class="pgp-reviews-wrapper" data-pageid="BBOMNLV1-36183" data-url="https://www.newbalance.com/on/demandware.store/Sites-NBUS-Site/en_US/ProductReviews-WriteReview?pid=BBOMNLV1-36183" id="BBOMNLV1-36183-pgp-reviews-wrapper-3"> <div class="p-w-r"> <section id="pr-category-snippets-BBOMNLV1-36183" class="pr-no-reviews" aria-labelledby="pr-UbCtutN-xQJECAE6zEJSy" data-testid="category-snippet"> <div class="pr-snippet pr-category-snippet"> <div class="pr-category-snippet__rating pr-category-snippet__item"> <div class="pr-snippet-stars pr-snippet-stars-png "> <div aria-hidden="true" class="pr-rating-stars"> <div class="pr-star-v4 pr-star-v4-0-filled"></div> <div class="pr-star-v4 pr-star-v4-0-filled"></div> <div class="pr-star-v4 pr-star-v4-0-filled"></div> <div class="pr-star-v4 pr-star-v4-0-filled"></div> <div class="pr-star-v4 pr-star-v4-0-filled"></div> </div> <div aria-hidden="true" class="pr-snippet-rating-decimal">0.0</div> </div><span id="pr-UbCtutN-xQJECAE6zEJSy" class="pr-accessible-text">Rated 0 out of 5 stars</span></div> <div 
class="pr-category-snippet__total pr-category-snippet__item">No Reviews</div> </div> </section> </div> </div> </div> <div class="badges"> <span class="sub-badges p-1 text-uppercase font-weight-bold">NEW</span> </div>  </div> </div> </code></pre> <p>I am trying to retrieve the picture of the shoe by finding the img tag with the class "tile-image ls-is-cached lazyloaded" and then reading its data-src attribute to get the link to the photo.</p> <p>Here is my bs4 code, which does not seem to work:</p> <pre><code>from bs4 import BeautifulSoup

def queryNewBalance(uri):
    r = requests.get('https://www.newbalance.com/men/shoes/basketball/?prefn1=color&prefv1=Black%7CBlue&srule=null')
    soup = BeautifulSoup(r.content, 'html.parser')
    result = soup.find_all('div', class_='product w-100')
    for res in result:
        print("*******************************")
        print(res.find('img', class_='tile-image ls-is-cached lazyloaded')['href'])  #Picture
        print("*******************************")
    print(f"\nFound total shoes: {len(result)}")
</code></pre> <p>How can I fix the code so that it retrieves the image link?</p>


1 Answer

Anonymous, 1 day ago

<p>The attribute you are fetching is <code>href</code>, but the <code>&lt;img&gt;</code> tag you are trying to scrape has no <code>href</code> attribute. It has <code>src</code> and <code>data-src</code>, which is where the link is. By the way, pass the long HTML you provided as the <code>html</code> argument:</p> <pre><code>from bs4 import BeautifulSoup

def queryNewBalance(html):
    #r = requests.get('https://www.newbalance.com/men/shoes/basketball/?prefn1=color&prefv1=Black%7CBlue&srule=null')
    soup = BeautifulSoup(html, 'html.parser')
    result = soup.find_all('div', class_='product w-100')
    for res in result:
        print("*******************************")
        # The link is in the src attribute, not href
        print(res.find('img', class_='tile-image ls-is-cached lazyloaded')['src'])  #Picture
        print("*******************************")
    print(f"\nFound total shoes: {len(result)}")

# html is a string holding the markup from the question
queryNewBalance(html)
</code></pre> <p>Output:</p> <pre><code>*******************************
https://nb.scene7.com/is/image/NB/bbomnlwb_nb_02_i?$pdpflexf2$&wid=440&hei=440
*******************************

Found total shoes: 1
[Finished in 0.7s]
</code></pre> <p>-<strong>URL</strong>-</p> <p>When fetching the live page, note that the HTML returned by <code>requests</code> has not been processed by the page's JavaScript, which is presumably what adds the <code>ls-is-cached</code> and <code>lazyloaded</code> classes. So this version matches only the <code>tile-image</code> class and reads <code>data-src</code>:</p> <pre><code>from bs4 import BeautifulSoup
import requests

def queryNewBalance():
    r = requests.get('https://www.newbalance.com/men/shoes/basketball/?prefn1=color&prefv1=Black%7CBlue&srule=null')
    soup = BeautifulSoup(r.content, 'html.parser')
    result = soup.find_all('div', class_='product w-100')
    for res in result:
        print("*******************************")
        print(res.find('img', class_='tile-image')["data-src"])  #Picture
        print("*******************************")
    print(f"\nFound total shoes: {len(result)}")

queryNewBalance()
</code></pre> <p>Output:</p> <pre><code>*******************************
https://nb.scene7.com/is/image/NB/bbomnxbb_nb_02_i?$pdpflexf2$&wid=440&hei=440
*******************************
*******************************
https://nb.scene7.com/is/image/NB/bbomnlpl_nb_02_i?$pdpflexf2$&wid=440&hei=440
*******************************
*******************************
https://nb.scene7.com/is/image/NB/bbomnlwb_nb_02_i?$pdpflexf2$&wid=440&hei=440
*******************************
*******************************
https://nb.scene7.com/is/image/NB/bbomnlbr_nb_02_i_5a34b3da900d437a9a88?$pdpflexf2$&wid=440&hei=440
*******************************
*******************************
https://nb.scene7.com/is/image/NB/bbomnlfc_nb_02_i?$pdpflexf2$&wid=440&hei=440
*******************************
*******************************
https://nb.scene7.com/is/image/NB/bbomnlwt_nb_02_i?$pdpflexf2$&wid=440&hei=440
*******************************

Found total shoes: 6
[Finished in 2.9s]
</code></pre> <p>P.S. If you get more involved in web scraping and scrape a lot of websites, especially large ones, I suggest switching the parser to <code>html5lib</code> (<code>pip install html5lib</code>). I find it a more reliable parser: I have had problems with <code>html.parser</code>, where it somehow missed parts of a page even though I inspected the soup object. Either way, it's your call. Good luck.</p>
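<p>The lookups above raise an error if a tile has no matching <code>img</code> (<code>find</code> returns <code>None</code>) or if the attribute is missing (<code>KeyError</code>). A minimal, more defensive sketch of the same idea follows. The function name <code>extract_image_urls</code> and the shortened <code>SAMPLE_HTML</code> are made up for illustration and are not part of the original answer; it prefers <code>data-src</code> and falls back to <code>src</code>.</p>

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the product tile in the question (illustrative only).
SAMPLE_HTML = """
<div class="product w-100" data-pid="BBOMNLV1-36183">
  <img class="tile-image ls-is-cached lazyloaded"
       src="https://nb.scene7.com/is/image/NB/bbomnlwb_nb_02_i?$pdpflexf2$&amp;wid=440&amp;hei=440"
       data-src="https://nb.scene7.com/is/image/NB/bbomnlwb_nb_02_i?$pdpflexf2$&amp;wid=440&amp;hei=440">
</div>
<div class="product w-100" data-pid="NO-IMAGE"></div>
"""


def extract_image_urls(html):
    """Return one image URL per product tile that has a usable img tag."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # class_ with a single name matches any element carrying that class,
    # regardless of the other classes present or their order.
    for tile in soup.find_all("div", class_="product"):
        img = tile.find("img", class_="tile-image")
        if img is None:
            continue  # tile without an image: skip instead of crashing
        # .get() returns None for a missing attribute instead of raising
        url = img.get("data-src") or img.get("src")
        if url:
            urls.append(url)
    return urls


if __name__ == "__main__":
    for url in extract_image_urls(SAMPLE_HTML):
        print(url)
```

<p>Matching on a single class such as <code>tile-image</code> is a design choice: it keeps working whether or not the lazy-loading classes are present in the HTML you receive.</p>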
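<p>On the parser suggestion in the P.S.: if you are not sure <code>html5lib</code> is installed everywhere your script runs, one option is to try it first and fall back to the built-in <code>html.parser</code>. This is only a sketch; the helper name <code>make_soup</code> is hypothetical. It relies on BeautifulSoup raising <code>FeatureNotFound</code> when the requested parser is unavailable.</p>

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html, preferred="html5lib"):
    """Parse html with the preferred parser, falling back to html.parser."""
    try:
        return BeautifulSoup(html, preferred)
    except FeatureNotFound:
        # Preferred parser is not installed; use the standard-library one.
        return BeautifulSoup(html, "html.parser")


if __name__ == "__main__":
    soup = make_soup("<p>hi<img class='tile-image' src='a.png'>")
    print(soup.find("img", class_="tile-image")["src"])
```

<p>Keep in mind that different parsers can build slightly different trees from invalid markup, so it is worth checking your selectors against whichever parser actually ends up being used.</p>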
