Can't find an XML tag using Beautiful Soup and Python

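import re
import time

from bs4 import BeautifulSoup

# Method of the asker's scraper class; the name search_product is hypothetical
def search_product(self, keyword1, keyword2, monitorDelay):
    print("Searching for product...")
    # Match the two keywords in either order, with anything in between
    regexp = "%s.*%s|%s.*%s" % (keyword1, keyword2, keyword2, keyword1)
    while True:
        html = self.driver.page_source
        soup = BeautifulSoup(html, 'xml')
        # find() returns None when nothing matches; it does not raise AttributeError
        keywordLink = soup.find('image:title', string=re.compile(regexp))
        if keywordLink is not None:
            print(keywordLink)
            return keywordLink
        print("Product not found on site, retrying...")
        time.sleep(monitorDelay)
        self.driver.refresh()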
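The regexp in the question is meant to match keyword1 and keyword2 in either order. Below is a standalone sketch of such a pattern, using .* in both alternatives and escaping each keyword with re.escape so characters like '.' or '-' are matched literally. The helper name build_keyword_pattern and the sample keywords are hypothetical, chosen from the title in the answer.

```python
import re


def build_keyword_pattern(keyword1, keyword2):
    """Compile a pattern matching keyword1 and keyword2 in either order."""
    # re.escape keeps regex metacharacters in a keyword literal
    k1, k2 = re.escape(keyword1), re.escape(keyword2)
    return re.compile("%s.*%s|%s.*%s" % (k1, k2, k2, k1))


pattern = build_keyword_pattern("YUNG-1", "CLOUD WHITE")
print(bool(pattern.search('ADIDAS YUNG-1 "CLOUD WHITE"')))  # True
print(bool(pattern.search('ADIDAS "CLOUD WHITE" YUNG-1')))  # True
print(bool(pattern.search('ADIDAS YUNG-1 "CORE BLACK"')))   # False
```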

1 answer

Anonymous user

Answer #1 · posted 2024-09-27 22:42:35

This finds the text inside <image:title>:

soup.find_all('image')[0].find_all('title')[0].text

Or you can simply write

soup.image.title.text

Either way, the output is:

'ADIDAS YUNG-1 "CLOUD WHITE"'

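Rather than a regular expression, you should use BeautifulSoup's built-in navigation methods (see the documentation). The benefit of parsing HTML or XML with BeautifulSoup is that you can navigate the document's tree structure instead of matching raw text.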
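Along the same lines, once BeautifulSoup has located the title and .text has extracted its string, the "both keywords in any order" check from the question does not need a regular expression either. A minimal sketch, where the helper title_matches and the second title are hypothetical; in the scraper the titles would come from the parsed feed, for example something like [t.text for t in soup.find_all('title')].

```python
def title_matches(title, keywords):
    """Return True if every keyword occurs somewhere in title, in any order.

    Uses a case-insensitive substring test instead of a regular expression.
    """
    lowered = title.casefold()
    return all(keyword.casefold() in lowered for keyword in keywords)


# Hypothetical titles; in the scraper they would come from the parsed feed
titles = [
    'ADIDAS YUNG-1 "CLOUD WHITE"',
    "ADIDAS PREDATOR ACCELERATOR TRAINER",
]
print([t for t in titles if title_matches(t, ["cloud white", "yung-1"])])
# ['ADIDAS YUNG-1 "CLOUD WHITE"']
```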

Edit

Here is the complete working code:

from bs4 import BeautifulSoup

html = """
<url>
<loc>
   https://packershoes.com/products/copy-of-adidas-predator-accelerator-trainer
</loc>
<lastmod>2018-11-24T08:22:42-05:00</lastmod>
<changefreq>daily</changefreq>
<image:image>
    <image:loc>
    https://cdn.shopify.com/s/files/1/0208/5268/products/adidas_Yung-1_B37616_side.jpg?v=1537395620
    </image:loc>
    <image:title>ADIDAS YUNG-1 "CLOUD WHITE"</image:title>
</image:image>
</url>
"""

soup = BeautifulSoup(html, 'xml')
soup.image.title.text

Output:

'ADIDAS YUNG-1 "CLOUD WHITE"'
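The lookup in the answer relies on how BeautifulSoup's XML parser treats the image: prefix. As a cross-check that is not part of the original answer, the Python standard library's xml.etree.ElementTree can address <image:title> explicitly through a prefix-to-namespace mapping. This is a minimal sketch under stated assumptions: the real feed declares the Google image-sitemap namespace on its root element (sitemaps normally do), the sample below is a trimmed, namespaced version of the snippet in the answer, and image_titles is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed declares its namespaces on <urlset>, as sitemaps
# normally do. Without the xmlns:image declaration, ElementTree raises
# ParseError ("unbound prefix") on tags such as <image:title>.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://packershoes.com/products/copy-of-adidas-predator-accelerator-trainer</loc>
    <image:image>
      <image:title>ADIDAS YUNG-1 "CLOUD WHITE"</image:title>
    </image:image>
  </url>
</urlset>"""

# Prefix used in the search path -> namespace URI it stands for
NAMESPACES = {"image": "http://www.google.com/schemas/sitemap-image/1.1"}


def image_titles(xml_text):
    """Return the text of every <image:title> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iterfind(".//image:title", NAMESPACES)]


print(image_titles(SITEMAP_XML))  # ['ADIDAS YUNG-1 "CLOUD WHITE"']
```

The prefix in the search path only has to match a key in NAMESPACES; ElementTree compares elements by namespace URI, so the lookup still works if the document happens to bind that URI to a different prefix.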
