Finding a specific word in an ID or class name with BeautifulSoup

Posted 2024-09-24 00:25:35


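I am using BeautifulSoup to extract information from product pages on e-commerce websites. The product pages I want to identify are those where:

"The CLASS or ID attribute contains the word 'thumb'", for example: ^{cl1}$

At the moment my program only looks for .html in the URL, but that only works for one e-commerce site. Instead, I want it to search the entire HTML and find ID and CLASS attributes that contain the word "thumb".

My current code is: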

        # Fragment: assumes `import urllib2` and `from bs4 import BeautifulSoup` (Python 2),
        # and that childurl is defined earlier in the crawler.
        if ".html" in childurl:  # store details in the product_details table if it's a product page
            print("Product Found.!")
            print(childurl)
            soup = BeautifulSoup(urllib2.urlopen(childurl).read(), "html.parser")
            priceele = soup.find(itemprop='price').string.strip()
            brandname = soup.find(itemprop='brand').string.strip()
            nameele = soup.find(itemprop='name').string.strip()
            image = soup.find(itemprop='image').get('src')


1 answer

#1 · Posted 2024-09-24 00:25:35

Try using regular expression patterns:

import re

import bs4

html = """<html><body><div class="foo_thumb"></div><p class="wrong"></p><a id="barthumb"></a></body></html>"""
soup = bs4.BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid a warning

# BeautifulSoup applies compiled patterns with search(), so 'thumb' matches anywhere in the value
predicates = [
    {'id': re.compile('thumb')},
    {'class': re.compile('thumb')},
]
for p in predicates:
    print(soup.find_all(**p))
# prints [<a id="barthumb"></a>], then [<div class="foo_thumb"></div>]
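
To decide whether a page is a product page, as the question asks, the two lookups can be folded into a single pass by handing `find_all` a function instead of attribute filters. The following is only a minimal sketch of that idea: the helper names `has_thumb_attr` and `looks_like_product_page` are hypothetical, the match is made case-insensitive as an assumption, and the sample HTML is the same one used above.

```python
import re

from bs4 import BeautifulSoup

# Assumption: match "thumb" case-insensitively anywhere in the attribute value.
THUMB = re.compile(r"thumb", re.IGNORECASE)


def has_thumb_attr(tag):
    """Return True if the tag's id or any of its classes contains 'thumb'."""
    tag_id = tag.get("id") or ""
    classes = tag.get("class") or []
    if isinstance(classes, str):  # class is normally a list, but be defensive
        classes = classes.split()
    return bool(THUMB.search(tag_id)) or any(THUMB.search(c) for c in classes)


def looks_like_product_page(html):
    """Hypothetical check: a page counts as a product page if any tag matches."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find(has_thumb_attr) is not None


html = """<html><body><div class="foo_thumb"></div><p class="wrong"></p><a id="barthumb"></a></body></html>"""
soup = BeautifulSoup(html, "html.parser")

# One traversal returns the matching tags in document order.
matches = soup.find_all(has_thumb_attr)
print(matches)
print(looks_like_product_page(html))
```

In the crawler from the question, a check like `looks_like_product_page(page_html)` could stand in for the `".html" in childurl` test, where `page_html` is whatever string the page download returns.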
