Scraping <span> text with BeautifulSoup

Posted 2024-09-30 10:39:53


I'm using BeautifulSoup to scrape data from a website, but I can't find a way to print the text inside the span elements. The structure looks like this:

<span class="greyText smallText">
                avg rating 4.02 —
                132,623 ratings  —
                published 2014
              </span>
<span class="greyText smallText">
                avg rating 4.03 —
                82,319 ratings  —
                published 2015
              </span>

I need to extract the average rating and the number of ratings as separate values.

import requests
from bs4 import BeautifulSoup as bs

url= "https://someurl"
page = requests.get(url) 
soup = bs(page.content, 'html.parser')
print(soup)
ratings = soup.find_all('span', attrs={'class': 'greyText smallText'})

2 answers
In [32]: [i.text.strip() for i in soup.find_all("span",class_="greyText smallText")]
Out[32]:
['avg rating 4.02 —\n                132,623 ratings  —\n                published 2014',
 'avg rating 4.03 —\n                82,319 ratings  —\n                published 2015']

The rating as a separate value:

In [48]: [i.text.strip().split("\n")[0] for i in soup.find_all("span",class_="greyText smallText")]
Out[48]: ['avg rating 4.02 —', 'avg rating 4.03 —']
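
The list comprehension above only returns the first line of each span. A minimal sketch of one way to get both the average rating and the ratings count as numbers, assuming every span keeps the same three fields separated by the same dash character as in the sample. The helper names `split_fields` and `parse_fields` are made up for illustration, and the input strings are copied from the output shown earlier rather than fetched from the page:

```python
# Strings as returned by i.text.strip() for the two sample spans.
texts = [
    'avg rating 4.02 —\n                132,623 ratings  —\n                published 2014',
    'avg rating 4.03 —\n                82,319 ratings  —\n                published 2015',
]


def split_fields(text):
    """Split one span's text on the dash separator (U+2014) and trim each piece."""
    return [part.strip() for part in text.split("\u2014")]


def parse_fields(text):
    """Return (avg_rating, ratings_count) for one span's text.

    Assumes exactly three dash-separated fields, as in the sample HTML.
    """
    avg_part, count_part, _published = split_fields(text)
    avg_rating = float(avg_part.split()[-1])                     # 'avg rating 4.02' -> 4.02
    ratings_count = int(count_part.split()[0].replace(",", ""))  # '132,623 ratings' -> 132623
    return avg_rating, ratings_count


results = [parse_fields(t) for t in texts]
print(results)  # [(4.02, 132623), (4.03, 82319)]
```

If a span on the real page has a different number of fields, the tuple unpacking raises a ValueError, so it may be worth wrapping the call in a try/except when running it against live data.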

Alternative solution: you can use the re module to extract the average rating:

import re
from bs4 import BeautifulSoup

txt = '''<span class="greyText smallText">
                avg rating 4.02 —
                132,623 ratings  —
                published 2014
              </span>
<span class="greyText smallText">
                avg rating 4.03 —
                82,319 ratings  —
                published 2015
              </span>'''

soup = BeautifulSoup(txt, 'html.parser')

for span in soup.select('span.greyText.smallText'):
    avg_rating = re.search(r'avg rating ([\d.]+)', span.text)
    if avg_rating:
        print(avg_rating[1])

Prints:

4.02
4.03
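
The question also asks for the number of ratings, which the pattern above does not capture. A possible extension, sketched under the assumption that the three fields always appear in the order shown in the sample and are separated by the same dash character: one regular expression with named groups pulls out all of them at once. `FIELDS` and `parse_span_text` are hypothetical names, and the function expects the string you would get from `span.text`:

```python
import re

# Hypothetical pattern: average rating, ratings count, and year, in that order.
# \u2014 is the dash that separates the fields in the sample HTML.
FIELDS = re.compile(
    r"avg rating (?P<avg>[\d.]+)\s*\u2014\s*"
    r"(?P<count>[\d,]+) ratings\s*\u2014\s*"
    r"published (?P<year>\d{4})"
)


def parse_span_text(text):
    """Return a dict of parsed fields, or None if the text does not match."""
    match = FIELDS.search(text)
    if match is None:
        return None
    return {
        "avg": float(match["avg"]),
        "count": int(match["count"].replace(",", "")),  # drop thousands separators
        "year": int(match["year"]),
    }


# Same shape as span.text for the first sample span.
sample = "\n    avg rating 4.02 \u2014\n    132,623 ratings  \u2014\n    published 2014\n  "
print(parse_span_text(sample))  # {'avg': 4.02, 'count': 132623, 'year': 2014}
```

Returning None for non-matching text mirrors the `if avg_rating:` check in the loop above, so spans with the same classes but different content can simply be skipped.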
