Trying to extract a value from an HTML page using BeautifulSoup

Posted 2024-10-02 00:28:28

I'm new to Python and Beautiful Soup, and I have a page that looks like this:

<div class='pid-details'><p>
  <span>Drug:</span> <a href='/search.php?searchterm=amantadine&amp;referer=pillid'>Amantadine Hydrochloride</a><br />
  <span>Strength:</span> 100 mg<br/>
  <span>Pill Imprint:</span> <a href='/imprints/c-122-6021.html'>C-122</a><br /><span>Color:</span> Yellow<br /><span>Shape:</span> Capsule-shape</p>
  <a class='input-button small' href='/imprints/c-122-6021.html'>View Images &amp; Details</a>
  <a class='input-button input-button-outline-grey small' href='/imprints/c-122-6021.html?printable=1' rel='nofollow' target='_blank'><i class='icon icon-print'></i>Print</a>
</div>

My goal is to extract the text from this tag:

<a href='/search.php?searchterm=amantadine&amp;referer=pillid'>Amantadine Hydrochloride</a>

So the result should be:

"Amantadine Hydrochloride"

Please point me in the right direction so I can get started with scraping. Thanks in advance.


1 answer
User
#1 · Posted 2024-10-02 00:28:28

I think this is what you're looking for. The code below builds a list (found) containing the inner text of the matched tags.

        import re
        from bs4 import BeautifulSoup

        # Sample HTML from the question
        page = """<div class='pid-details'><p>
          <span>Drug:</span> <a href='/search.php?searchterm=amantadine&amp;referer=pillid'>Amantadine Hydrochloride</a><br />
          <span>Strength:</span> 100 mg<br/>
          <span>Pill Imprint:</span> <a href='/imprints/c-122-6021.html'>C-122</a><br /><span>Color:</span> Yellow<br /><span>Shape:</span> Capsule-shape</p>
          <a class='input-button small' href='/imprints/c-122-6021.html'>View Images &amp; Details</a>
          <a class='input-button input-button-outline-grey small' href='/imprints/c-122-6021.html?printable=1' rel='nofollow' target='_blank'><i class='icon icon-print'></i>Print</a>
        </div>"""

        soup = BeautifulSoup(page, 'html.parser')

        found = []

        # Collect every <a> tag, then keep the inner text of those that the
        # pattern matches. Note: the pattern only matches anchors whose first
        # attribute is href, so the two "input-button" links are skipped.
        hrefs = soup.find_all('a')
        p = re.compile('<a href.*>(.*)</a>', re.IGNORECASE)
        for h in hrefs:
            m = p.search(str(h))
            if m:
                found.append(m.group(1))

        print(found)  # the drug name is the first element, found[0]
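
If only the drug name is needed, the regular expression can probably be dropped, since Beautiful Soup can navigate to the tag directly. The following is a minimal sketch, not a definitive solution: it assumes the label span contains exactly the text "Drug:" and that the link is a later sibling of that span, as in the HTML from the question. The helper name extract_drug_name is made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question
page = """<div class='pid-details'><p>
  <span>Drug:</span> <a href='/search.php?searchterm=amantadine&amp;referer=pillid'>Amantadine Hydrochloride</a><br />
  <span>Strength:</span> 100 mg<br/>
  <span>Pill Imprint:</span> <a href='/imprints/c-122-6021.html'>C-122</a><br />
</p></div>"""


def extract_drug_name(html):
    """Return the text of the link that follows the 'Drug:' label, or None.

    Hypothetical helper: assumes the label text is exactly 'Drug:'.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Locate the <span> whose only content is the string "Drug:"
    label = soup.find("span", string="Drug:")
    if label is None:
        return None
    # The next <a> sibling of the label holds the drug name
    link = label.find_next_sibling("a")
    if link is None:
        return None
    return link.get_text(strip=True)


drug_name = extract_drug_name(page)
print(drug_name)
```

Anchoring on the "Drug:" label rather than on the position of the first link should keep the lookup working if other links are added earlier in the block. If the label or the link is missing, the function returns None instead of raising.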
