Get the second element from an HTML tag with Python/BeautifulSoup

Posted 2024-09-27 23:22:38


I want to scrape elements from a page, for example https://www.aacr.org/?s=breast+cancer&search_type=global

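The heading's HTML tag contains a link together with the title text. When I run my code, it prints both the HTML (first position) and the title (second position, which is the part I want).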

For example, print returns: <a href="https://www.aacr.org/patients-caregivers/cancer/breast-cancer/" title="Breast Cancer">Breast Cancer

I only need the bold text (the second element). Can anyone help? Here is my code:

import requests
import time
from bs4 import BeautifulSoup
import pandas as pd

productlinks = []
sam=[]
for x in range(1,3):
    url=f'https://www.aacr.org/page/{x}/?s=breast+cancer&search_type=global'
    r=requests.get(url)
    soup=BeautifulSoup(r.content,'html.parser')
    productlist=soup.find_all('div',class_='blog-content')
    for item in productlist:
        title=soup.find_all('h3')
        print(title)

2 Answers

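You need one more iteration: loop over each tag to get the content you want. (I kept your code intact and added an extra loop so you can see how to do this in general, not just for this specific use case.)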

import requests
import time
from bs4 import BeautifulSoup
import pandas as pd

productlinks = []
sam = []
# Walk through result pages 1 and 2.
for x in range(1, 3):
    url = f'https://www.aacr.org/page/{x}/?s=breast+cancer&search_type=global'
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    productlist = soup.find_all('div', class_='blog-content')
    for item in productlist:
        # Note: this searches the whole page, so the same titles print once per item.
        title = soup.find_all('h3')
        for single in title:
            # Read the title attribute of the link inside each <h3>.
            print(single.a['title'])

Result:

Breast Cancer
Male Breast Cancer
Breast Cancer Prevention (PDQ®)
Breast Cancer Screening (PDQ®)
Breast Cancer Treatment During Pregnancy (PDQ®)
Breast Cancer Treatment (PDQ®)
Male Breast Cancer Treatment (PDQ®)
Carcinoma of Unknown Primary
Overcoming Triple-Negative Breast Cancer
Living with Metastatic Breast Cancer
Surviving Metastatic Breast Cancer; Advocating for Other Cancer Patients
Living With Stage 4 Breast Cancer
Choosing to Enjoy Life Despite Metastatic Breast Cancer
A Breast and Colon Cancer Survivor Supports Cancer Research
Pedaling for Cancer Research
Emily Garnett
Supporting Increased Funding for Clinical Trials
Raising Awareness of Male Breast Cancer
Keeping Breast Cancer at Bay with Immunotherapy
Recovering after Breast Cancer Treatment Thanks to Prehab and Rehab
Takae Brewer, MD
Thankful for Clinical Trials
Bianca Lundien Kennedy
Gina Favors
Running to Beat Leukemia (and All Cancers)
Patricia Fox
Survivor Profile: An Unlikely Pivot
Program
Advances in Breast Cancer Research
Program
Breast Cancer
Male Breast Cancer
Breast Cancer Prevention (PDQ®)
Breast Cancer Screening (PDQ®)
Breast Cancer Treatment During Pregnancy (PDQ®)
Breast Cancer Treatment (PDQ®)
Male Breast Cancer Treatment (PDQ®)
Carcinoma of Unknown Primary
Overcoming Triple-Negative Breast Cancer
Living with Metastatic Breast Cancer
Surviving Metastatic Breast Cancer; Advocating for Other Cancer Patients
Living With Stage 4 Breast Cancer
Choosing to Enjoy Life Despite Metastatic Breast Cancer
A Breast and Colon Cancer Survivor Supports Cancer Research
Pedaling for Cancer Research
Emily Garnett
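
The loop above calls `soup.find_all('h3')`, which searches the whole page on every pass, so the same titles are printed again for each item. Below is a minimal sketch of scoping the search to each `blog-content` block instead. It runs against a small inline HTML sample rather than the live site; the sample markup and the helper name `titles_per_block` are assumptions made for illustration, and the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the search-result layout; the real page may differ.
SAMPLE_HTML = """
<div class="blog-content">
  <h3><a href="/breast-cancer/" title="Breast Cancer">Breast Cancer</a></h3>
</div>
<div class="blog-content">
  <h3><a href="/male-breast-cancer/" title="Male Breast Cancer">Male Breast Cancer</a></h3>
</div>
"""


def titles_per_block(html):
    """Return the title attribute of each h3 link, one search per block."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for item in soup.find_all("div", class_="blog-content"):
        # Search inside this block only, not the whole document.
        for heading in item.find_all("h3"):
            link = heading.find("a")
            if link is not None and link.has_attr("title"):
                titles.append(link["title"])
    return titles


print(titles_per_block(SAMPLE_HTML))
```

With this sample, each title is collected once, because every `find_all('h3')` call is limited to the block currently being iterated.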

To get the title attribute, just change the last for loop to:

for item in productlist:
    a_tag = item.find('a')
    print(a_tag['title'])

Output:

Breast Cancer
Male Breast Cancer
Breast Cancer Prevention (PDQ®)
Breast Cancer Screening (PDQ®)
Breast Cancer Treatment During Pregnancy (PDQ®)
Breast Cancer Treatment (PDQ®)
Male Breast Cancer Treatment (PDQ®)
Carcinoma of Unknown Primary
Overcoming Triple-Negative Breast Cancer
Living with Metastatic Breast Cancer
Surviving Metastatic Breast Cancer; Advocating for Other Cancer Patients
Living With Stage 4 Breast Cancer
Choosing to Enjoy Life Despite Metastatic Breast Cancer
A Breast and Colon Cancer Survivor Supports Cancer Research
Pedaling for Cancer Research
Emily Garnett
Supporting Increased Funding for Clinical Trials
Raising Awareness of Male Breast Cancer
Keeping Breast Cancer at Bay with Immunotherapy
Recovering after Breast Cancer Treatment Thanks to Prehab and Rehab
Takae Brewer, MD
Thankful for Clinical Trials
Bianca Lundien Kennedy
Gina Favors
Running to Beat Leukemia (and All Cancers)
Patricia Fox
Survivor Profile: An Unlikely Pivot
Program
Advances in Breast Cancer Research
Program
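
Note that `a_tag['title']` raises an error if a block contains no link (`item.find('a')` returns `None`) or if the link has no `title` attribute. Here is a hedged sketch of a more defensive variant that falls back to the link text. The inline HTML and the function name `link_title` are made up for this example and are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one link with a title, one without, one block with no link.
SAMPLE_HTML = """
<div class="blog-content"><a href="/a/" title="Breast Cancer">Breast Cancer</a></div>
<div class="blog-content"><a href="/b/"> Program </a></div>
<div class="blog-content"><p>No link here</p></div>
"""


def link_title(item):
    """Return the first link's title attribute, else its text, else None."""
    a_tag = item.find("a")
    if a_tag is None:
        return None
    # Tag.get returns None instead of raising KeyError when the attribute is missing.
    return a_tag.get("title") or a_tag.get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
results = [link_title(item) for item in soup.find_all("div", class_="blog-content")]
print(results)
```

Whether falling back to the link text is the right choice depends on what you do with the titles afterwards; skipping such blocks entirely would work just as well.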
