Getting the second element from an HTML tag with Python/BeautifulSoup

import requests
import time
from bs4 import BeautifulSoup
import pandas as pd

productlinks = []
sam=[]
for x in range(1,3):
    url=f'https://www.aacr.org/page/{x}/?s=breast+cancer&search_type=global'
    r=requests.get(url)
    soup=BeautifulSoup(r.content,'html.parser')
    productlist=soup.find_all('div',class_='blog-content')
    for item in productlist:
        title=soup.find_all('h3')
        print(title)

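2 answers

User

Answer 1 · edited 2024-09-27 23:22:38

You need one more level of iteration: loop over each tag to pull out the content you want. (I kept your code intact and only added an extra loop, so you can see how to do this in general rather than just for this specific use case.)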

import requests
import time
from bs4 import BeautifulSoup
import pandas as pd

productlinks = []
sam = []
for x in range(1, 3):  # search result pages 1 and 2
    url = f'https://www.aacr.org/page/{x}/?s=breast+cancer&search_type=global'
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    productlist = soup.find_all('div', class_='blog-content')
    for item in productlist:
        # Note: this searches the whole page (soup), not just the current item
        title = soup.find_all('h3')
        # Extra loop: visit each <h3> and read the title attribute of its <a>
        for single in title:
            print(single.a['title'])

Result:

Breast Cancer
Male Breast Cancer
Breast Cancer Prevention (PDQ®)
Breast Cancer Screening (PDQ®)
Breast Cancer Treatment During Pregnancy (PDQ®)
Breast Cancer Treatment (PDQ®)
Male Breast Cancer Treatment (PDQ®)
Carcinoma of Unknown Primary
Overcoming Triple-Negative Breast Cancer
Living with Metastatic Breast Cancer
Surviving Metastatic Breast Cancer; Advocating for Other Cancer Patients
Living With Stage 4 Breast Cancer
Choosing to Enjoy Life Despite Metastatic Breast Cancer
A Breast and Colon Cancer Survivor Supports Cancer Research
Pedaling for Cancer Research
Emily Garnett
Supporting Increased Funding for Clinical Trials
Raising Awareness of Male Breast Cancer
Keeping Breast Cancer at Bay with Immunotherapy
Recovering after Breast Cancer Treatment Thanks to Prehab and Rehab
Takae Brewer, MD
Thankful for Clinical Trials
Bianca Lundien Kennedy
Gina Favors
Running to Beat Leukemia (and All Cancers)
Patricia Fox
Survivor Profile: An Unlikely Pivot
Program
Advances in Breast Cancer Research
Program
Breast Cancer
Male Breast Cancer
Breast Cancer Prevention (PDQ®)
Breast Cancer Screening (PDQ®)
Breast Cancer Treatment During Pregnancy (PDQ®)
Breast Cancer Treatment (PDQ®)
Male Breast Cancer Treatment (PDQ®)
Carcinoma of Unknown Primary
Overcoming Triple-Negative Breast Cancer
Living with Metastatic Breast Cancer
Surviving Metastatic Breast Cancer; Advocating for Other Cancer Patients
Living With Stage 4 Breast Cancer
Choosing to Enjoy Life Despite Metastatic Breast Cancer
A Breast and Colon Cancer Survivor Supports Cancer Research
Pedaling for Cancer Research
Emily Garnett
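
One detail in the loop above is worth illustrating: calling `soup.find_all('h3')` inside `for item in productlist` searches the whole page on every pass, so each heading is visited once per `blog-content` div. The following is a minimal sketch of scoping the search to the current item instead. The `SAMPLE_HTML` markup is a hypothetical stand-in for the real page (an assumption, not the site's actual HTML), used so the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the search results page (assumption).
SAMPLE_HTML = """
<div class="blog-content"><h3><a href="/a" title="Breast Cancer">Breast Cancer</a></h3></div>
<div class="blog-content"><h3><a href="/b" title="Male Breast Cancer">Male Breast Cancer</a></h3></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
productlist = soup.find_all("div", class_="blog-content")

# Searching from soup inside the loop visits every <h3> on the page per item.
page_wide = [h3.a["title"] for item in productlist for h3 in soup.find_all("h3")]

# Searching from item only visits the <h3> tags nested inside that div.
scoped = [h3.a["title"] for item in productlist for h3 in item.find_all("h3")]

print(page_wide)  # four entries: both titles, once per div
print(scoped)     # two entries: one title per div
```

With the sample markup, the page-wide version yields each title twice, while the scoped version yields each title once.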

User

Answer 2 · edited 2024-09-27 23:22:38

To get the title attribute, just change the last for loop to:

for item in productlist:
    # First <a> inside this blog-content div
    a_tag = item.find('a')
    print(a_tag['title'])

Output:

Breast Cancer
Male Breast Cancer
Breast Cancer Prevention (PDQ®)
Breast Cancer Screening (PDQ®)
Breast Cancer Treatment During Pregnancy (PDQ®)
Breast Cancer Treatment (PDQ®)
Male Breast Cancer Treatment (PDQ®)
Carcinoma of Unknown Primary
Overcoming Triple-Negative Breast Cancer
Living with Metastatic Breast Cancer
Surviving Metastatic Breast Cancer; Advocating for Other Cancer Patients
Living With Stage 4 Breast Cancer
Choosing to Enjoy Life Despite Metastatic Breast Cancer
A Breast and Colon Cancer Survivor Supports Cancer Research
Pedaling for Cancer Research
Emily Garnett
Supporting Increased Funding for Clinical Trials
Raising Awareness of Male Breast Cancer
Keeping Breast Cancer at Bay with Immunotherapy
Recovering after Breast Cancer Treatment Thanks to Prehab and Rehab
Takae Brewer, MD
Thankful for Clinical Trials
Bianca Lundien Kennedy
Gina Favors
Running to Beat Leukemia (and All Cancers)
Patricia Fox
Survivor Profile: An Unlikely Pivot
Program
Advances in Breast Cancer Research
Program
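
The loop in this answer assumes every `blog-content` div contains an `<a>` with a `title` attribute. If one does not, `item.find('a')` returns `None` or `a_tag['title']` raises `KeyError`. Below is a minimal defensive sketch under that assumption; the helper name `extract_titles` and the sample markup are hypothetical and only there to make the example runnable offline.

```python
from bs4 import BeautifulSoup


def extract_titles(html):
    """Return the title attribute of the first <a> in each blog-content div.

    Hypothetical helper: divs without a link, or links without a title
    attribute, are skipped instead of raising.
    """
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for item in soup.find_all("div", class_="blog-content"):
        a_tag = item.find("a")        # None when the div has no link
        if a_tag is None:
            continue
        title = a_tag.get("title")    # None when the attribute is missing
        if title:
            titles.append(title)
    return titles


# Hypothetical markup (assumption): one normal item, one without a link,
# and one whose link has no title attribute.
sample = """
<div class="blog-content"><h3><a href="/a" title="Breast Cancer">Breast Cancer</a></h3></div>
<div class="blog-content"><p>No link here</p></div>
<div class="blog-content"><h3><a href="/c">Untitled link</a></h3></div>
"""

print(extract_titles(sample))
```

Collecting the titles in a list rather than printing them also makes it straightforward to hand the results to pandas later, which the original script imports but never uses.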
