Writing a loop: given a list of URLs, get only the title text and meta description (BeautifulSoup/Python)

Posted 2024-09-30 10:32:46


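I am new to data work in the public health field. Thank you for your help.

Basically, our goal is a quick way to pull the title and meta description from each URL in a list. We are using Python. We don't need anything else from the web pages.

I have a list called "urlList". Here is what I have written so far (using Beautiful Soup):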

urlList = [
    "https://www.freeclinics.com/cit/ca-los_angeles?sa=X&ved=2ahUKEwjew7SbgMXoAhUJZc0KHYHUB-oQ9QF6BAgIEAI",
    "https://www.freeclinics.com/cit/ca-los_angeles",
    "https://www.freeclinics.com/co/ca-los_angeles",
    "http://cretscmhd.psych.ucla.edu/healthfair/HF%20Services/LinkingPeopletoServices_CLinics_List_bySPA.pdf",
]

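I was then able to extract the title and description from one of the URLs (see the code below). I would like to loop over the whole list. I am open to any form of data export: a data table, a .csv file, or a .txt file.

I know my current print output shows the title as a string and the description inside [] (a list). That is fine. My main concern in this post is looping over the whole URL list.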

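import requests
from bs4 import BeautifulSoup

# A single URL for now (despite the name, this is a string, not a list)
urlList = "https://www.freeclinics.com/cit/ca-los_angeles?sa=X&ved=2ahUKEwjew7SbgMXoAhUJZc0KHYHUB-oQ9QF6BAgIEAI"

response = requests.get(urlList)
soup = BeautifulSoup(response.text, "html.parser")  # name the parser explicitly
metas = soup.find_all('meta')

# Print the title, then a list with the content of every <meta name="description"> tag
print(soup.title.string, [meta.attrs['content'] for meta in metas if 'name' in meta.attrs and meta.attrs['name'] == 'description'])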

>> Output: Free and Income Based Clinics Los Angeles CA ['Search below and find all of the free and income based health clinics in Los Angeles CA. We have listed out all of the Free Clinics listings in Los Angeles, CA below']

Also, the URL list will have at most 10-20 links. The pages are all very similar in structure.


1 Answer
User
#1 · Posted 2024-09-30 10:32:46

You can define a function that takes urlList as an argument and returns a list of lists, where each sublist holds a title and its corresponding description.

Try this:

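import requests
from bs4 import BeautifulSoup

def extract_info(url_list):
    """Return a list of [title, description] pairs, one per URL."""
    info = []
    for url in url_list:
        with requests.get(url) as response:
            soup = BeautifulSoup(response.text, "lxml")  # requires the lxml package
            # Look each tag up once; either one may be missing from a page
            title_tag = soup.find('title')
            meta_tag = soup.find('meta', {"name": "description"})
            title = title_tag.text if title_tag is not None else None
            description = meta_tag["content"] if meta_tag is not None else None
            info.append([title, description])
    return info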

Output:

[['Free and Income Based Clinics Los Angeles CA',
  'Search below and find all of the free and income based health clinics in '
  'Los Angeles CA. We have listed out all of the Free Clinics listings in Los '
  'Angeles, CA below'],
 ...
]
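
The question mentions being open to a .csv export. As a minimal sketch, assuming the [title, description] pairs come back in the same order as the input URLs (as they do from extract_info above), the result could be written out with Python's standard csv module. The function name write_info_csv and the column names are hypothetical, chosen only for illustration.

```python
import csv


def write_info_csv(urls, info, path):
    """Write one CSV row per URL with columns: url, title, description.

    `info` is assumed to be a list of [title, description] pairs in the
    same order as `urls` (hypothetical helper, not part of any library).
    """
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "title", "description"])
        for url, (title, description) in zip(urls, info):
            # A missing title or description (None) becomes an empty cell
            writer.writerow([url, title or "", description or ""])
```

For example, write_info_csv(urlList, extract_info(urlList), "clinics.csv") would produce one row per URL; the file name here is only an example. The csv module quotes fields that contain commas, so descriptions with commas stay in a single column.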
