Writing a loop: take a list of URLs and extract only the title text and meta description with BeautifulSoup/Python

urlList = [
    "https://www.freeclinics.com/cit/ca-los_angeles?sa=X&ved=2ahUKEwjew7SbgMXoAhUJZc0KHYHUB-oQ9QF6BAgIEAI",
    "https://www.freeclinics.com/cit/ca-los_angeles",
    "https://www.freeclinics.com/co/ca-los_angeles",
    "http://cretscmhd.psych.ucla.edu/healthfair/HF%20Services/LinkingPeopletoServices_CLinics_List_bySPA.pdf",
]

import requests
from bs4 import BeautifulSoup

urlList = "https://www.freeclinics.com/cit/ca-los_angeles?sa=X&ved=2ahUKEwjew7SbgMXoAhUJZc0KHYHUB-oQ9QF6BAgIEAI"
response = requests.get(urlList)
soup = BeautifulSoup(response.text)
metas = soup.find_all('meta')
print(soup.title.string,
      [meta.attrs['content'] for meta in metas
       if 'name' in meta.attrs and meta.attrs['name'] == 'description'])

1 Answer

User

#1 · Posted 2024-09-30 10:32:46

You can define a function that takes urlList as its argument and returns a list of lists, where each sublist holds a page's title and its corresponding description.

Try this:

import requests
from bs4 import BeautifulSoup


def extract_info(url_list):
    """Return a [title, description] pair for each URL in url_list."""
    info = []
    for url in url_list:
        with requests.get(url) as response:
            soup = BeautifulSoup(response.text, "lxml")
        # Look up each tag once; either one may be missing from the page.
        title_tag = soup.find('title')
        meta_tag = soup.find('meta', {"name": "description"})
        title = title_tag.text if title_tag else None
        description = meta_tag.get("content") if meta_tag else None
        info.append([title, description])
    return info

Output:

[['Free and Income Based Clinics Los Angeles CA',
  'Search below and find all of the free and income based health clinics in '
  'Los Angeles CA. We have listed out all of the Free Clinics listings in Los '
  'Angeles, CA below'],
 ...
]
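
The URL list in the question also contains a PDF, and any request in the loop can time out or return an error status. Below is a minimal sketch of a more defensive variant, not a definitive implementation. The names `parse_title_and_description` and `extract_info_safe` are hypothetical, the built-in "html.parser" is swapped in for "lxml" so no extra install is needed, and the Content-Type check assumes the server reports that header correctly.

```python
import requests
from bs4 import BeautifulSoup


def parse_title_and_description(html):
    """Return [title, description] for one HTML document (None if missing)."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed
    title_tag = soup.find("title")
    meta_tag = soup.find("meta", attrs={"name": "description"})
    title = title_tag.get_text(strip=True) if title_tag else None
    description = meta_tag.get("content") if meta_tag else None
    return [title, description]


def extract_info_safe(url_list, timeout=10):
    """Like extract_info, but one failing URL does not abort the whole loop."""
    info = []
    for url in url_list:
        try:
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()  # treat 4xx/5xx responses as failures
        except requests.RequestException:
            info.append([None, None])  # keep positions aligned with url_list
            continue
        # Skip non-HTML responses such as PDFs
        # (assumption: the server sets the Content-Type header).
        if "html" not in response.headers.get("Content-Type", "").lower():
            info.append([None, None])
            continue
        info.append(parse_title_and_description(response.text))
    return info


if __name__ == "__main__":
    sample = (
        "<html><head><title> Example page </title>"
        '<meta name="description" content="A short summary."></head></html>'
    )
    print(parse_title_and_description(sample))
    # prints ['Example page', 'A short summary.']
```

Calling `extract_info_safe(urlList)` returns one `[title, description]` pair per URL, with `[None, None]` standing in for any URL that failed or did not return HTML, so the result can still be zipped with the input list.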
