Python: check whether an article has an author

import requests

s = "https://www.nytimes.com/2017/08/18/us/politics/steve-bannon-trump-white-house.html?hp&action=click&pgtype=Homepage&clickSource=story-heading&module=a-lede-package-region&region=top-news&WT.nav=top-news"

def checkForAuthor():
    r = requests.get(s)
    return "By" in r.text

print(checkForAuthor())

2 answers

Anonymous user

Answer 1 · edited 2024-10-03 21:27:28

To parse the HTML and find the data you need, you should use the BeautifulSoup library.

In the HTML at that URL, there is a meta tag that contains the authors:

<meta content="By MAGGIE HABERMAN, MICHAEL D. SHEAR and GLENN THRUSH" name="byl"/>

So to check whether the article has an author, find that tag by its name (byl):

import requests
from bs4 import BeautifulSoup

s = "https://www.nytimes.com/2017/08/18/us/politics/steve-bannon-trump-white-house.html?hp&action=click&pgtype=Homepage&clickSource=story-heading&module=a-lede-package-region&region=top-news&WT.nav=top-news"

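def checkForAuthor():
    # Download the page and parse it with Python's built-in HTML parser
    soup = BeautifulSoup(requests.get(s).content, 'html.parser')
    # The byline lives in <meta name="byl" content="By ...">
    meta = soup.find('meta', {'name': 'byl'})
    # find() returns None when no matching tag exists
    return meta is not None

print(checkForAuthor())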

You can also get the author names themselves via meta["content"].
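A minimal sketch of reading the byline text, assuming `bs4` is installed. It parses a static HTML string (the meta tag quoted above) instead of downloading the page, so it runs offline; the function name `get_byline` is hypothetical, and the live NYT markup may have changed since this answer was written.

```python
from typing import Optional

from bs4 import BeautifulSoup


def get_byline(html: str) -> Optional[str]:
    """Hypothetical helper: return the content of <meta name="byl">, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    meta = soup.find("meta", {"name": "byl"})
    if meta is None:
        return None
    # .get() returns None instead of raising KeyError if the content attribute is missing
    return meta.get("content")


# Static sample copied from the tag shown above, used instead of a live request
sample = '<html><head><meta content="By MAGGIE HABERMAN, MICHAEL D. SHEAR and GLENN THRUSH" name="byl"/></head></html>'
print(get_byline(sample))
```

On a real page you would pass `requests.get(s).content` (or `.text`) instead of `sample`.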

Anonymous user

Answer 2 · edited 2024-10-03 21:27:28

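A key part of scraping data from a web page is looking at the page's HTML source so that you target the right elements. At the link you provided, the following lines contain author information: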

<meta name="author" content="Maggie Haberman, Michael D. Shear and Glenn Thrush" />
<meta name="byl" content="By MAGGIE HABERMAN, MICHAEL D. SHEAR and GLENN THRUSH" />
<meta property="article:author" content="https://www.nytimes.com/by/maggie-haberman" />
<meta property="article:author" content="https://www.nytimes.com/by/michael-d-shear" />
<meta property="article:author" content="https://www.nytimes.com/by/glenn-thrush" />

There are others, but these should help. To parse these tags, you can use Beautiful Soup.
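A minimal sketch of pulling all three kinds of author tags with Beautiful Soup, assuming `bs4` is installed. The helper name `extract_author_info` and the dictionary keys are hypothetical; the sample HTML is the set of tags quoted above rather than a live download, and the page may no longer use exactly these tags. Because `article:author` appears once per author, it is collected with `find_all` instead of `find`.

```python
from bs4 import BeautifulSoup

# The meta tags quoted above, used as a static sample instead of fetching the page
sample = """
<meta name="author" content="Maggie Haberman, Michael D. Shear and Glenn Thrush" />
<meta name="byl" content="By MAGGIE HABERMAN, MICHAEL D. SHEAR and GLENN THRUSH" />
<meta property="article:author" content="https://www.nytimes.com/by/maggie-haberman" />
<meta property="article:author" content="https://www.nytimes.com/by/michael-d-shear" />
<meta property="article:author" content="https://www.nytimes.com/by/glenn-thrush" />
"""


def extract_author_info(html: str) -> dict:
    """Hypothetical helper: collect author-related meta tag values from an HTML page."""
    soup = BeautifulSoup(html, "html.parser")
    author = soup.find("meta", attrs={"name": "author"})
    byline = soup.find("meta", attrs={"name": "byl"})
    # One article:author tag per author, so gather every match
    profiles = soup.find_all("meta", attrs={"property": "article:author"})
    return {
        "author": author.get("content") if author is not None else None,
        "byline": byline.get("content") if byline is not None else None,
        "profile_urls": [tag.get("content") for tag in profiles],
    }


info = extract_author_info(sample)
print(info["author"])
print(info["profile_urls"])
```

An article could then be treated as having an author if any of these values is present, which is stricter than searching the raw text for the substring "By".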
