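<p>Google News can be scraped easily with <code>requests</code> and <code>beautifulsoup</code>. Sending a <code>user-agent</code> header is enough to extract data from there.</p>
<p>Check out the <a href="https://selectorgadget.com/" rel="nofollow noreferrer">SelectorGadget</a> Chrome extension, which lets you grab <code>CSS</code> selectors visually by clicking on the element you want to extract.</p>
<p>If you only want to extract URLs from Google News, it's as simple as:</p>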
<pre class="lang-py prettyprint-override"><code>for result in soup.select('.dbsr'):
    link = result.a['href']
    # 10 links here..
</code></pre>
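<p>To try the selector loop without hitting Google, here is a minimal sketch that runs it against a small inline HTML sample. The sample markup, the <code>extract_links</code> helper, and the <code>example.com</code> URLs are made up for illustration; real Google News markup differs, and the <code>.dbsr</code> class may no longer match the current page layout.</p>

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration only; real Google News HTML differs
# and its class names change over time.
SAMPLE_HTML = """
<div class="dbsr"><a href="https://example.com/story-1">Story 1</a></div>
<div class="dbsr"><a href="https://example.com/story-2">Story 2</a></div>
<div class="dbsr">No link in this one</div>
"""


def extract_links(html, selector=".dbsr"):
    """Return the href of the first <a href> inside each element matching selector."""
    # "html.parser" ships with Python, so lxml is not needed for this sketch
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for result in soup.select(selector):
        anchor = result.find("a", href=True)  # None when the element has no link
        if anchor is not None:
            links.append(anchor["href"])
    return links


links = extract_links(SAMPLE_HTML)
print(links)
```

<p>Skipping elements without a link avoids the <code>TypeError</code> that <code>result.a['href']</code> raises when <code>result.a</code> is <code>None</code>.</p>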
<p>Code and <a href="https://replit.com/@DimitryZub1/Scrape-Google-News-Python-BS4-Request#main.py" rel="nofollow noreferrer">an example that scrapes more in the online IDE</a>:</p>
<pre class="lang-py prettyprint-override"><code>from bs4 import BeautifulSoup
import requests, lxml  # lxml must be installed to use the 'lxml' parser below

# Browser-like User-Agent so the request is not treated as a script
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Search parameters: query, language, country, and the news tab
params = {
    "q": "yahoo finance BTC",
    "hl": "en",
    "gl": "us",
    "tbm": "nws",
}

html = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(html.text, 'lxml')

# Each news result container holds a link to the article
for result in soup.select('.dbsr'):
    link = result.a['href']
    print(link)

# Example output:
'''
https://finance.yahoo.com/news/riot-blockchain-reports-record-second-203000136.html
https://finance.yahoo.com/news/el-salvador-not-require-bitcoin-175818038.html
https://finance.yahoo.com/video/bitcoin-hovers-around-50k-paypal-155437774.html
... other links
'''
</code></pre>
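<p>To see which URL the <code>params</code> dictionary above turns into, the query string can be built by hand with the standard library. This is only a sketch for inspection; <code>BASE_URL</code> and <code>search_url</code> are names chosen here, and <code>requests</code> builds an equivalent query string on its own when you pass <code>params=</code>.</p>

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/search"

# Same parameters as in the answer's example
params = {
    "q": "yahoo finance BTC",  # search query
    "hl": "en",                # interface language
    "gl": "us",                # country to search from
    "tbm": "nws",              # restrict results to the news tab
}

# urlencode percent-encodes values and joins them with "&"; spaces become "+"
search_url = f"{BASE_URL}?{urlencode(params)}"
print(search_url)
# https://www.google.com/search?q=yahoo+finance+BTC&hl=en&gl=us&tbm=nws
```

<p>Opening the printed URL in a browser is a quick way to check that the parameters return the news results you expect before scraping them.</p>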
<hr/>
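<p>Alternatively, you can use the <a href="https://serpapi.com/news-results" rel="nofollow noreferrer">Google News Results API</a> from SerpApi to achieve the same result. It's a paid API with a free plan.</p>
<p>The difference is that you don't have to figure out how to extract the elements, maintain the parser over time, or bypass blocks from Google.</p>
<p>Code to integrate:</p>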
<pre class="lang-py prettyprint-override"><code>import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "coca cola",
    "tbm": "nws",
    "api_key": os.getenv("API_KEY"),  # API key read from the environment
}

search = GoogleSearch(params)
results = search.get_dict()  # response parsed into a dictionary

for news_result in results["news_results"]:
    print(f"Title: {news_result['title']}\nLink: {news_result['link']}\n")

# Example output:
'''
Title: Coca-Cola Co. stock falls Monday, underperforms market
Link: https://www.marketwatch.com/story/coca-cola-co-stock-falls-monday-underperforms-market-01629752653-994caec748bb
... more results
'''
</code></pre>
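<p>The loop above raises a <code>KeyError</code> if the response has no <code>news_results</code> key. A slightly more defensive sketch is shown below; the <code>format_news</code> helper and the sample dictionary are made up for illustration and only mirror the <code>title</code> and <code>link</code> keys used above, not the full response.</p>

```python
def format_news(results):
    """Format each news result as 'Title: ...' / 'Link: ...', tolerating missing keys."""
    lines = []
    # .get() with a default avoids a KeyError when the key is absent
    for news_result in results.get("news_results", []):
        title = news_result.get("title", "(no title)")
        link = news_result.get("link", "(no link)")
        lines.append(f"Title: {title}\nLink: {link}")
    return lines


# Hypothetical response data, standing in for search.get_dict()
sample_results = {
    "news_results": [
        {"title": "Example headline", "link": "https://example.com/a"},
        {"title": "Headline without a link"},
    ]
}

formatted = format_news(sample_results)
print("\n\n".join(formatted))
```

<p>In real use you would pass the dictionary returned by <code>search.get_dict()</code> instead of <code>sample_results</code>.</p>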
<p>By the way, I wrote a <a href="https://dev.to/dimitryzub/scrape-google-news-with-python-4o14" rel="nofollow noreferrer">blog post</a> about how to scrape Google News in more detail (<em>including pagination</em>), with visual illustrations.</p>
<blockquote>
<p>Disclaimer, I work for SerpApi.</p>
</blockquote>