<blockquote>
<p>Firstly I used find_all() in beautifulSoup without class_ argument. It returned me a list of some random anchor tags that are not of any use to me.</p>
</blockquote>
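<p>This is the correct behavior: you asked <code>bs4</code> to get all <code>&lt;a&gt;</code> tags, and it returned every <code>&lt;a&gt;</code> tag it found.</p>
<p>You can change the URL to get the non-JavaScript version:</p>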
<pre class="lang-none prettyprint-override"><code>from this (JS): https://duckduckgo.com/?q=bse+reliance+stock+price&t=hx&va=g&ia=web
to this (non-JS): https://html.duckduckgo.com/html/?q=bse%20reliance%20stock%20price
</code></pre>
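<p>A rough sketch of building the non-JS URL from a search string with the standard library only. The helper name <code>build_non_js_url</code> is hypothetical; the endpoint and the <code>q</code> parameter are taken from the URL above:</p>
<pre class="lang-py prettyprint-override"><code>from urllib.parse import quote, urlencode


def build_non_js_url(query):
    """Return the html.duckduckgo.com search URL for the given query (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20; the default (quote_plus) would use "+".
    return "https://html.duckduckgo.com/html/?" + urlencode({"q": query}, quote_via=quote)


print(build_non_js_url("bse reliance stock price"))
# https://html.duckduckgo.com/html/?q=bse%20reliance%20stock%20price
</code></pre>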
<p>If you only ever need to extract the <strong>first</strong> link, you can do this:</p>
<pre class="lang-py prettyprint-override"><code>>>> first_url = soup.select_one('.result__url')['href'].replace('//', '')
>>> first_url
'duckduckgo.com/l/?uddg=https%3A%2F%2Fwww.bseindia.com%2Fstock%2Dshare%2Dprice%2Freliance%2Dindustries%2Dltd%2Freliance%2F500325%2F&rut=b13b3c373de61ffd03dee7ad51f9fb9274dac16d098f25920d7946dbd9a73cc7'
</code></pre>
<p>Code and <a href="https://replit.com/@DimitryZub1/Scrape-DuckDuckGo-Non-JS-verison#main.py" rel="nofollow noreferrer">full example in the online IDE</a>:</p>
<pre class="lang-py prettyprint-override"><code>import requests, lxml  # lxml is imported only to make sure the parser is installed
from bs4 import BeautifulSoup

# Browser-like User-Agent so the request is not served a blocked/empty page
headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
    "q": "bse reliance stock price",  # search query
    "kl": "us-en"                     # language
}

# Request the non-JS version of DuckDuckGo
html = requests.get('https://html.duckduckgo.com/html', headers=headers, params=params)
soup = BeautifulSoup(html.text, 'lxml')

# select_one() returns the first element matching the CSS selector
first_url = soup.select_one('.result__url')['href'].replace('//', '')
print(first_url)
# duckduckgo.com/l/?uddg=https%3A%2F%2Fwww.bseindia.com%2Fstock%2Dshare%2Dprice%2Freliance%2Dindustries%2Dltd%2Freliance%2F500325%2F&rut=b13b3c373de61ffd03dee7ad51f9fb9274dac16d098f25920d7946dbd9a73cc7
</code></pre>
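<p>The <code>href</code> printed above is a DuckDuckGo redirect link; the real destination is percent-encoded in its <code>uddg</code> query parameter. A minimal sketch of decoding it with the standard library, assuming the redirect format stays as in that output (the helper name <code>resolve_ddg_redirect</code> is hypothetical):</p>
<pre class="lang-py prettyprint-override"><code>from urllib.parse import parse_qs, urlparse


def resolve_ddg_redirect(href):
    """Return the URL stored in the 'uddg' parameter, or href unchanged if absent."""
    # parse_qs splits the query string and percent-decodes every value.
    query = parse_qs(urlparse(href).query)
    targets = query.get("uddg")
    return targets[0] if targets else href


# Redirect link copied from the output above.
redirect = "duckduckgo.com/l/?uddg=https%3A%2F%2Fwww.bseindia.com%2Fstock%2Dshare%2Dprice%2Freliance%2Dindustries%2Dltd%2Freliance%2F500325%2F&rut=b13b3c373de61ffd03dee7ad51f9fb9274dac16d098f25920d7946dbd9a73cc7"
print(resolve_ddg_redirect(redirect))
# https://www.bseindia.com/stock-share-price/reliance-industries-ltd/reliance/500325/
</code></pre>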
<hr/>
<p>Alternatively, you can use the <a href="https://serpapi.com/duckduckgo-organic-results" rel="nofollow noreferrer">DuckDuckGo Organic Results API</a> from SerpApi. It's a paid API with a free plan. Check out the <a href="https://serpapi.com/playground?engine=duckduckgo&q=Coffee&kl=us-en" rel="nofollow noreferrer">playground</a>.</p>
<p>The difference is that it scrapes the JavaScript version of DuckDuckGo, so all you need to do is iterate over the returned JSON and extract what you need.</p>
<p>Code to integrate:</p>
<pre class="lang-py prettyprint-override"><code>from serpapi import GoogleSearch
import os

params = {
    "api_key": os.getenv("API_KEY"),
    "engine": "duckduckgo",
    "q": "bse reliance stock price",
    "kl": "us-en"
}

search = GoogleSearch(params)
results = search.get_dict()

# [0] - index of the first organic result
first_link = results['organic_results'][0]['link']
print(first_link)
# https://www.bseindia.com/stock-share-price/reliance-industries-ltd/reliance/500325/
</code></pre>
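<p>Since the response is a plain dictionary, getting more than the first link is a matter of looping over <code>organic_results</code>. A small sketch, assuming each organic result carries a <code>link</code> key as in the code above; the helper name <code>extract_links</code> and the sample dictionary are made up for illustration and are not real API output:</p>
<pre class="lang-py prettyprint-override"><code>def extract_links(results, limit=None):
    """Collect the 'link' of each organic result, skipping entries without one."""
    organic = results.get("organic_results", [])
    links = [item["link"] for item in organic if "link" in item]
    return links if limit is None else links[:limit]


# Hand-written stand-in for search.get_dict(); not real API output.
sample = {
    "organic_results": [
        {"link": "https://www.bseindia.com/stock-share-price/reliance-industries-ltd/reliance/500325/"},
        {"link": "https://example.com/second-result"},
    ]
}

for link in extract_links(sample):
    print(link)
</code></pre>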
<blockquote>
<p>Disclaimer, I work for SerpApi.</p>
</blockquote>