<p>You would probably be better off using the <a href="https://api.stackexchange.com/">Stack Exchange API</a> than scraping the site, but anyway.</p>
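<p>For comparison, here is a minimal sketch of the API route, assuming all you want is a user's display name. The <code>/users/{ids}</code> endpoint and the <code>site</code> parameter come from the API documentation; the helper names and the user id <code>1</code> are only illustrative, and anonymous requests are rate-limited.</p>

```python
from urllib.parse import urlencode

API_ROOT = "https://api.stackexchange.com/2.3"


def user_url(user_id: int, site: str = "stackoverflow") -> str:
    """Build the URL of the /users/{ids} endpoint for a single user."""
    query = urlencode({"site": site})
    return f"{API_ROOT}/users/{user_id}?{query}"


def fetch_display_name(user_id: int) -> str:
    """Fetch a user's display name; needs network access and `requests`."""
    import requests  # third-party; imported here so user_url works without it

    response = requests.get(user_url(user_id), timeout=10)
    response.raise_for_status()
    # Results are wrapped in an object whose "items" list holds the users.
    return response.json()["items"][0]["display_name"]


# Placeholder user id, used only to show the URL shape.
print(user_url(1))  # https://api.stackexchange.com/2.3/users/1?site=stackoverflow
```

<p>No login, cookies or browser are involved on this path, which is why it tends to be the more robust option when the data you need is exposed by the API.</p>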
<p>There are a few issues:</p>
<ol>
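<li><p>You will sometimes get a captcha challenge.</p></li>
<li><p>Leaving the default <code>requests</code> headers in place increases the odds of getting a captcha, so override them with headers from a regular browser.</p></li>
<li><p>You need to use <code>requests.Session()</code> to keep the cookies across the first two requests.</p></li>
<li><p>Before adding the cookies from the <code>requests</code> session, you need to make an initial request with the webdriver and clear any cookies it created.</p></li>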
</ol>
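<p>For the first issue, a plain substring test on the final URL is usually enough, but it can also fire when the word only appears in a query parameter such as a return URL. A slightly stricter sketch, assuming the captcha page is recognisable by its path (the helper name and the example URLs are made up for illustration):</p>

```python
from urllib.parse import urlsplit


def hit_captcha(final_url: str) -> bool:
    """Return True if the path of the final URL mentions a captcha."""
    return "captcha" in urlsplit(final_url).path.lower()


# Hypothetical URLs, only to show the difference from a substring test.
print(hit_captcha("https://stackoverflow.com/nocaptcha?s=abc"))
print(hit_captcha("https://stackoverflow.com/users/login?returnurl=%2fcaptcha"))
```

<p>With a <code>requests</code> response you would pass <code>res.url</code>, which is the URL after any redirects.</p>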
<p>Taking those things into account, I was able to get it working with the following:</p>
<pre><code>import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

url = "https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f"

# Browser-like User-Agent instead of the default python-requests one.
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36"
    )
}

# A session keeps the cookies set by the GET for the POST that follows.
s = requests.Session()
req = s.get(url, headers=headers)

# The login form carries a hidden "fkey" field that has to be posted back.
payload = {
    "fkey": BeautifulSoup(req.text, "lxml").select_one("[name='fkey']")["value"],
    "email": "YOUR_EMAIL",
    "password": "YOUR_PASSWORD",
}
res = s.post(url, headers=headers, data=payload)

if "captcha" in res.url:
    raise ValueError("Encountered captcha")

driver = webdriver.Chrome()
try:
    # Visit the site first so the driver is on the right domain,
    # then swap its cookies for the ones from the requests session.
    driver.get(res.url)
    driver.delete_all_cookies()
    for name, value in s.cookies.items():
        driver.add_cookie({"name": name, "value": value})

    # Reload the page, this time with the session cookies in place.
    driver.get(res.url)
    item = driver.find_element(By.CSS_SELECTOR, "div[class^='gravatar-wrapper-']")
    print(item.get_attribute("title"))
finally:
    driver.quit()
</code></pre>
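<p>The loop above copies only the name and value of each cookie. If you also want to carry over the domain, path, secure flag and expiry, something along these lines might work. It is a sketch, not something I have run against the live site: the helper name is made up, and it relies on the fact that iterating over <code>s.cookies</code> yields cookie objects with <code>name</code>, <code>value</code>, <code>domain</code>, <code>path</code>, <code>secure</code> and <code>expires</code> attributes.</p>

```python
from types import SimpleNamespace


def to_selenium_cookie(cookie) -> dict:
    """Convert a cookie object into the dict shape add_cookie accepts."""
    result = {
        "name": cookie.name,
        "value": cookie.value,
        "path": cookie.path or "/",
        "secure": bool(cookie.secure),
    }
    if cookie.domain:
        result["domain"] = cookie.domain
    if cookie.expires is not None:
        # WebDriver expects the expiry as an integer Unix timestamp.
        result["expiry"] = int(cookie.expires)
    return result


# Stand-in for one entry of `s.cookies`; the values are made up.
example = SimpleNamespace(
    name="acct", value="t=abc", domain=".stackoverflow.com",
    path="/", secure=True, expires=None,
)
print(to_selenium_cookie(example))
```

<p>In the script above, the loop would then become <code>for cookie in s.cookies: driver.add_cookie(to_selenium_cookie(cookie))</code>. The driver still has to be on a page of the matching domain when the cookies are added, which is why the initial <code>driver.get</code> stays.</p>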