<p>I tried <code>allow_redirects=True</code> together with a User-Agent in the <code>headers</code> parameter, but I still see the following:</p>
<pre><code>import requests
from bs4 import BeautifulSoup

URL = "https://fees.uspto.gov/MaintenanceFees/fees/details?applicationNumber=12814074&patentNumber=7871455"
headers = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}
response = requests.get(URL, headers=headers, allow_redirects=True)
soup = BeautifulSoup(response.text, "html.parser")  # name the parser explicitly
print(response.history)  # the redirect responses that were followed
divs = soup.find_all('div', class_='left maintenanceFeeDetails')
print(divs)
</code></pre>
<p>It follows the redirects, but the search for the <code>div</code> returns nothing:</p>
<pre><code>[<Response [302]>, <Response [302]>, <Response [302]>]
[]
</code></pre>
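<p>To see where those three 302 hops lead, it can help to list every response in the chain along with its URL. The following is only a minimal sketch: <code>redirect_chain</code> is a hypothetical helper name, and the stand-in objects and <code>example.com</code> URLs are placeholders so the snippet runs offline. A real <code>requests.Response</code> exposes the same <code>history</code>, <code>status_code</code> and <code>url</code> attributes, so it could be passed in directly.</p>

```python
from types import SimpleNamespace


def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response."""
    hops = list(response.history) + [response]
    return [(r.status_code, r.url) for r in hops]


# Stand-in objects so the sketch runs without network access.
# The URLs are placeholders, not the real USPTO redirect targets.
hop1 = SimpleNamespace(status_code=302, url="https://example.com/start", history=[])
hop2 = SimpleNamespace(status_code=302, url="https://example.com/step", history=[])
final = SimpleNamespace(
    status_code=200, url="https://example.com/final", history=[hop1, hop2]
)

for status, url in redirect_chain(final):
    print(status, url)
```

<p>With the real response from the request above, this would be called as <code>redirect_chain(response)</code>.</p>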
<h2>The data appears to be loaded dynamically, so I used Selenium</h2>
<p>With Selenium I got a result:</p>
<pre><code>from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("https://fees.uspto.gov/MaintenanceFees/fees/details?applicationNumber=12814074&patentNumber=7871455")
# Selenium 4 removed find_element_by_css_selector; use find_element with By instead
maintenance = driver.find_element(By.CSS_SELECTOR, '.left.maintenanceFeeDetails')
print(maintenance.text)
driver.quit()  # end the session and close the browser
</code></pre>
<p>Result (the headings of the table from which the data can be extracted):</p>
<pre><code>PATENT #
APPLICATION #
FILING DATE
ISSUE DATE
Payment Window Status
WINDOW
STATUS
FEES
Patent Holder Information
Customer #
Entity Status
Phone Number
Address
</code></pre>
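<p>If the page fills in its content after the initial load, an explicit wait may make the Selenium lookup more reliable, and the returned text can then be split into individual headings. This is a sketch under assumptions rather than a tested scraper: <code>fetch_details_text</code> and <code>split_headings</code> are hypothetical helper names, the 15-second timeout is arbitrary, and the CSS selector is the one used above.</p>

```python
def split_headings(text):
    """Split an element's visible text into a list of non-empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def fetch_details_text(url, timeout=15):
    """Open the page in Firefox and return the text of the details block.

    Selenium is imported lazily so split_headings() works without it.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Block until the container is visible; raises TimeoutException otherwise.
        element = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located(
                (By.CSS_SELECTOR, ".left.maintenanceFeeDetails")
            )
        )
        return element.text
    finally:
        # Always end the browser session, even if the wait times out.
        driver.quit()


# Offline demonstration on the first lines of the output shown above.
sample = "PATENT #\nAPPLICATION #\nFILING DATE\nISSUE DATE\n"
print(split_headings(sample))
```

<p>Against the live page, the two helpers would be combined as <code>split_headings(fetch_details_text(URL))</code>, which needs Firefox and its driver to be installed.</p>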