No data in the requests response

Posted 2024-09-30 14:16:27


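I am new to Python, web scraping, and automation. I am trying to scrape the site at the URL given below. When I open the URL in a browser, all the data is displayed, but the response from requests.get() does not contain that data.

It would be really helpful if someone could tell me what is going wrong.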

import requests
import time
from bs4 import BeautifulSoup

URL = "https://fees.uspto.gov/MaintenanceFees/fees/details?applicationNumber=12814074&patentNumber=7871455"
html = requests.get(URL)
time.sleep(4)
# Parse the response body before searching it
soup = BeautifulSoup(html.text, "html.parser")
pno = soup.find_all('div', {"class": "left maintenanceFeeDetails"})
print(pno)

The data I want to scrape is under "Payment Window Status" (just paste the URL into a browser to see it).


2 Answers

I tried allow_redirects=True and the headers parameter with a User-Agent, but still got nothing:

import requests
from bs4 import BeautifulSoup

URL = "https://fees.uspto.gov/MaintenanceFees/fees/details?applicationNumber=12814074&patentNumber=7871455"
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}
response = requests.get(URL, headers=headers, allow_redirects=True)

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(response.text, "html.parser")
# response.history lists the redirect responses that were followed
print(response.history)
divs = soup.find_all('div', class_='left maintenanceFeeDetails')
print(divs)

It follows the redirects, but I get nothing back:

[<Response [302]>, <Response [302]>, <Response [302]>]
[]
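
An empty list like this is what you would expect when the element simply is not in the HTML the server sent and is only added later by JavaScript. As a rough, offline illustration of that difference, the sketch below counts matching divs in two hypothetical HTML strings: one standing in for the static page shell that requests would see, the other for the DOM after the browser has rendered it. Both strings and the helper names (ClassFinder, count_divs) are made up for this example and are not taken from the real page.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Count <div> start tags whose class attribute contains all wanted classes."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.matches = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # attrs is a list of (name, value) pairs; class may be absent
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.matches += 1


def count_divs(html, wanted):
    finder = ClassFinder(wanted)
    finder.feed(html)
    return finder.matches


# Hypothetical stand-ins, NOT the real USPTO markup
STATIC_SHELL = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
RENDERED = '<html><body><div class="left maintenanceFeeDetails">PATENT #</div></body></html>'

WANTED = ["left", "maintenanceFeeDetails"]
print(count_divs(STATIC_SHELL, WANTED))  # 0: nothing to find in the shell
print(count_divs(RENDERED, WANTED))      # 1: present once the page is rendered
```

Running the same kind of check against response.text is a quick way to tell whether a scraper needs a browser (or the underlying API) rather than a different header.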

The data appears to be loaded dynamically, so I used Selenium instead.

With Selenium I got the result:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
# Wait up to 10 seconds for elements that are added by JavaScript
driver.implicitly_wait(10)
driver.get("https://fees.uspto.gov/MaintenanceFees/fees/details?applicationNumber=12814074&patentNumber=7871455")

# find_element_by_css_selector was removed in Selenium 4; use find_element with By
maintenance = driver.find_element(By.CSS_SELECTOR, '.left.maintenanceFeeDetails')
print(maintenance.text)

# quit() closes the window and ends the browser session
driver.quit()

Result (the headings of the tables from which the data can be extracted):

PATENT #
APPLICATION #
FILING DATE
ISSUE DATE
Payment Window Status
WINDOW
STATUS
FEES
Patent Holder Information
Customer #
Entity Status
Phone Number
Address

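As per my comment, the data you want is generated dynamically, so it is not in the source you get back. requests follows redirects for GET requests automatically, so that was never going to be the problem.

You can get the information you need by mimicking the ajax request: a simple GET request to https://fees.uspto.gov/mntfee-services/v1/maintenancefee/details with the same parameters: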
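
The "same parameters" are just the query string of the page URL, so they can be read from it rather than retyped. A minimal sketch using only the standard library follows; the helper names page_params and build_api_url are hypothetical, and the endpoint is the one quoted above.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Endpoint quoted in the answer
API = "https://fees.uspto.gov/mntfee-services/v1/maintenancefee/details"


def page_params(page_url):
    """Return the query-string parameters of the page URL as a dict."""
    return dict(parse_qsl(urlsplit(page_url).query))


def build_api_url(page_url, api=API):
    """Build the API URL that carries the same parameters as the page URL."""
    return api + "?" + urlencode(page_params(page_url))


PAGE = "https://fees.uspto.gov/MaintenanceFees/fees/details?applicationNumber=12814074&patentNumber=7871455"

params = page_params(PAGE)
print(params)
print(build_api_url(PAGE))
```

The resulting dict can be passed straight to requests.get(api, params=params); the version with the parameters written out by hand follows.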

import requests

params = {"patentNumber": "7871455",
          "applicationNumber": "12814074"}

api = "https://fees.uspto.gov/mntfee-services/v1/maintenancefee/details"

# .json() decodes the JSON response body into Python dicts and lists
data = requests.get(api, params=params).json()

It gives you all the information in JSON format:

In [1]: import requests

In [2]: params = {"patentNumber": "7871455",
   ...:           "applicationNumber": "12814074"}

In [3]: api = "https://fees.uspto.gov/mntfee-services/v1/maintenancefee/details"

In [4]: data = requests.get(api, params=params).json()

In [5]: data["infoMessageText"]
Out[5]: [u'No maintenance fees are due at this time. 7.5 year window opens on 01/18/2018.']

In [6]: info = data["model"][0]
In [7]: info.keys()
Out[7]: 
[u'patentStatus',
 u'feeStatus',
 u'geoRegionCode',
 u'category',
 u'patentNumber',
 u'subCategory',
 u'streetLineTwo',
 u'applicationNumber',
 u'applicationStatusDate',
 u'abandonmentDate',
 u'nationalStageIndicator',
 u'window',
 u'version',
 u'postalCode',
 u'nameLineOne',
 u'issueDate',
 u'maintenanceFeePhases',
 u'streetLineOne',
 u'filingDate',
 u'countryName',
 u'phone',
 u'correspondenceAddressIndicator',
 u'entityTypeName',
 u'nameLineTwo',
 u'applicationStatus',
 u'entityTypeCd',
 u'cityName',
 u'feeCodes',
 u'patentTitle',
 u'customerNumber',
 u'windowStatus']

In [8]: info["patentStatus"]
Out[8]: u'ACTIVE'

In [9]: info["feeStatus"]
Out[9]: u'Not Due'

In [10]: info
Out[10]: 
{u'abandonmentDate': -62135578800000,
 u'applicationNumber': u'12814074',
 u'applicationStatus': 150,
 u'applicationStatusDate': 1293512400000,
 u'category': u'UTL',
 u'cityName': u'LOS ANGELES',
 u'correspondenceAddressIndicator': True,
 u'countryName': u'UNITED STATES',
 u'customerNumber': u'33417',
 u'entityTypeCd': u'S',
 u'entityTypeName': u'SMALL',
 u'feeCodes': [],
 u'feeStatus': u'Not Due',
 u'filingDate': 1276228800000,
 u'geoRegionCode': u'CA',
 u'issueDate': 1295326800000,
 u'maintenanceFeePhases': [{u'closeDate': 1421730000000,
   u'expiredDate': 1421816400000,
   u'feeStatus': u'Paid',
   u'openDate': 1390021200000,
   u'statementStatus': u'Statement',
   u'surchargeDate': 1405742400000,
   u'transactionId': u'020314INTMTFEE00001905503725',
   u'version': 0,
   u'window': u'3.5',
   u'windowStatus': u'Closed'},
  {u'closeDate': 1547787600000,
   u'expiredDate': 1547874000000,
   u'feeStatus': u'Not Due',
   u'openDate': 1516251600000,
   u'statementStatus': None,
   u'surchargeDate': 1531972800000,
   u'transactionId': None,
   u'version': 0,
   u'window': u'7.5',
   u'windowStatus': u'Not Open'},
  {u'closeDate': 1674018000000,
   u'expiredDate': 1674104400000,
   u'feeStatus': u'Not Due',
   u'openDate': 1642482000000,
   u'statementStatus': None,
   u'surchargeDate': 1658203200000,
   u'transactionId': None,
   u'version': 0,
   u'window': u'11.5',
   u'windowStatus': u'Not Open'}],
 u'nameLineOne': u'LEWIS, BRISBOIS, BISGAARD & SMITH LLP',
 u'nameLineTwo': u'JON E HOKANSON',
 u'nationalStageIndicator': u'N',
 u'patentNumber': u'7871455',
 u'patentStatus': u'ACTIVE',
 u'patentTitle': u'JET ENGINE PROTECTION SYSTEM',
 u'phone': u'2132501800',
 u'postalCode': u'90071',
 u'streetLineOne': u'633 WEST 5TH STREET',
 u'streetLineTwo': u'SUITE 4000',
 u'subCategory': None,
 u'version': 0,
 u'window': u'7.5',
 u'windowStatus': u'Not Open'}

You can pull whatever information you want out of the model.
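
One thing to watch when extracting fields: the date values in the output above are integers that look like milliseconds since the Unix epoch (the 7.5 year openDate, 1516251600000, lines up with the "opens on 01/18/2018" message). A minimal sketch of converting them, assuming that interpretation holds for all date fields; the helper names ms_to_date and summarize are hypothetical, and the sample phases are trimmed copies of two entries from the output above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_date(ms):
    """Convert epoch milliseconds to a UTC calendar date.

    Adding a timedelta (instead of datetime.fromtimestamp) also copes
    with negative values such as the abandonmentDate shown above.
    """
    return (EPOCH + timedelta(milliseconds=ms)).date()


def summarize(phases):
    """Return (window, open date, fee status) tuples for each phase."""
    return [
        (p["window"], ms_to_date(p["openDate"]).isoformat(), p["feeStatus"])
        for p in phases
    ]


# Trimmed copies of two maintenanceFeePhases entries from the output above
phases = [
    {"window": "3.5", "openDate": 1390021200000, "feeStatus": "Paid"},
    {"window": "7.5", "openDate": 1516251600000, "feeStatus": "Not Due"},
]

for row in summarize(phases):
    print(row)
# ('3.5', '2014-01-18', 'Paid')
# ('7.5', '2018-01-18', 'Not Due')
```

The abandonmentDate of -62135578800000 converts to 0001-01-01 under the same rule, which suggests it is a placeholder for "no date" rather than a real value, so it is probably worth filtering out before display.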
