Web scraping a pop-up window

Posted 2024-09-29 23:33:38


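I am new to web scraping and am trying to automate retrieving parcel information from a town website. I have more than 300 parcels for which I need the book and page number.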

This is the website: https://newmilfordct.mapgeo.io/datasets/properties?abuttersDistance=100&latlng=41.587864%2C-73.425014

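When you go there, you can click Search and enter an identifier (for example 68/20). I have a list of all of them. A profile then comes up, and from it I can get the book and page number.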

This is what I have so far:

from bs4 import BeautifulSoup
from urllib.request import urlopen

# The URL must not contain spaces: the longitude is "-73.425014".
url = "https://newmilfordct.mapgeo.io/datasets/properties?abuttersDistance=100&latlng=41.587864%2C-73.425014"
page = urlopen(url)
html = page.read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")

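I can connect to the website, but I do not know how to interact with it. If anyone could point me in the right direction, it would be much appreciated and would save hours of manual work.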


1 Answer
User
#1 · Posted 2024-09-29 23:33:38

You can get the data for a given identifier by sending a POST request to the API URL.

Here is how to do that:

import requests

search_url = "https://newmilfordct.mapgeo.io/api/datasets/properties/search?format=json"

identifier = "68/20"

payload = {
    "page": 1,
    "quickSearch": identifier
}

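# The second positional argument of requests.post is `data`,
# so the payload is sent as a form-encoded body.
search_results = requests.post(search_url, data=payload).json()
# print(search_results)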

for item in search_results:
    name = item['displayName']
    owner = item['ownerName']
    geometry = item['geometry']
    book = item['lastSaleBook']
    page = item['lastSalePage']
    print(f"Name: {name} | Owner: {owner}")
    print(f"Book/Page: {book}/{page}")
    print(geometry)
    print("-" * 80)

Output:

Name: 17 BUCKINGHAM LN | Owner: ROTELLI LOUIS
Book/Page: 0970/230
{"type":"Polygon","coordinates":[[[-73.4909038060549,41.6425898231357],[-73.4909821900848,41.6425591025291],[-73.4907493168393,41.6419510845828],[-73.4911769908149,41.6420353877],[-73.4915429751214,41.6418889484739],[-73.4915515509607,41.6418998161938],[-73.4919447199921,41.6423992451082],[-73.4920405021311,41.6425204818934],[-73.4919930203487,41.6425307775562],[-73.4919273071398,41.6425305146988],[-73.4917614178846,41.642552550643],[-73.491595684262,41.642581803258],[-73.4910018358319,41.6426901884681],[-73.4910019510053,41.6427258656192],[-73.4909038060549,41.6425898231357]]]}
--------------------------------------------------------------------------------
Name: 15 BUCKINGHAM LN | Owner: NEELANDS DOUGLAS S + SALOME S
Book/Page: 0330/394
{"type":"Polygon","coordinates":[[[-73.4904204439222,41.6413365201908],[-73.4908759926496,41.6411167792846],[-73.4909181970441,41.6410961714263],[-73.4915429751214,41.6418889484739],[-73.4911769908149,41.6420353877],[-73.4907493168393,41.6419510845828],[-73.4909821900848,41.6425591025291],[-73.4909038060549,41.6425898231357],[-73.4904204439222,41.6413365201908]]]}
--------------------------------------------------------------------------------

There is more in the JSON. Just uncomment the line # print(search_results) to see the whole response.
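Since the question mentions more than 300 identifiers, the same POST can be repeated in a loop and the results written to a CSV file. The following is only a sketch under a few assumptions: it uses the standard library (urllib, as in the question) instead of requests, the function names, the output file name and the one-second pause are hypothetical choices, and it assumes the endpoint accepts the same form-encoded payload as in the answer's code. Note that a quick search can return more than one parcel, as the output above shows, so each row keeps the identifier that was searched for.

```python
import csv
import json
import time
import urllib.parse
import urllib.request

# Endpoint taken from the answer; everything else here is illustrative.
SEARCH_URL = "https://newmilfordct.mapgeo.io/api/datasets/properties/search?format=json"


def fetch_items(identifier):
    """POST one identifier as a form-encoded body and return the parsed JSON."""
    body = urllib.parse.urlencode({"page": 1, "quickSearch": identifier}).encode("utf-8")
    request = urllib.request.Request(SEARCH_URL, data=body)  # passing data makes it a POST
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def extract_rows(identifier, items):
    """Reduce one search response (a list of dicts) to the fields of interest."""
    rows = []
    for item in items:
        rows.append({
            "identifier": identifier,
            "name": item.get("displayName"),
            "book": item.get("lastSaleBook"),
            "page": item.get("lastSalePage"),
        })
    return rows


def save_book_pages(identifiers, out_path="book_pages.csv", delay_seconds=1.0, fetch=fetch_items):
    """Look up every identifier and write one CSV row per matching parcel."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["identifier", "name", "book", "page"])
        writer.writeheader()
        for identifier in identifiers:
            writer.writerows(extract_rows(identifier, fetch(identifier)))
            time.sleep(delay_seconds)  # assumed pause so the server is not flooded
```

Calling save_book_pages(["68/20"]) with the full list of identifiers in place of the single example would then produce the CSV. The fetch parameter exists so that the network call can be swapped out, for instance for a version built on requests.post.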

Edit: a short note on the API

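You can peek at what happens when you enter an identifier into the search field by opening the developer tools in your web browser. Go to the Network tab and select the XHR filter.

Select the first item and then choose Headers. There you can find the Request URL and the Request payload.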
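To compare what a script sends with the Request payload shown in the developer tools, it can help to print the same payload in the two common body encodings. This is a small sketch using only the standard library; which of the two encodings the site's own page uses is not stated in the answer, so that is something to check in the Headers view.

```python
import json
import urllib.parse

payload = {"page": 1, "quickSearch": "68/20"}

# Form-encoded body: what requests sends for requests.post(url, data=payload).
form_body = urllib.parse.urlencode(payload)
print(form_body)  # page=1&quickSearch=68%2F20

# JSON body: the same fields as a JSON document, as in requests.post(url, json=payload).
json_body = json.dumps(payload)
print(json_body)  # {"page": 1, "quickSearch": "68/20"}
```

If the payload in the developer tools looks like the second form, passing json=payload to requests.post instead of data=payload would mirror the browser more closely.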
