How do I get the correct session ID? (Scrapy, Python)

Posted 2024-05-20 14:16:28


There is a URL: https://maps.leicester.gov.uk/map/Aurora.svc/run?inspect_query=QPPRN&inspect_value=ROH9385&script=%5CAurora%5Cw3%5CPLANNING%5Cw3PlanApp_MG.AuroraScript%24&nocache=f73eee56-45da-f708-87e7-42e82982370f&resize=always

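It returns coordinates. Getting them takes three requests (I think):

  1. The URL mentioned above
  2. A request for the session_id
  3. A request for the coordinates, using that session_id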

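I do get a session ID in the second step, but it is wrong: in the third step I cannot use it to get the coordinates. How do I know the problem is in the session_id? When I paste in a session ID taken from the browser, my code works and I receive the coordinates.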
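One way to rule out a parsing mistake while debugging this is to decode the whole JSONP body instead of matching it with a regular expression. The following is only a sketch: it assumes the server answers with something of the form `_jqjsp({...});` and that the ID sits under `Session` and then `SessionId`, as the answer's code suggests. The helper name `parse_jsonp` and the sample payload are made up for illustration.

```python
import json
import re


def parse_jsonp(text):
    """Strip a JSONP wrapper such as `_jqjsp({...});` and decode the JSON inside."""
    match = re.match(r'^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$', text, re.DOTALL)
    if match is None:
        raise ValueError("not a JSONP response")
    return json.loads(match.group(1))


# Made-up sample payload, for illustration only (not a real server response).
sample = '_jqjsp({"Session": {"SessionId": "00000000-0000-0000-0000-000000000000"}});'
session_id = parse_jsonp(sample)["Session"]["SessionId"]
print(session_id)
```

If the ID parses cleanly and the third request still fails, the ID itself is probably fine and some other step in the sequence is likely missing.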

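Here are the requests in the browser:

[image: The 1st request]

[image: The 2nd request]

[image: The 3rd request]

Here is the correct response from the browser:

[image: The correct response]

And here is what my code gets:

[image: The wrong response]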

Here is my code (for the Scrapy framework):

'''
import re
import time

import inline_requests
import scrapy

@inline_requests.inline_requests
def get_map_data(self, response):
    """ Getting map data. """

    map_referer = ("https://maps.leicester.gov.uk/map/Aurora.svc/run?inspect_query=QPPRN&"
        "inspect_value=ROH9385&script=%5CAurora%5Cw3%5CPLANNING%5Cw3PlanApp_MG.AuroraScript"
        "%24&nocache=f73eee56-45da-f708-87e7-42e82982370f&resize=always")

    response = yield scrapy.Request(
        url=map_referer,
        meta=response.meta,
        method='GET',
        dont_filter=True,
        )

    time_str = str(int(time.time()*1000))

    headers = {
        'Referer': response.url,
        'Accept': 'application/javascript, */*; q=0.8',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7',
        'Host': 'maps.leicester.gov.uk',
        'Sec-Fetch-Dest': 'script',
        'Sec-Fetch-Mode': 'no-cors',
        'Sec-Fetch-Site': 'same-origin',
        'Connection': 'keep-alive',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36'
        }

    response.meta['handle_httpstatus_all'] = True

    url = ( 'https://maps.leicester.gov.uk/map/Aurora.svc/RequestSession?userName=inguest'
            '&password=&script=%5CAurora%5Cw3%5CPLANNING%5Cw3PlanApp_MG.AuroraScript%24&'
            f'callback=_jqjsp&_{time_str}=' )

    reqest_session_response = yield scrapy.Request(
        url=url,
        meta=response.meta,
        method='GET',
        headers=headers,
        dont_filter=True,
        )

    session_id = re.search(r'"SessionId":"([^"]+)', reqest_session_response.text)
    session_id = session_id.group(1) if session_id else None

    print(8888888888888)
    print(session_id)

    # session_id = '954f04e2-e52c-4dd9-9046-f3f013d3f633'

    # pprn = item.get('other', {}).get('PPRN')
    pprn = 'ROH9385' # hard coded for the current page

    if session_id and pprn:
        time_str = str(int(time.time()*1000))

        url = ('https://maps.leicester.gov.uk/map/Aurora.svc/FindValue'
                f'Location?sessionId={session_id}&value={pprn}&query=QPPRN&callback=_jqjsp'
                f'&_{time_str}=')

        coords_response = yield scrapy.Request(
            url = url,
            method='GET',
            meta=reqest_session_response.meta,
            dont_filter = True,
            )

        print(coords_response.text)
        breakpoint()
'''

Could you please fix my code so that it gets the coordinates?


Tags: https, id, url, map, time, response, session, meta
1 answer
User
#1 · Posted 2024-05-20 14:16:28

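The site first creates a sessionId, then (I guess) uses that sessionId to create a layer on the server. Only after that can you start making requests; otherwise it cannot find the map layer under that sessionId.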

import requests

# Step 1: ask the server for a new session and read its ID.
url = "https://maps.leicester.gov.uk/map/Aurora.svc/RequestSession?userName=inguest&password=&script=%5CAurora%5Cw3%5CPLANNING%5Cw3PlanApp_MG.AuroraScript%24"
res = requests.get(url, verify=False).json()
sid = res["Session"]["SessionId"]

# Step 2: open the script map for that session (the step missing from the question's code).
url = f"https://maps.leicester.gov.uk/map/Aurora.svc/OpenScriptMap?sessionId={sid}"
res = requests.get(url, verify=False)

# Step 3: look up the coordinates for the given PPRN under the same session.
url = f"https://maps.leicester.gov.uk/map/Aurora.svc/FindValueLocation?sessionId={sid}&value=ROH9385&query=QPPRN"
res = requests.get(url, verify=False).json()
print(res)
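The same call order could be carried over to the Scrapy spider from the question. The sketch below only builds the three URLs. It assumes, as the answer's code does, that the endpoints also respond without the `callback` parameter. The helper names are hypothetical and not part of any library.

```python
from urllib.parse import urlencode

BASE = "https://maps.leicester.gov.uk/map/Aurora.svc"
# Decoded form of the `script` parameter used in the URLs above.
SCRIPT = "\\Aurora\\w3\\PLANNING\\w3PlanApp_MG.AuroraScript$"


def request_session_url():
    """URL for step 1: create a session."""
    params = {"userName": "inguest", "password": "", "script": SCRIPT}
    return f"{BASE}/RequestSession?{urlencode(params)}"


def open_script_map_url(session_id):
    """URL for step 2: open the script map for the session."""
    return f"{BASE}/OpenScriptMap?{urlencode({'sessionId': session_id})}"


def find_value_location_url(session_id, value, query="QPPRN"):
    """URL for step 3: look up the coordinates for a value."""
    params = {"sessionId": session_id, "value": value, "query": query}
    return f"{BASE}/FindValueLocation?{urlencode(params)}"


# In the question's callback, the missing request would go between the
# RequestSession and FindValueLocation requests, for example:
#     yield scrapy.Request(open_script_map_url(session_id), dont_filter=True)
print(request_session_url())
```

Keeping the URL construction in small functions makes it easy to check that the spider requests the same addresses as the `requests` version before comparing the responses.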
