Need guidance on paginating through this URL using Python (requests, BeautifulSoup, or Selenium) or JavaScript (Node.js, Puppeteer)?

Posted 2024-10-04 07:37:42


I am trying to scrape data from the following URL:

https://www.11880.com/suche/Immobilienmakler/deutschland?branchen=3302469%7C3302464%7C3302249%7C3303516%7C3301609%7C3300129&sorte=%7C&modul=direct

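I am fairly experienced with web scraping, but this site uses an unusual kind of pagination, which I guess is implemented in JavaScript. For the first 5 pages it simply appends page={NO} to the URL. After page 5, it appends a unique identifier (query) along with the page number to each page's URL. Most of the query is the same for all pages; only a few characters differ from page to page.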
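The URL scheme described above can be sketched as a small helper. This is a minimal sketch, not the site's documented behavior: the parameter names come from the URL in the question, while `build_page_url` and the "token required after page 5" check are assumptions made for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.11880.com/suche/Immobilienmakler/deutschland"

# Fixed search parameters, copied from the URL in the question.
BASE_PARAMS = [
    ("branchen", "3302469|3302464|3302249|3303516|3301609|3300129"),
    ("sorte", "|"),
    ("modul", "direct"),
]


def build_page_url(page, query=None):
    """Return the search URL for one result page (hypothetical helper).

    Pages 1-5 need only `page`; later pages also need the opaque
    `query` token that the site issues for that page.
    """
    if page > 5 and query is None:
        raise ValueError("pages after 5 need the site-issued query token")
    params = list(BASE_PARAMS)
    params.append(("page", str(page)))
    if query is not None:
        params.append(("query", query))
    # urlencode percent-encodes "|" as %7C, matching the original URL.
    return BASE_URL + "?" + urlencode(params)


print(build_page_url(3))
```

The helper only builds URLs; it does not obtain the token, which still has to come from the site itself.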

The queries look like this:

Page 6:

cmxXakxKcWNvelMwbko5aFZ3YzdWemtjb0p5MFZ3YmtBRmp2b1RTbXFSOXZuekl3cVBWNnJsV3NuSkR2QnZWMU1KSDVNUU5sQlRWMkFUU3dCSld4TEdMM0JReDRabU52WVBXc3BUeXhWd2JsWkdxOXNGanZwMkl1cHpBYkczTzBuSjlocGxWNnIzMGZWYVd1b3pFaW9JQXlNSkR2Qno1MW9Uazk=

Page 7:

cmxXakxKcWNvelMwbko5aFZ3YzdWemtjb0p5MFZ3YmtBRmp2b1RTbXFSOXZuekl3cVBWNnJsV3NuSkR2QnZWMU1RcDJaVFYxWjJaakx3cDBNSkQyWndaM0FUVjNMd3R2WVBXc3BUeXhWd2JrQUdNOXNGanZwMkl1cHpBYkczTzBuSjlocGxWNnIzMGZWYVd1b3pFaW9JQXlNSkR2Qno1MW9Uazk=

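Python requests

I inspected the page source and could not find a key like this for the next page anywhere. The query for the current page is inside a script tag.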
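Since the current page's query sits in a script tag, one way to pull it out is to collect the text of every script element and search for long Base64-looking strings. This is a minimal sketch using only the standard library; the name of the JavaScript variable holding the token is not known from the question, so the code matches by shape (an assumed minimum of 100 Base64 characters) rather than by name.

```python
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


# Assumption: the token is a long Base64-looking run, as in the samples.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/]{100,}={0,2}")


def find_query_candidates(html):
    """Return unique Base64-looking strings found inside script tags."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    found = []
    for script in parser.scripts:
        for match in TOKEN_RE.findall(script):
            if match not in found:
                found.append(match)
    return found
```

Calling `find_query_candidates(response.text)` on a fetched page returns candidates only; each one still needs to be checked against the token seen in the Network tab.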

Try starting with the following code. It requests page 7. You can see the query in params, which I copied from the Network tab.

import requests

headers = {
    'authority': 'www.11880.com',
    'cache-control': 'max-age=0',
    'sec-ch-ua': '"Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"',
    'sec-ch-ua-mobile': '?0',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
    'cookie': '__cfduid=d3ba308b5d5994136cfb2ffd23797ae371615711538; _gcl_au=1.1.638363378.1615711421; _ga=GA1.2.1798619679.1615711421; _gid=GA1.2.1401311919.1615711421; __gads=ID=7767acb88c5bd4d2:T=1615721793:S=ALNI_MZzcwQtHCl4hWOWahxyceRZJYrgGg; randomSeed=1615724814; referrer=none; __cf_bm=d8225bc8662e1af83b9ae8c3eebbfdb7f0613cb2-1615727481-1800-AdDP/tSscGJQRiVmW/GyJBUUNHXkWvqYbiqv47MgKrvXzBt0InecHvXrwnMtnOKbtYS/YODx2Zh1ewlOlCAgtMpvjD7Vw9FG9J+gvII/EOy2; cf_chl_2=93a30b70f062111; cf_chl_prog=a41; cf_clearance=b85ac9756885d88ad6c979309aeadb222e4f60b9-1615727532-0-250; geoIPData=eyJjb3VudHJ5X2NvZGUiOm51bGwsImNvdW50cnlfY29kZTMiOm51bGwsImNvdW50cnlfbmFtZSI6bnVsbCwicmVnaW9uIjpudWxsLCJjaXR5IjpudWxsLCJwb3N0YWxfY29kZSI6bnVsbCwibGF0aXR1ZGUiOm51bGwsImxvbmdpdHVkZSI6bnVsbCwiYXJlYV9jb2RlIjpudWxsLCJkbWFfY29kZSI6bnVsbCwibWV0cm9fY29kZSI6bnVsbCwiY29udGluZW50X2NvZGUiOm51bGwsImlwIjoiMTE5LjE2MC42Ni4xMjUsIDE3Mi42OS4xMTEuMTM4In0%3D; rlData={%22randomSeed%22:1615724814%2C%22rlUrl%22:%22/suche/Immobilienmakler/deutschland?branchen=3302469%257C3302464%257C3302249%257C3303516%257C3301609%257C3300129&sorte=%257C&modul=direct&page=5%22%2C%22adsTargeting%22:{%22ort%22:[%22deutschland%22]%2C%22suche%22:[%22Immobilienmakler%22]%2C%22url%22:[%22/suche/Immobilienmakler/deutschland%22]%2C%22branche%22:[%223302469%22%2C%223302464%22%2C%223302249%22%2C%223303516%22%2C%223301609%22%2C%223300611%22%2C%223305630%22%2C%223300129%22%2C%223305491%22%2C%223305627%22]}}',
}

params = (
    ('branchen', '3302469|3302464|3302249|3303516|3301609|3300129'),
    ('sorte', '|'),
    ('modul', 'direct'),
    ('page', '7'),
    ('query', 'cmxXakxKcWNvelMwbko5aFZ3YzdWemtjb0p5MFZ3YmtBRmp2b1RTbXFSOXZuekl3cVBWNnJsV3NuSkR2QnZWMU1RcDJaVFYxWjJaakx3cDBNSkQyWndaM0FUVjNMd3R2WVBXc3BUeXhWd2JrQUdNOXNGanZwMkl1cHpBYkczTzBuSjlocGxWNnIzMGZWYVd1b3pFaW9JQXlNSkR2Qno1MW9Uazk='),
)

response = requests.get('https://www.11880.com/suche/Immobilienmakler/deutschland', headers=headers, params=params)

Python Selenium

I tried Selenium, but the site shows a CAPTCHA when I use it, and I don't know how to get past the CAPTCHA.

JavaScript or Node.js

I haven't tried JavaScript yet, but I expect the CAPTCHA would appear there too.

Any solution using any of the technologies above would be greatly appreciated. Thanks.


1 answer
User
Answer #1 · posted 2024-10-04 07:37:42

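The solution is quite simple.

From each page, just send a request like the one this form submits. That POST request redirects the user to the desired page. (screencast)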
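The form-replay idea can be sketched as follows. The answer does not show the form itself, so its action URL and field names are unknown; this sketch therefore discovers them from the page HTML instead of hard-coding them. `FormCollector`, `find_post_forms`, and `submit_form` are hypothetical names, and `session` is assumed to be a `requests.Session` carrying the headers and cookies from the question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormCollector(HTMLParser):
    """Record each <form> with its action, method, and named input values."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attrs.get("name"):
            self._current["fields"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def find_post_forms(html):
    """Return the forms on the page that submit with POST."""
    parser = FormCollector()
    parser.feed(html)
    parser.close()
    return [form for form in parser.forms if form["method"] == "post"]


def submit_form(session, page_url, form, overrides=None):
    """POST the form's fields through the session.

    requests follows redirects by default, so the returned response
    should be the page the POST redirects to.
    """
    data = dict(form["fields"])
    data.update(overrides or {})
    return session.post(urljoin(page_url, form["action"]), data=data)
```

A scraping loop would fetch a page, pick the pagination form from `find_post_forms(response.text)`, pass the target page number through `overrides` under whatever field name the real form uses, and repeat with the redirected response.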
