How to use Scrapy to scrape a dynamically loaded list in an SPA that has no separate pages

Posted 2024-09-25 00:32:31

I am new to data scraping. I'm working on a project that uses Python Scrapy to scrape data from an e-commerce website.

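When the page loads, the list contains 12 items; pressing the "Load more" button loads another 9. I inspected the AJAX request: it is a POST request, and it is hard to make sense of.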

https://www.site-name.de/james/

This is the POST request; it sends the following payload:

{"content":{"searchType":"byUrl","query":"/james/?channel=company_tarife&page=2&query=T-10-13175%7C%7CT-10-13225%7C%7CT-10-13221%7C%7CT-10-13119%7C%7CT-10-04905%7C%7CT-10-11123%7C%7CT-10-11121%7C%7CT-10-11117%7C%7CT-10-11119%7C%7CT-10-11125%7C%7CT-10-11567%7C%7CT-10-11217%7C%7CT-10-11549%7C%7CT-10-11569%7C%7CT-10-11213%7C%7CT-10-12335%7C%7CT-10-12661%7C%7CT-10-12659%7C%7CT-10-12769%7C%7CT-10-12767%7C%7CT-10-12893%7C%7CT-10-12895%7C%7CT-10-13681%7C%7CT-10-13285%7C%7CT-10-13287%7C%7CT-10-13603%7C%7CT-10-13605%7C%7CT-10-13291%7C%7CT-10-13289%7C%7CT-10-13607%7C%7CT-10-13609%7C%7CT-10-13293%7C%7CT-10-13295%7C%7CT-10-13611%7C%7CT-10-13613%7C%7CT-10-13283%7C%7CT-10-13297%7C%7CT-10-13315%7C%7CT-10-13317%7C%7CT-10-10745%7C%7CT-10-10747%7C%7CT-10-13191%7C%7CT-10-13189%7C%7CT-10-13187%7C%7CT-10-12787%7C%7CT-10-12807%7C%7CT-10-12783%7C%7CT-10-12785%7C%7CT-10-09215%7C%7CT-10-09207%7C%7CT-10-09211%7C%7CT-10-09205%7C%7CT-10-09203%7C%7CT-20-12573%7C%7CT-20-12577%7C%7CT-20-12581%7C%7CT-20-12575%7C%7CT-20-12579%7C%7CT-20-12583%7C%7CT-20-13263%7C%7CT-20-13261%7C%7CT-20-13259%7C%7CT-20-11663%7C%7CT-20-11639%7C%7CT-20-11637%7C%7CT-20-11669%7C%7CT-20-11641%7C%7CT-20-11643%7C%7CT-20-11673%7C%7CT-20-11649%7C%7CT-20-11619%7C%7CT-20-11617%7C%7CT-20-11645%7C%7CT-20-11647%7C%7CT-20-11655%7C%7CT-20-11653%7C%7CT-20-11651%7C%7CT-20-11667%7C%7CT-20-11665%7C%7CT-20-10085%7C%7CT-20-10219%7C%7CT-20-10087%7C%7CT-20-11889%7C%7CT-20-11887%7C%7CT-20-10095%7C%7CT-20-10215%7C%7CT-20-10213%7C%7CT-20-12525%7C%7CT-20-12529%7C%7CT-20-12535%7C%7CT-20-12527%7C%7CT-20-12533%7C%7CT-20-12537%7C%7CT-20-09691%7C%7CT-20-10593%7C%7CT-20-12635%7C%7CT-20-13091%7C%7CT-20-12639%7C%7CT-20-13075%7C%7CT-20-13079%7C%7CT-20-09589%7C%7CT-20-12645%7C%7CT-20-09761%7C%7CT-20-05087%7C%7CT-20-09651%7C%7CT-20-09649%7C%7CT-20-09751%7C%7CT-20-09747%7C%7CT-20-13037%7C%7CT-20-13039%7C%7CT-20-13049%7C%7CT-20-13053%7C%7CT-20-12987%7C%7CT-20-13001%7C%7CT-20-10491%7C%7CT-20-10469%7C%7CT-20-12997%7C%7CT-20-12991%7C%7CT-20-10461%7C%7CT-20-10489
%7C%7CT-20-10465%7C%7CT-20-10467%7C%7CT-20-12993%7C%7CT-20-12995%7C%7CT-20-12981%7C%7CT-20-12985%7C%7CT-20-12479%7C%7CT-20-12483%7C%7CT-20-12487%7C%7CT-20-12481%7C%7CT-20-12485%7C%7CT-20-12489%7C%7CT-20-13251%7C%7CT-20-13249%7C%7CT-20-13247%7C%7CT-20-05833%7C%7CT-20-05293%7C%7CT-20-05289%7C%7CT-20-07587%7C%7CT-20-05359%7C%7CT-20-05363%7C%7CT-20-07593%7C%7CT-20-07581%7C%7CT-20-07575%7C%7CT-20-05307%7C%7CT-20-05305%7C%7CT-20-10521%7C%7CT-20-10519%7C%7CT-20-07599%7C%7CT-20-10245%7C%7CT-20-10253%7C%7CT-40-13647%7C%7CT-40-12853%7C%7CT-40-12859%7C%7CT-40-13117%7C%7CT-40-13649%7C%7CT-40-13279%7C%7CT-40-12341%7C%7CT-40-12907%7C%7CT-40-12197%7C%7CT-40-12199%7C%7CT-40-10145%7C%7CT-40-13717%7C%7CT-40-13721%7C%7CT-40-10477%7C%7CT-40-10157%7C%7CT-40-10141%7C%7CT-40-10129%7C%7CT-40-10137%7C%7CT-40-10131%7C%7CT-40-10125%7C%7CT-40-10127%7C%7CT-40-10173%7C%7CT-40-12877%7C%7CT-40-12883%7C%7CT-40-10171%7C%7CT-40-10133%7C%7CT-40-10139%7C%7CT-40-10403%7C%7CT-40-12885%7C%7CT-40-12891%7C%7CT-40-10149%7C%7CT-40-10159%7C%7CT-40-12415%7C%7CT-40-12417%7C%7CT-40-10165%7C%7CT-40-10639%7C%7CT-40-06899%7C%7CT-40-06897%7C%7CT-40-06955%7C%7Cdetail-iphone12-campaign&searchField=Produktnummer_exakt","additionalParameters":{"searchField":"Produktnummer_exakt","channel":"company_tarife"}}}

I think the payload is somehow encrypted or hashed. But even when we send the same request with Postman, we get no data back; the site responds with a 404.
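A 404 on a replayed AJAX call often means the replay differs from what the browser sent: a different endpoint URL than the one shown in the Network tab, a missing `Content-Type: application/json`, or missing headers and cookies that the server checks. Which of these this site requires is unknown. The sketch below only prepares (does not send) a POST with the standard library so the pieces are explicit; `ENDPOINT` is a placeholder, and the header set is an assumption copied from what browsers typically send for AJAX requests.

```python
import json
import urllib.request

# Placeholder endpoint: the real AJAX URL must be copied from the browser's
# Network tab; the page URL itself may not accept POST requests.
ENDPOINT = "https://www.site-name.de/james/"


def build_request(payload: dict, referer: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST that mimics the browser's AJAX call."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        # Assumed headers: some servers reject AJAX calls that lack them
        "Content-Type": "application/json",
        "Accept": "application/json",
        "X-Requested-With": "XMLHttpRequest",
        "Referer": referer,
        "User-Agent": "Mozilla/5.0",
    }
    return urllib.request.Request(ENDPOINT, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_request({"content": {"searchType": "byUrl"}}, referer=ENDPOINT)
    print(req.get_method(), req.full_url)
    print(req.header_items())
```

Note that `urllib.request.Request` normalizes header names with `str.capitalize()`, so `Content-Type` is stored as `Content-type`. If the site also relies on session cookies, visiting the page first in the same session matters; Scrapy's cookie middleware keeps cookies across requests of one spider by default.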

Please see the screenshots attached below.

Screenshot: this is the "Load more" button.

Screenshot: this is the HTML of the "Load more" button. The button has no href.

I'm using Python-based Scrapy for this project, and I haven't found any examples of this use case. Can anyone suggest a better approach?
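One common pattern for "Load more" lists is to skip the button entirely and request page after page until the server returns nothing. The stopping rule can be sketched independently of Scrapy; `fetch_page` is a hypothetical callable that would send the POST for one page number and return the parsed items, and treating an empty page as "no more data" is an assumption about this site.

```python
from typing import Callable, Iterator, List


def paginate(fetch_page: Callable[[int], List[dict]], first_page: int = 2,
             max_pages: int = 100) -> Iterator[dict]:
    """Yield items page by page until a page comes back empty.

    fetch_page is a hypothetical callable that sends the "Load more" POST for
    one page number and returns the parsed list of items from the response.
    max_pages is a safety cap in case the server never returns an empty page.
    """
    for page in range(first_page, first_page + max_pages):
        items = fetch_page(page)
        if not items:
            # An empty page is assumed to mean there is nothing more to load
            return
        yield from items


if __name__ == "__main__":
    # Fake backend standing in for the real endpoint: 9 items on pages 2 and 3
    fake = {2: [{"id": n} for n in range(9)], 3: [{"id": n} for n in range(9, 18)]}
    results = list(paginate(lambda page: fake.get(page, [])))
    print(len(results))
```

In a Scrapy spider the same loop would become a chain of callbacks: parse the first 12 items from the HTML, then yield a `scrapy.http.JsonRequest(url, data=payload, callback=...)` for page 2 (passing `data` makes it a POST with a JSON body), and from that callback yield the request for the next page only if the response contained items.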

Thanks in advance.

