用scrapy刮掉一个api结果页

https://www.tripadvisor.com/TypeAheadJson?action=API&types=geo%2Cnbrhd%2Chotel%2Ctheme_park&legacy_format=true&urlList=true&strictParent=true&query=sadaf%20dubai%20hotel&max=6&name_depth=3&interleaved=true&scoreThreshold=0.5&strictAnd=false&typeahead1_5=true&disableMaxGroupSize=true&geoBoostFix=true&neighborhood_geos=true&details=true&link_type=hotel%2Cvr%2Ceat%2Cattr&rescue=true&uiOrigin=trip_search_Hotels&source=trip_search_Hotels&startTime=1516800919604&searchSessionId=BA939B3D93510DABB510328CBF3353131516800881576ssid&nearPages=true

[ { "urls":[ { "url_type":"hotel", "name":"Sadaf Hotel, Dubai, United Arab Emirates", "type":"HOTEL", "url":"\/Hotel_Review-g295424-d633008-Reviews-Sadaf_Hotel-Dubai_Emirate_of_Dubai.html" } ], . . .

1条回答

网友

1楼 · 发布于 2024-09-24 06:31:02

尝试从URL中删除sessionID，也许可以检查一下您的settings.py有多“不友好”。（另见this blog）

但是使用Wget，比如wget 'https://www.tripadvisor.com/TypeAheadJson?action=API&types=geo%2Cnbrhd%2Chotel%2Ctheme_park&legacy_format=true&urlList=true&strictParent=true&query={}%20dubai%20hotel&max=6&name_depth=3&interleaved=true&scoreThreshold=0.5&strictAnd=false&typeahead1_5=true&disableMaxGroupSize=true&geoBoostFix=true&neighborhood_geos=true&details=true&link_type=hotel%2Cvr%2Ceat%2Cattr&rescue=true&uiOrigin=trip_search_Hotels&source=trip_search_Hotels&startTime=1516800919604&nearPages=true' -O results.json，可能更容易

相关问题更多 >

编程相关推荐

热门问题

热门文章