AJAX problem (web scraping) - advice needed :)

Posted 2024-06-16 22:06:57

I'm trying to scrape this website -> https://www.techinasia.com/companies

Looking at the XHR calls the site makes, it's clear that it fetches its results via an AJAX call to the following API:

https://219wx3mpv4-dsn.algolia.net/1/indexes/*/queries?x-algolia-agent=Algolia%20for%20vanilla%20JavaScript%203.30.0%3BJS%20Helper%202.26.1&x-algolia-application-id=219WX3MPV4&x-algolia-api-key=b528008a75dc1c4402bfe0d8db8b3f8e

However, when I open this URL directly, all I see is:

{"message":"indexName is not valid","status":400}

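I'm fairly sure this has to do with the request headers, query string parameters, and form data (as seen in this screenshot).

I just want to know how to use this data in my code.

I tried the following: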

import requests
from bs4 import BeautifulSoup

def create_dictionary():
    url = r"https://219wx3mpv4-dsn.algolia.net/1/indexes/*/queries?x-algolia-agent=Algolia%20for%20vanilla%20JavaScript%203.30.0%3BJS%20Helper%202.26.1&x-algolia-application-id=219WX3MPV4&x-algolia-api-key=b528008a75dc1c4402bfe0d8db8b3f8e"
    session = requests.Session()
    session.get("https://www.techinasia.com/companies")

    headers = {
        "Content-Type"      :   "application/x-www-form-urlencoded",
        "Accept"            :   "application/json",
        "Accept-Encoding"   :   "gzip, deflate, br",
        "User-Agent"        :   "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Firefox/60.0"}

    response = session.post(url, headers=headers).json() 
    return(response)

But this function only returns the following:

{'message': 'No content in POST request', 'status': 400}

Any suggestions?

Thanks a lot!


1 answer
User
#1 · Posted 2024-06-16 22:06:57

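It's not the headers. The search parameters belong in the body of the POST request, and you can send them as a string: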

import requests

data = '{"requests":[{"indexName":"companies","params":"query=&hitsPerPage=20&maxValuesPerFacet=1000&page=0&facets=%5B%22*%22%2C%22entity_locations.country_name%22%2C%22entity_industries.vertical_name%22%2C%22funding_stages.stage_name%22%2C%22employee_count%22%2C%22job_posting_count%22%5D&tagFilters="}]}'
# Pass the JSON string as the POST body via data=; no custom headers are set here.
r = requests.post('https://219wx3mpv4-2.algolianet.com/1/indexes/*/queries?x-algolia-agent=Algolia for vanilla JavaScript 3.30.0;JS Helper 2.26.1&x-algolia-application-id=219WX3MPV4&x-algolia-api-key=b528008a75dc1c4402bfe0d8db8b3f8e', data=data)
print(r.json())
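
If you'd rather not hand-write the escaped string, the same body can probably be built from plain Python data. This is only a minimal sketch: `build_body` and `fetch_companies` are hypothetical helper names of my own, while the URL, keys, index name and facet list are copied from the request above. The encoded string may differ cosmetically from the browser's (for example `*` comes out as `%2A`), but it decodes to the same parameters.

```python
import json
from urllib.parse import urlencode

# Values copied from the request shown above.
ALGOLIA_URL = "https://219wx3mpv4-2.algolianet.com/1/indexes/*/queries"
QUERY_PARAMS = {
    "x-algolia-agent": "Algolia for vanilla JavaScript 3.30.0;JS Helper 2.26.1",
    "x-algolia-application-id": "219WX3MPV4",
    "x-algolia-api-key": "b528008a75dc1c4402bfe0d8db8b3f8e",
}
FACETS = [
    "*",
    "entity_locations.country_name",
    "entity_industries.vertical_name",
    "funding_stages.stage_name",
    "employee_count",
    "job_posting_count",
]


def build_body(page=0, hits_per_page=20):
    """Return the POST body as a JSON string (hypothetical helper)."""
    search_params = urlencode({
        "query": "",
        "hitsPerPage": hits_per_page,
        "maxValuesPerFacet": 1000,
        "page": page,
        # facets is itself a JSON array, sent as a single URL-encoded value
        "facets": json.dumps(FACETS, separators=(",", ":")),
        "tagFilters": "",
    })
    return json.dumps(
        {"requests": [{"indexName": "companies", "params": search_params}]}
    )


def fetch_companies(page=0):
    """Send the search request and return the decoded JSON (hypothetical helper)."""
    # Third-party package; imported here so build_body() needs only the stdlib.
    import requests

    response = requests.post(ALGOLIA_URL, params=QUERY_PARAMS, data=build_body(page))
    response.raise_for_status()  # surface 4xx/5xx instead of parsing an error body
    return response.json()
```

Calling `fetch_companies(page=1)` should then request the second page, assuming the API key above is still valid.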
