Navigating a website with a JavaScript form using Python

1 answer

User

Answer 1 · posted 2024-09-28 20:49:05

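If you monitor the web traffic in your browser's dev tools, you will see the API calls the page makes to update its content. The information returned is in JSON format.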

For example, page 1:

import requests

r = requests.get('https://publons.com/awards/api/2019/hcr/?page=1&per_page=10').json()

You can change the page parameter in a loop to get all the results.

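The total number of results is already given in the first response as r['count'], so it is easy to work out how many pages to loop over at 10 results per page. Just be sure to be polite when making your requests.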
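The page-count arithmetic can be sketched on its own. The helper name `page_count` and the sample totals are made up for illustration; only r['count'] and the page size of 10 come from the answer.

```python
import math


def page_count(total_results: int, per_page: int = 10) -> int:
    """Return how many pages are needed to cover total_results items."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    # Round up so a partially filled last page is still requested.
    return math.ceil(total_results / per_page)


# Hypothetical totals, not real values from the API.
print(page_count(95))   # 10
print(page_count(100))  # 10
print(page_count(101))  # 11
```

Rounding up matters here: with 95 results, integer division would give 9 pages and silently drop the last 5 items.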

Outline:

import math
import requests

with requests.Session() as s:
    # First request: returns page 1 and the total number of results
    r = s.get('https://publons.com/awards/api/2019/hcr/?page=1&per_page=10').json()
    # Do something with the JSON. Parse items of interest into a list and add to a final list? Convert to a dataframe at the end?
    number_pages = math.ceil(r['count'] / 10)

    for page in range(2, number_pages + 1):
        # Perhaps add a delay after every X requests
        r = s.get(f'https://publons.com/awards/api/2019/hcr/?page={page}&per_page=10').json()
        # Do something with the JSON. Parse items of interest into a list and add to a final list? Convert to a dataframe at the end?
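One way to flesh out the "delay" and "add to a final list" comments is sketched below. This is only an assumption about how the outline might be completed: `fetch_all_pages`, `get_page`, and `delay_seconds` are hypothetical names, and the one-second default pause is an arbitrary choice rather than anything the site documents. The fetching step is passed in as a function so the looping logic can be tried without touching the network.

```python
import math
import time
from typing import Callable


def fetch_all_pages(get_page: Callable[[int], dict],
                    per_page: int = 10,
                    delay_seconds: float = 1.0) -> list:
    """Collect the decoded JSON of every page, pausing between requests.

    get_page(page) is a caller-supplied function (hypothetical) that returns
    the decoded JSON for one page as a dict containing a 'count' key.
    """
    first = get_page(1)
    pages = [first]
    # The first response carries the total, which fixes the number of pages.
    number_pages = math.ceil(first['count'] / per_page)
    for page in range(2, number_pages + 1):
        time.sleep(delay_seconds)  # be polite: wait before each extra request
        pages.append(get_page(page))
    return pages
```

With requests, `get_page` could be a small function that calls `s.get(url, params={'page': page, 'per_page': 10}).json()` on a shared session, which keeps the URL building out of the loop.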
