Finding page numbers on an AJAX site for web scraping

import requests
from bs4 import BeautifulSoup

response = requests.get("https://ihome.ir/sell-residential-apartment/th-tehran")
# BeautifulSoup expects markup text, so pass response.text rather than response.json()
soup = BeautifulSoup(response.text, "html.parser")
prices = soup.select('.sell-value')
titles = soup.select('.title')

# Keep only the digits of each price string
homes_prices = []
for home in prices:
    homes_prices.append(int(''.join(filter(str.isdigit, home.getText()))))

homes_titles = []
for title in titles:
    homes_titles.append(title.getText())

res = dict(zip(homes_titles, homes_prices))
for key, value in res.items():
    p = str(value)
    # Values of up to two digits get nine zeros appended, longer ones six
    if len(p) <= 2:
        p += '000000000'
    else:
        p += '000000'
    print(key.strip(), int(p))

1 Answer

User

#1 · Posted on 2024-09-28 01:25:16

There is no need to use BeautifulSoup, because the data you are looking for is already available as a JSON dict.

The site fetches its data from a back-end API.

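The listing you are scraping is paginated, with 24 items on each page.

Twenty pages therefore hold 24 * 20 = 480 items, so I set the per-page result count to 480 and call the API once, which is better than looping over the pages many times.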
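The arithmetic above can be sketched without touching the network. This is only an illustration: `build_params` and `pages_needed` are hypothetical helper names, and the assumption that the API honours any `paginate` value is taken from the answer, not verified here.

```python
import math

ITEMS_PER_PAGE = 24  # items shown per page, as stated in the answer
PAGE_COUNT = 20      # page count implied by 24 * 20 = 480


def build_params(per_page, page=1):
    """Return query parameters asking for `per_page` items on page `page`."""
    return {
        'is_sale': '1',
        'source': 'website',
        'paginate': str(per_page),
        'page': str(page),
        'locations[]': 'iran.th.tehran',
        'property_type[]': 'residential-apartment',
    }


def pages_needed(total_items, per_page):
    """Number of requests required to fetch `total_items` at `per_page` each."""
    return math.ceil(total_items / per_page)


total = ITEMS_PER_PAGE * PAGE_COUNT
single = build_params(total)

# Requests needed with the site's default page size vs. one large page
print(total, pages_needed(total, ITEMS_PER_PAGE), pages_needed(total, total))
# prints: 480 20 1
```

If the API turned out to cap the page size, the same `build_params(24, page=n)` call could be used in a loop over the pages instead.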

Now you have a JSON dict from which you can access and extract whatever you want.

import requests


params = {
    'is_sale': '1',
    'source': 'website',
    'paginate': '480',
    'page': '1',
    'locations[]': 'iran.th.tehran',
    'property_type[]': 'residential-apartment'
}


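def main(url):
    # Query the API once and decode the JSON body into a dict
    r = requests.get(url, params=params).json()
    # Each entry in r['data'] is one item; print the fields it offers
    for item in r['data']:
        print(item.keys())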


main("https://scorpion.ihome.ir/v1/flatted-properties")
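Once `item.keys()` has shown which fields exist, building the title-to-price mapping from the question might look roughly like the sketch below. The key names `'title'` and `'price'` are assumptions made for illustration; the real names have to be read from the printed keys. The sample data is made up so the sketch runs without a network call.

```python
def extract(items, title_key='title', price_key='price'):
    """Map each item's title to its price, skipping items missing either field.

    `title_key` and `price_key` are hypothetical defaults, not confirmed
    field names of the API.
    """
    result = {}
    for item in items:
        title = item.get(title_key)
        price = item.get(price_key)
        if title is None or price is None:
            continue  # incomplete item: leave it out
        result[str(title).strip()] = price
    return result


# Made-up sample standing in for r['data']
sample = [
    {'title': ' Apartment A ', 'price': 1200},
    {'title': 'Apartment B'},  # no price, so it is skipped
]
print(extract(sample))
# prints: {'Apartment A': 1200}
```

With the live response, this would be called as `extract(r['data'])`, passing the actual key names if they differ.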
