Python+Scrapy+JSON+XPath: How to get JSON data with Scrapy

Posted 2024-09-28 03:13:29


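I know how to use XPath with Scrapy to get data points from HTML. But I need to get all the URLs on this page of the site (the start URL), and they are written in JSON format: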

https://highape.com/bangalore/all-events

view-source:https://highape.com/bangalore/all-events

This is how I usually write it:

def parse(self, response):
    events = response.xpath('**What To Write Here?**').extract()

    for event in events:
        absolute_url = response.urljoin(event)
        yield Request(absolute_url, callback=self.parse_event)

Please tell me what to write in that part.



2 answers

What to write here?

import json
# /text() returns only the JSON inside the <script> tag, not the tag markup itself
events = response.xpath("//script[@type='application/ld+json']/text()").get()
events = json.loads(events)
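To test the extraction step outside a spider, here is a stdlib-only sketch of the same idea as the XPath above. It assumes the page embeds its events in a `<script type="application/ld+json">` tag holding either one object or a list of objects with a `url` key; the names `event_urls` and `_LdJsonCollector` are hypothetical, and the standard library's `html.parser` stands in for Scrapy's selectors.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LdJsonCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # one string per JSON-LD script tag
        self._inside = False  # True while inside a JSON-LD script tag

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so concatenate them.
        if self._inside:
            self.blocks[-1] += data


def event_urls(html, base_url):
    """Return absolute 'url' values found in all JSON-LD blocks (hypothetical helper)."""
    collector = _LdJsonCollector()
    collector.feed(html)
    collector.close()
    urls = []
    for block in collector.blocks:
        # strict=False tolerates raw newlines/tabs inside JSON strings.
        data = json.loads(block, strict=False)
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and "url" in item:
                # Resolve relative URLs the same way response.urljoin would.
                urls.append(urljoin(base_url, item["url"]))
    return urls
```

Inside a Scrapy callback, `event_urls(response.text, response.url)` would give the absolute URLs to yield as `Request(url, callback=self.parse_event)`, in place of the `response.urljoin` loop from the question.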

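Look at the page source of that URL: the event data is the JSON on lines 76-9045. You can copy those lines and save them locally as data.json, or extract the same JSON directly from the live page with this code: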

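import json

import requests
from bs4 import BeautifulSoup

req = requests.get('https://highape.com/bangalore/all-events')
soup = BeautifulSoup(req.content, 'html.parser')
# Assumes the event JSON is in the sixth <script> tag (index 5); this breaks if the page layout changes
js = soup.find_all('script')[5].text
# strict=False accepts raw control characters (e.g. newlines) inside JSON strings
data = json.loads(js, strict=False)
for event in data:
    url = event['url']
    print(url)
    # in a Scrapy spider, yield Request(url, callback=self.parse_event) here instead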
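The `strict=False` argument in that answer matters because JSON embedded in a page often contains literal newlines or tabs inside string values, and `json.loads` rejects such control characters by default. The sketch below shows that, and also covers the answer's first suggestion of saving the JSON locally as data.json. It assumes the file holds a list of event objects with a `url` key, as the answer's loop does; the helper `load_event_urls` and the sample event are hypothetical.

```python
import json
import os
import tempfile


def load_event_urls(path):
    """Read a JSON list of event objects from `path` and return their 'url' values.

    strict=False lets the parser accept raw control characters (such as a
    literal newline) inside strings, which the default strict mode rejects.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f, strict=False)
    return [event["url"] for event in data if "url" in event]


# Hypothetical sample: the "name" value contains a literal newline character.
raw = '[{"name": "Line one\nLine two", "url": "https://example.com/e/1"}]'

try:
    json.loads(raw)  # default strict=True
except json.JSONDecodeError as exc:
    print("default strict mode rejects it:", exc.msg)

# Simulate the saved data.json in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    with open(path, "w", encoding="utf-8") as f:
        f.write(raw)
    print(load_event_urls(path))  # prints ['https://example.com/e/1']
```

In a Scrapy spider, each URL could then be scheduled with `response.follow(url, callback=self.parse_event)`, which also resolves relative URLs against the response URL.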
