How to scrape a JSON API with Scrapy

Posted 2024-10-01 02:35:31

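I am trying to scrape an API with Scrapy. I have tried several approaches without success, but I did manage to get the API response back as JSON in Anaconda. What I cannot figure out is how to parse it.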

The JSON looks like this:

[
{
  "id": "6975526564553428229",
  "secretID": "6975526564553428229",
  "text": "I did a thing 👀 Pre-order link in my bio!! #WillTheBook",
  "createTime": 1624116353,
  "authorMeta": {
    "id": "6727327145951183878",
    "secUid": "MS4wLjABAAAA8ezUaW4ecJX222ObGXxt07F9BIh4QH3-g1P1DHyChT2LLi2cn-vAE2R53-H672ZO",
    "name": "willsmith",
    "nickName": "Will Smith",
    "verified": true,
    "signature": "Same kid from West Philly.",
    "avatar": "https://p16-sign-va.tiktokcdn.com/musically-maliva-obj/1646315618666501~c5_1080x1080.jpeg?x-expires=1624215600&x-signature=JWCnkyJ1Lq7G6K3W32nSB4NKc%2Fk%3D",
    "following": 24,
    "fans": 55900000,
    "heart": 340900000,
    "video": 81,
    "digg": 88
  },

After reading a few tutorials, this is what I have so far:

# -*- coding: utf-8 -*-
import json
from urllib.parse import urlencode

import requests
import scrapy
from scrapy.crawler import CrawlerProcess

API_URL = "https://app.scrapingbee.com/api/v1/store/tiktok/user-feed"
API_PARAMS = {
    "api_key": "api key hidden",
    "username": "willsmith",
}


def send_request():
    # Quick check with requests that the API answers at all
    response = requests.get(url=API_URL, params=API_PARAMS)
    print('Response HTTP Status Code: ', response.status_code)
    print('Response HTTP Response Body: ', response.content)


class tiktokSpider(scrapy.Spider):
    name = 'tiktok'
    allowed_domains = ['app.scrapingbee.com']
    # Scrapy only downloads the URLs listed here (or yielded by the spider)
    start_urls = [API_URL + '?' + urlencode(API_PARAMS)]
    # Keep all settings in one dict; assigning custom_settings twice keeps only the last one
    custom_settings = {
        'CONCURRENT_REQUESTS_PER_DOMAIN': 10,
        'FEEDS': {'poststoday.csv': {'format': 'csv'}},
    }

    def parse(self, response):
        # parse must be a method of the spider; response.text is the decoded body
        authorMeta = json.loads(response.text)
        print(authorMeta)


# main driver
if __name__ == "__main__":
    send_request()
    process = CrawlerProcess()
    process.crawl(tiktokSpider)
    process.start()

I am not sure what to do next. I only want to scrape the text, createTime and nickName fields, but I cannot work out how. Any suggestions?
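
One possible way to pull out just those three fields is sketched below. It assumes the response body is a JSON array of post objects shaped like the sample above, with nickName nested under authorMeta. The helper name extract_posts, the extra createdAt field, and the trimmed sample are illustrative and not part of the original code; treating createTime as a Unix timestamp in seconds is also an assumption.

```python
import json
from datetime import datetime, timezone


def extract_posts(raw_json):
    """Return text, createTime and nickName for every post in a JSON array."""
    posts = json.loads(raw_json)
    rows = []
    for post in posts:
        # nickName lives one level down, inside the authorMeta object
        author = post.get("authorMeta") or {}
        created = post.get("createTime")
        rows.append({
            "text": post.get("text"),
            "createTime": created,
            # Assumption: createTime is a Unix timestamp in seconds (UTC)
            "createdAt": (
                datetime.fromtimestamp(created, tz=timezone.utc).isoformat()
                if created is not None else None
            ),
            "nickName": author.get("nickName"),
        })
    return rows


# Trimmed-down sample in the same shape as the JSON shown in the question
sample = json.dumps([{
    "id": "6975526564553428229",
    "text": "I did a thing",
    "createTime": 1624116353,
    "authorMeta": {"name": "willsmith", "nickName": "Will Smith"},
}])

for row in extract_posts(sample):
    print(row)
```

Inside the spider, parse could then do `yield from extract_posts(response.text)` instead of printing, so that each dict becomes a scraped item and the FEEDS setting writes it to poststoday.csv.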

