Why can't Scrapy scrape any data even after setting a User-Agent?

Posted 2024-06-02 10:10:00


I am learning Scrapy.

In my spider:

import scrapy

class TencentHrSpider(scrapy.Spider):
    name = 'tencent_hr'
    allowed_domains = ['careers.tencent.com']
    start_urls = ['http://careers.tencent.com/search.html']

    def parse(self, response):

        div_list = response.xpath('//div[@class="recruit-list"]')

        print(div_list)  # prints `[]` here: the selector matches nothing

When I run the crawl, no data is output. Why?

I have set the User-Agent request header in settings.py:

import random

# Pool of browser User-Agent strings to choose from.
USER_AGENT_LIST = [
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
]

# settings.py is loaded once per process, so one User-Agent is picked for the whole crawl.
USER_AGENT = random.choice(USER_AGENT_LIST)

Edit-01

Is it possible to find the cause? Is there an error log I can trace?
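An XPath that matches nothing raises no error, so there may be no error log to follow. One possible check is whether the target markup exists at all in the HTML as it was downloaded. The sketch below is only an illustration using the standard library: `count_class` is a hypothetical helper, and `raw_html` is a made-up sample of a script-driven page, not the actual content of careers.tencent.com. Unlike the exact-match XPath in the spider, it counts any tag whose class list contains the name.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html_text, class_name):
    """Return how many tags in html_text carry class_name (hypothetical helper)."""
    parser = ClassCounter(class_name)
    parser.feed(html_text)
    return parser.count


# Made-up sample: a page whose body is filled in later by JavaScript.
raw_html = (
    '<html><body><div id="app"></div>'
    '<script src="app.js"></script></body></html>'
)

print(count_class(raw_html, "recruit-list"))
```

Inside the spider's `parse` method, the same helper could be given `response.text`. A count of zero would suggest the list is not in the downloaded HTML and is added later by a script.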


Edit-02

Why can't Scrapy get the data if it is requested from an API via AJAX? We know Scrapy can download the whole page; can it run scripts the way a browser does?
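For context, Scrapy on its own downloads the response body but does not execute JavaScript, so data that a script fetches afterwards arrives in a separate response that the spider never requests. If such an endpoint returns JSON, its body could be parsed directly instead of using XPath. The sketch below assumes a made-up payload: the keys `data`, `posts`, `title`, and `location`, and the helper `extract_posts`, are hypothetical and do not describe the real Tencent API.

```python
import json


def extract_posts(body_text):
    """Pull title/location pairs out of a JSON body (assumed, hypothetical shape)."""
    payload = json.loads(body_text)
    posts = payload.get("data", {}).get("posts", [])
    return [
        {"title": post.get("title"), "location": post.get("location")}
        for post in posts
    ]


# Made-up sample of what an AJAX endpoint might return.
sample_body = json.dumps(
    {
        "data": {
            "posts": [
                {"title": "Backend Engineer", "location": "Shenzhen"},
                {"title": "Data Analyst", "location": "Beijing"},
            ]
        }
    }
)

for item in extract_posts(sample_body):
    print(item)
```

In a Scrapy callback for the JSON endpoint, a function like this could be called with `response.text`. The actual request URL and field names would have to be read from the browser's network panel.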

