scrapy-splash returns its own headers, not the site's original headers

Posted 2024-09-30 22:20:24

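I'm using Scrapy to build my spider. Now I need to maintain a session, so I use scrapy.downloadermiddlewares.cookies.CookiesMiddleware, which handles the Set-Cookie header. I know it handles the Set-Cookie header because I set COOKIES_DEBUG=True, which makes CookiesMiddleware print information about each Set-Cookie header.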
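Conceptually, maintaining a session means storing the cookies from each Set-Cookie response header and sending them back in a Cookie header on later requests. The stdlib-only sketch below illustrates that round trip with `http.cookies.SimpleCookie`. It is a simplified illustration with hypothetical helper names (`update_jar`, `cookie_header`), not Scrapy's CookiesMiddleware, which builds on `http.cookiejar` and also tracks domains, paths and expiry.

```python
from http.cookies import SimpleCookie


def update_jar(jar, set_cookie_values):
    """Hypothetical helper: merge raw Set-Cookie header values into a dict jar."""
    for raw in set_cookie_values:
        parsed = SimpleCookie()
        parsed.load(raw)  # parses "name=value; Path=/; HttpOnly" style strings
        for name, morsel in parsed.items():
            jar[name] = morsel.value  # keep name/value only; attributes are ignored here
    return jar


def cookie_header(jar):
    """Hypothetical helper: build the Cookie request header for the next request."""
    return "; ".join(f"{name}={value}" for name, value in jar.items())


# Usage: a first response sets cookies, the next request sends them back.
jar = update_jar({}, ["sessionid=abc123; Path=/; HttpOnly", "lang=en; Path=/"])
print(cookie_header(jar))  # sessionid=abc123; lang=en
```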

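The problem: when I also add Splash to the picture, the Set-Cookie printouts disappear. In fact, the response headers I get are {'Date': ['Sun, 25 Sep 2016 12:09:55 GMT'], 'Content-Type': ['text/html; charset=utf-8'], 'Server': ['TwistedWeb/16.1.1']}, which come from the Splash rendering engine itself (it is served by TwistedWeb), not from the site I'm crawling.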
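The headers quoted in the question explain the missing debug output: they describe Splash's own HTTP reply (note the TwistedWeb Server header), so there is no Set-Cookie entry for CookiesMiddleware to process. A tiny hypothetical check, written against the list-valued dict shape printed in the question, makes that visible; `splash_own_headers` is an illustrative name, not part of Scrapy or scrapy-splash.

```python
def splash_own_headers(headers):
    """Hypothetical check: True if these look like Splash's own reply headers.

    Assumes a dict mapping header names to lists of values, as printed in the
    question; header names are compared case-insensitively.
    """
    lowered = {name.lower(): values for name, values in headers.items()}
    server = " ".join(lowered.get("server", []))
    has_set_cookie = bool(lowered.get("set-cookie"))
    return server.startswith("TwistedWeb") and not has_set_cookie


# The headers from the question: Splash's reply, not the target site's.
observed = {
    "Date": ["Sun, 25 Sep 2016 12:09:55 GMT"],
    "Content-Type": ["text/html; charset=utf-8"],
    "Server": ["TwistedWeb/16.1.1"],
}
print(splash_own_headers(observed))  # True
```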

Is there a directive that tells Splash to also return the original response headers?


1 answer
Anonymous user
Answer #1 · Posted 2024-09-30 22:20:24

To get the original response headers, you can write a Splash Lua script; see the examples in the scrapy-splash README:

Use a Lua script to get an HTML response with cookies, headers, body and method set to correct values; the lua_source argument value is cached on the Splash server and is not sent with each request (this requires Splash 2.1+):

import scrapy
from scrapy_splash import SplashRequest

script = """
function main(splash)
  splash:init_cookies(splash.args.cookies)
  assert(splash:go{
    splash.args.url,
    headers=splash.args.headers,
    http_method=splash.args.http_method,
    body=splash.args.body,
    })
  assert(splash:wait(0.5))

  local entries = splash:history()
  local last_response = entries[#entries].response
  return {
    url = splash:url(),
    headers = last_response.headers,
    http_status = last_response.status,
    cookies = splash:get_cookies(),
    html = splash:html(),
  }
end
"""

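class MySpider(scrapy.Spider):
    name = 'myspider'  # hypothetical spider name
    start_urls = ['http://example.com']  # hypothetical start URL

    # ... (other spider attributes and methods)

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(url, self.parse_result,
                endpoint='execute',
                cache_args=['lua_source'],
                args={'lua_source': script},
                headers={'X-My-Header': 'value'},
            )

    def parse_result(self, response):
        # here response.body contains result HTML;
        # response.headers are filled with headers from last
        # web page loaded to Splash;
        # cookies from all responses and from JavaScript are collected
        # and put into Set-Cookie response header, so that Scrapy
        # can remember them.
        self.logger.info('Original headers: %s', response.headers)
        self.logger.info('Set-Cookie: %s', response.headers.getlist('Set-Cookie'))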
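The comments in parse_result describe what scrapy-splash does with the table returned by the Lua script. The stdlib-only sketch below approximates that mapping for illustration only: it takes a dict shaped like the script's return value (headers as a list of name/value entries, as Splash reports them in splash:history(), and cookies in the format of splash:get_cookies()) and produces list-valued headers with one Set-Cookie line per cookie. `to_scrapy_headers` and the sample data are hypothetical; scrapy-splash's actual implementation differs and handles more cases.

```python
def to_scrapy_headers(result):
    """Hypothetical: turn a Lua-script result dict into list-valued headers.

    Assumes result["headers"] is a list of {"name": ..., "value": ...} dicts
    (HAR style) and result["cookies"] is a list of HAR-style cookie dicts.
    """
    headers = {}
    for entry in result.get("headers", []):
        headers.setdefault(entry["name"], []).append(entry["value"])
    # One Set-Cookie line per collected cookie, appended after any the site sent.
    for cookie in result.get("cookies", []):
        parts = [f"{cookie['name']}={cookie['value']}"]
        if cookie.get("domain"):
            parts.append(f"Domain={cookie['domain']}")
        if cookie.get("path"):
            parts.append(f"Path={cookie['path']}")
        headers.setdefault("Set-Cookie", []).append("; ".join(parts))
    return headers


# Hypothetical sample shaped like the script's return value.
sample = {
    "url": "http://example.com/",
    "http_status": 200,
    "headers": [
        {"name": "Content-Type", "value": "text/html"},
        {"name": "Server", "value": "nginx"},
    ],
    "cookies": [
        {"name": "sessionid", "value": "abc123",
         "domain": "example.com", "path": "/"},
    ],
}
print(to_scrapy_headers(sample)["Set-Cookie"])
# ['sessionid=abc123; Domain=example.com; Path=/']
```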

scrapy-splash also provides built-in helpers for cookie handling; they are enabled in this example as long as scrapy-splash is configured as described in the README.
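For reference, the scrapy-splash README's configuration enables those helpers roughly as follows in settings.py. The SPLASH_URL value is an assumption (a local Splash instance on its default port 8050); check the README for the version you use, since the recommended settings may change.

```python
# settings.py (sketch based on the scrapy-splash README; SPLASH_URL is an assumption)
SPLASH_URL = "http://localhost:8050"

DOWNLOADER_MIDDLEWARES = {
    # Keeps a cookie jar per session and passes cookies to and from Splash.
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    # Supports cache_args, so lua_source is not stored or resent repeatedly.
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"

# Logs cookies handled by SplashCookiesMiddleware, similar to COOKIES_DEBUG.
SPLASH_COOKIES_DEBUG = True
```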