I'm trying to learn how to log in with my spider. To do that, I wrote the code attached below. The expected result is:
{
"username": "willingc",
"email": "carolcode@willingconsulting.com",
"url": "https://www.willingconsulting.com",
}
However, the actual result is:
{
"username": "willingc",
"email": None,
"url": "https://www.willingconsulting.com",
}
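None is what usually appears when the browser is not logged in. Do you see a mistake in my code? The only sign of a problem I can find is the following warning: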
WARNING:py.warnings:/workspace/.pip-modules/lib/python3.8/site-packages/scrapy/spidermiddlewares/referer.py:287: RuntimeWarning: Could not load referrer policy 'origin-when-cross-origin, strict-origin-when-cross-origin'
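A note on this warning: it seems to come from a Referrer-Policy header that lists two comma-separated policies, which this Scrapy version apparently does not parse as a list. Under the Referrer Policy specification, the last recognized token in such a list is the one that applies, so the warning is probably unrelated to the missing email. A minimal sketch of that rule in plain Python (the name pick_referrer_policy is hypothetical and not part of Scrapy):

```python
# Policy tokens defined by the W3C Referrer Policy specification.
VALID_POLICIES = {
    "no-referrer",
    "no-referrer-when-downgrade",
    "same-origin",
    "origin",
    "strict-origin",
    "origin-when-cross-origin",
    "strict-origin-when-cross-origin",
    "unsafe-url",
}


def pick_referrer_policy(header_value):
    """Return the last recognized policy in a comma-separated header value.

    Hypothetical helper for illustration; returns None if no token is valid.
    """
    chosen = None
    for token in header_value.split(","):
        token = token.strip().lower()
        if token in VALID_POLICIES:
            chosen = token  # later valid tokens override earlier ones
    return chosen


header = "origin-when-cross-origin, strict-origin-when-cross-origin"
print(pick_referrer_policy(header))  # strict-origin-when-cross-origin
```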
import scrapy
from scrapy.http import FormRequest


class GitHubSpider(scrapy.Spider):
    name = "github"
    allowed_domains = ["github.com"]
    start_urls = ["https://github.com/login"]

    def parse(self, response):
        token = response.xpath('//form/input[@name="authenticity_token"]/@value').get()
        return FormRequest.from_response(
            response,
            formdata={
                "authenticity_token": token,
                "login": "mygithub@gmail.com",
                "password": "12345",
            },
            callback=self.parse_after_login,
        )

    def parse_after_login(self, response):
        yield scrapy.Request(
            url="https://github.com/willingc",
            callback=self.parse_engineer,
        )

    def parse_engineer(self, response):
        yield {
            "username": response.css(".vcard-username::text").get().strip(),
            "email": response.xpath('//li[@itemprop="email"]/a//text()').get(),
            "url": response.xpath('//li[@itemprop="url"]/a//@href').get(),
        }
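One side note on parse_engineer: a selector's .get() returns None when nothing matches (as happens with the email here), so calling .strip() directly on the username result would raise AttributeError if that selector ever came back empty. A minimal sketch of a None-safe wrapper, assuming a hypothetical helper named clean_text that is not part of Scrapy:

```python
from typing import Optional


def clean_text(value: Optional[str]) -> Optional[str]:
    """Strip surrounding whitespace, passing None through unchanged.

    Hypothetical helper for illustration only.
    """
    return value.strip() if value is not None else None


print(clean_text("  willingc\n"))  # willingc
print(clean_text(None))            # None
```

In the spider it could wrap the selector call, for example clean_text(response.css(".vcard-username::text").get()).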