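The title may be a bit confusing, so let me explain further. I'm trying to build a simple scraper with Scrapy that scrapes my bank's website for some automated budgeting. So far it seems I can log in, but immediately afterwards I get logged out without getting the data I need. Here is some output from my terminal: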
1. 2018-03-27 00:56:56 [scrapy.core.engine] DEBUG: Crawled (200) <POST
https://www.bank.org/signin-page.html> (referer:
https://www.bank.org/signin-page.html)
2. 2018-03-27 00:56:56 [LOG] INFO: LOGIN ATTEMPT SUCCESSFUL
3. 2018-03-27 00:56:56 [scrapy.core.engine] DEBUG: Crawled (404) <GET
https://www.bankonline.org/robots.txt> (referer: None)
4. 2018-03-27 00:56:56 [scrapy.downloadermiddlewares.redirect] DEBUG:
Redirecting (302) to <GET https://www.bankonline.org/tob/live/usp-
core/app/logout?reason=logout> from <GET
https://www.bankonline.org/tob/live/usp-core/app/home>
5. 2018-03-27 00:56:56 [scrapy.core.engine] DEBUG: Crawled (200) <GET
https://www.bankonline.org/tob/live/usp-core/app/logout?
reason=logout> (referer: https://www.bank.org/signin-page.html)
6. 2018-03-27 00:56:56 [LOG] INFO: VISITED
https://www.bankonline.org/tob/live/usp-core/app/logout?
reason=logout
7. 2018-03-27 00:56:57 [scrapy.core.engine] INFO: Closing spider
(finished)
Line 4 is where it starts redirecting me. Here is my code:
import scrapy
import logging

logger = logging.getLogger('LOG')

USERNAME = 'user'
PASSWORD = 'pass'


class Budget_Bank(scrapy.Spider):
    name = "Budget_Bank"
    login_url = 'https://www.bank.org/signin-page.html'
    start_urls = ['https://www.bank.org/signin-page.html']

    def parse(self, response):
        # Submit the credentials to the sign-in page.
        yield scrapy.FormRequest(url=self.login_url,
                                 formdata={'username': USERNAME,
                                           'password': PASSWORD},
                                 callback=self.login_test)

    def login_test(self, response):
        # Treat the login as failed if the response mentions errors.
        if 'errors' in response.text:
            logger.warning("LOGIN ATTEMPT FAILED")
            return
        else:
            logger.info("LOGIN ATTEMPT SUCCESSFUL")
            yield scrapy.Request('https://www.bankonline.org'
                                 '/tob/live/usp-core/app/home',
                                 callback=self.parse_number)

    def parse_number(self, response):
        logger.info("VISITED %s", response.url)
        for number in response.css('div._1qtcLoK1d4PZmeghcgyE2K'):
            yield {
                # The class name is truncated ("...") in the original post.
                'num': number.css('span.formattedMoney_balanceBZozG-'
                                  '...::text').extract_first(),
            }
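For now I'm just trying to pull a single number from the site to test whether I can actually retrieve data. My login test reports that I logged in correctly, but instead of continuing to the home page, I get redirected to the logout page. I've left out some information, such as my username and password, for obvious reasons, and I've changed the name of the website. Any help would be greatly appreciated.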
You are being redirected to the logout page because the site detected that you are a bot.
You can try setting ROBOTSTXT_OBEY to False.
See the docs for more information.
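As a minimal sketch of that suggestion, assuming a standard Scrapy project layout with a settings.py file (the file location is an assumption; the original post does not show the project structure), the setting would look roughly like this. Whether it alone stops the logout redirect depends on how the site detects bots.

```python
# settings.py of the Scrapy project (assumed location).
# Tell Scrapy not to fetch and obey robots.txt before crawling.
ROBOTSTXT_OBEY = False
```

The same key can instead be placed in a `custom_settings` dict on the spider class if only this one spider should ignore robots.txt.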