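I'm trying to use mechanize to crawl a website (http://www.dataescolabrasil.inep.gov.br/dataEscolaBrasil/home.seam), but I'm running into an error that I can't understand, and therefore can't fix. This is probably down to my limited knowledge of web development.
Here is what I'm trying to do: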
import mechanize

# This is the website I want to crawl.
LINK = "http://www.dataescolabrasil.inep.gov.br/dataEscolaBrasil/home.seam"

br = mechanize.Browser()
br.open(LINK)

request = mechanize.Request(LINK)
response = mechanize.urlopen(request)

# There are two forms on the page (output omitted); I want the second one.
forms = mechanize.ParseResponse(response, backwards_compat=False)
for form in br.forms():
    print("Form name: %s" % form.name)
    print(form)

# Select the second form (index 1), fill in the field, and submit.
br.select_form(nr=1)
br.form['codEntidadeDecorate:codEntidadeInput'] = '11024968'
response2 = br.submit()
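Here is the runtime error I get:
[traceback missing from the source]
I have tried a few tweaks to the encoding of the string passed to the form, and tried to understand GET vs. POST, but without success.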
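I looked up the form on the page from your example:
[form markup missing from the source]
I think the problem is the empty enctype attribute. You need to either set this attribute to application/x-www-form-urlencoded or remove it so that the default is used.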
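The suggestion above can be sketched roughly as follows. This is a minimal illustration, not a confirmed fix for this site: `ensure_enctype` and `FakeForm` are hypothetical names introduced here, and the sketch assumes the form object selected by mechanize exposes a writable `enctype` attribute.

```python
# Default encoding type that HTML uses when a form declares no enctype.
DEFAULT_ENCTYPE = "application/x-www-form-urlencoded"


def ensure_enctype(form, default=DEFAULT_ENCTYPE):
    """Set the default encoding type if the form's enctype is empty or missing.

    `form` is any object with an `enctype` attribute (assumption: the form
    selected through mechanize behaves this way).
    """
    if not getattr(form, "enctype", None):
        form.enctype = default
    return form


class FakeForm(object):
    """Hypothetical stand-in for a parsed form, used only for illustration."""

    def __init__(self, enctype):
        self.enctype = enctype


# A form whose HTML declared enctype="" gets the default value.
broken = ensure_enctype(FakeForm(""))
print(broken.enctype)

# A form that already declares an encoding type is left untouched.
upload = ensure_enctype(FakeForm("multipart/form-data"))
print(upload.enctype)
```

In the script from the question, the call would presumably go between `br.select_form(nr=1)` and `br.submit()`, as `ensure_enctype(br.form)`.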