Downloading PDF files with "requests" (in IPython)

import requests

headers = {'User-Agent': 'Mozilla/5.0'}
url = 'http://imaging.occeweb.com/imaging/UIC1012_1075.aspx'
API = '15335187'
payload = {'txtIndex7': '1', 'txtIndex2': API}
session = requests.Session()
res = session.post(url, headers=headers, data=payload)

2 Answers

User

Answer 1 · edited 2024-09-30 10:27:13

import urllib.request

import mechanicalsoup

url = 'http://imaging.occeweb.com/imaging/UIC1012_1075.aspx'
Form = '1012'
API = '15335187'
browser = mechanicalsoup.StatefulBrowser(
    user_agent='Mozilla/5.0'
)
browser.open(url)

# Fill in the search form
browser.select_form('form#Form1')
browser["txtIndex7"] = Form
browser["txtIndex2"] = API
browser.submit_selected("Button1")

# Download the PDF linked from each result row (the first two rows are skipped)
for tr in browser.get_current_page().select('table#DataGrid1 tr')[2:]:
    try:
        first_cell = tr.select('td')[0]
        pdf_url = first_cell.find('a').get('href')
    except (IndexError, AttributeError):
        # The row has no cells, or its first cell contains no link
        print('Pdf not found')
    else:
        pdf_id = first_cell.text
        pdf_path = "C:\\Data\\" + pdf_id + ".pdf"
        # Python 3 call; on Python 2.7 this was urllib.urlopen()
        response = urllib.request.urlopen(pdf_url)
        with open(pdf_path, 'wb') as pdf_file:
            pdf_file.write(response.read())
        print('Pdf ' + pdf_id + ' saved')

User

Answer 2 · edited 2024-09-30 10:27:13

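It is a bit more complicated than that: the form also carries several hidden input fields used for event validation, and you have to account for them. To do so, first GET the page, collect all the hidden values, set the value for the API, then make the POST request and parse the HTML of the response.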
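A minimal sketch of those manual steps, in case you want to stay with requests. The helper names (`FormFieldCollector`, `build_payload`, `search`) are made up for illustration, and I am assuming the hidden fields are ordinary `<input type="hidden">` elements (on ASP.NET pages these are typically fields such as `__VIEWSTATE` and `__EVENTVALIDATION`) and that the server expects the `Button1` submit button to be included in the POST. I have not verified this against the live site.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect every hidden <input> plus one named submit button."""

    def __init__(self, submit_name):
        super().__init__()
        self.submit_name = submit_name
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if not name:
            return
        input_type = (attrs.get("type") or "").lower()
        is_hidden = input_type == "hidden"
        is_our_submit = input_type == "submit" and name == self.submit_name
        if is_hidden or is_our_submit:
            self.fields[name] = attrs.get("value") or ""


def build_payload(html, api, submit_name="Button1"):
    """Return the POST data: hidden values, the submit button, and the API number."""
    collector = FormFieldCollector(submit_name)
    collector.feed(html)
    payload = dict(collector.fields)
    payload["txtIndex2"] = api
    return payload


def search(url, api):
    """GET the page, rebuild the form data, and POST it back (needs network access)."""
    import requests  # third-party; imported here so the helpers above work without it

    session = requests.Session()
    session.headers["User-Agent"] = "Mozilla/5.0"
    page = session.get(url)
    page.raise_for_status()
    return session.post(url, data=build_payload(page.text, api))
```

Calling `search('http://imaging.occeweb.com/imaging/UIC1012_1075.aspx', '15335187')` should return the response whose HTML you then parse for the results table, provided the assumptions above hold for this form.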

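Fortunately, there is a tool called MechanicalSoup that fills in these hidden fields automatically when it submits a form. Here is a complete solution, including sample code for parsing the results table: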

import mechanicalsoup


url = 'http://imaging.occeweb.com/imaging/UIC1012_1075.aspx'
API = '15335187'
browser = mechanicalsoup.StatefulBrowser(
    user_agent='Mozilla/5.0'
)
browser.open(url)

# Fill-in the search form
browser.select_form('form#Form1')
browser["txtIndex2"] = API
browser.submit_selected("Button1")

# Display the results
for tr in browser.get_current_page().select('table#DataGrid1 tr'):
    print([td.get_text() for td in tr.find_all("td")])
