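I'm using Python 3 on Windows 10. The PDF I download can't be opened: it is 136 KB instead of 721 KB.
I tried three different ways of writing the PDF to a file (see #1#, #2#, and #3# in the code).
I wonder whether the problem is authentication. I'm new to authentication, but as far as I can tell the site uses POST.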
import requests

downloadurl = "https://pedsinreview.aappublications.org/content/pedsinreview/40/10/e35.full.pdf"
username = 'myusername'
password = 'mypassword'
chunk_size = 1024
payload = {'name': username, 'pass': password}
# The credentials are simply sent along with the GET request; no login happens here
r = requests.get(downloadurl, data=payload, verify=False, stream=True)
#r.raw.decode_content = True
with open("file_name.pdf", 'wb') as f:
    #1# f.write(r.content)
    #2# shutil.copyfileobj(r.raw, f)  # needs "import shutil"
    #3# (shown active here so the block is valid Python)
    for chunk in r.iter_content(chunk_size):
        if chunk:
            f.write(chunk)
I should get a 721 KB PDF that opens, but instead I get a 136 KB file that can't be read.
Thanks in advance for your help.
Update:
It works!
import requests

loginurl = "https://pedsinreview.aappublications.org/user/login"
downloadurl = "https://pedsinreview.aappublications.org/content/pedsinreview/40/10/e35.full.pdf"
username = 'myusername'
password = 'mypassword'
chunk_size = 1024
#r = requests.get(downloadurl, data=payload, verify=False, stream=True)

# Do everything within the context of the session
with requests.Session() as session:
    data = {
        'form_id': 'user_login',
        'name': username,
        'pass': password
    }
    login_request = session.post(loginurl, data=data)
    # Returns 200. I think it should be 302, because that's what the browser
    # shows when I log in successfully, vs. 200 when I use a wrong password.
    print(login_request.status_code)

    # Now you are logged in and should be able to request the pdf
    r = session.get(downloadurl)
    with open("file_name.pdf", 'wb') as f:
        for chunk in r.iter_content(chunk_size):
            if chunk:
                f.write(chunk)
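You're right, it is an authentication problem. Because you are not logged in, the server redirects you to an HTML page, and that page is what you end up saving.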
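One quick way to confirm this is to look at the first bytes of the saved file. A real PDF starts with the marker `%PDF-`, while a login page starts with HTML markup. The helper below is only a rough sketch; the name `sniff_download` and the 1024-byte read size are made up for illustration.

```python
def sniff_download(path, probe_size=1024):
    """Guess what a downloaded file really is from its first bytes.

    Returns "pdf", "html", or "unknown". This is a heuristic, not a parser.
    """
    with open(path, "rb") as f:
        head = f.read(probe_size)

    stripped = head.lstrip()
    # PDF files begin with the "%PDF-" header
    if stripped.startswith(b"%PDF-"):
        return "pdf"

    # A login or error page usually begins with a doctype or an <html> tag
    lowered = stripped.lower()
    if lowered.startswith(b"<!doctype html") or b"<html" in lowered:
        return "html"

    return "unknown"
```

Calling `sniff_download("file_name.pdf")` on the 136 KB file would be expected to return `"html"` if the server did send the login page.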
So the first thing you need to do is log in, and then request the PDF from the same session.
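A minimal sketch of that flow, under a few assumptions: the function name `fetch_pdf` is hypothetical, the form fields are the ones from the question, and the server is assumed to send a `Content-Type` containing "pdf" for the real file. Note that `requests` follows redirects by default, so a 302 issued after a successful login shows up in `response.history` rather than in `status_code`, which is why the login request prints 200.

```python
def fetch_pdf(session, login_url, download_url, form_data, out_path, chunk_size=1024):
    """Log in with the given session, then stream the PDF to out_path.

    Returns the status codes of any redirects followed during login.
    Raises ValueError if the download does not look like a PDF.
    """
    login = session.post(login_url, data=form_data)
    login.raise_for_status()
    # Redirects followed automatically are recorded in .history
    redirect_codes = [resp.status_code for resp in login.history]

    response = session.get(download_url, stream=True)
    response.raise_for_status()

    # Assumption: a real PDF is served with a Content-Type mentioning "pdf";
    # an HTML login page would not be.
    content_type = response.headers.get("Content-Type", "")
    if "pdf" not in content_type.lower():
        raise ValueError("expected a PDF, got Content-Type %r" % content_type)

    with open(out_path, "wb") as f:
        for chunk in response.iter_content(chunk_size):
            if chunk:  # skip keep-alive chunks
                f.write(chunk)
    return redirect_codes


# Possible usage (requires the requests package and valid credentials):
#
#   import requests
#   with requests.Session() as session:
#       form = {"form_id": "user_login", "name": "myusername", "pass": "mypassword"}
#       print(fetch_pdf(session, loginurl, downloadurl, form, "file_name.pdf"))
```

Passing the session in as a parameter keeps the cookies from the login attached to the download request. Whether a redirect in the login history really means the login succeeded depends on the site, so treat the returned codes as a hint only.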