Python BeautifulSoup GET request with double quotes in the URL

Posted 2024-09-29 20:23:52


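I am trying to get BeautifulSoup to read this page, but the URL is not being passed correctly to the get() call.

The URL is https://www.econjobrumors.com/topic/supreme-court-to-%e2%80%9cconsider%e2%80%9d-taking-up-harvard-affirmative-action-case-on-june-10. Whenever I try to fetch data from this URL with BeautifulSoup, I get an error saying the URL is incorrect.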

import requests

response = requests.get(
    url="https://www.econjobrumors.com/topic/supreme-court-to-%e2%80%9cconsider%e2%80%9d-taking-up-harvard-affirmative-action-case-on-june-10",
    verify=False,  # skips TLS certificate verification
)
print(response.request.url, end="\r")

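The curly double quotes “ (U+201C) and ” (U+201D) are what cause the error. I have been trying for hours and still cannot work out how to pass the URL correctly.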
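For reference, the `%e2%80%9c` and `%e2%80%9d` sequences in the slug are the UTF-8 percent-encodings of those two curly quotes. The following is a minimal sketch using only the standard library's `urllib.parse`; the variable names are illustrative, and it only inspects the encoding without contacting the site.

```python
from urllib.parse import quote, unquote

# The percent-encoded fragment taken from the URL in the question
encoded = "%e2%80%9cconsider%e2%80%9d"

# Decode the UTF-8 percent escapes back into text
decoded = unquote(encoded)
print(decoded)

# List the non-ASCII code points that were hidden behind the escapes
code_points = [f"U+{ord(ch):04X}" for ch in decoded if ord(ch) > 127]
print(code_points)

# Re-encode the text; quote() emits uppercase hex digits
reencoded = quote(decoded)
print(reencoded)
```

The re-encoded form differs from the original only in the case of the hex digits, which RFC 3986 treats as equivalent.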


1 answer
User
#1 · Posted 2024-09-29 20:23:52

I changed the double quotes around the URL to single quotes.

from bs4 import BeautifulSoup
import requests

url = 'https://www.econjobrumors.com/topic/supreme-court-to-%e2%80%9cconsider%e2%80%9d-taking-up-harvard-affirmative-action-case-on-june-10'

# Fetch the page without following any redirects
r = requests.get(url, allow_redirects=False)

# Parse the response body with the lxml parser
soup = BeautifulSoup(r.content, 'lxml')

print(soup)

This prints the HTML as expected; I have trimmed it to fit in this answer.


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xml:lang="en-US" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="IE=8" http-equiv="X-UA-Compatible"/>
<ALL THE CONTENT>Too much to paste in the answer</ALL THE CONTENT>

</html>
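One caveat worth checking: in Python, single-quoted and double-quoted string literals produce the same `str`, so the quote change on its own should not alter the URL that is sent. The other visible differences between the two snippets are `allow_redirects=False` and the absence of `verify=False`; the thread does not confirm which of these mattered. A small sketch of the equivalence, with illustrative variable names:

```python
# The same URL written with both kinds of Python string delimiters
double_quoted = "https://www.econjobrumors.com/topic/supreme-court-to-%e2%80%9cconsider%e2%80%9d-taking-up-harvard-affirmative-action-case-on-june-10"
single_quoted = 'https://www.econjobrumors.com/topic/supreme-court-to-%e2%80%9cconsider%e2%80%9d-taking-up-harvard-affirmative-action-case-on-june-10'

# The delimiter is not part of the string value
same_value = double_quoted == single_quoted
print(same_value)
```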
