Python: requests.get returns the wrong HTML file

# Imports the snippet relies on
import urllib.parse
import urllib.request

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

# Set keywords for URL
keyword_queries = ['lissabon']
startdate = "2007-01-01"
enddate = "2007-01-01"

# Encode and hit URL
for keyword in keyword_queries:
    html_keyword = urllib.parse.quote_plus(keyword)
    URL = "https://essentials.swissdox.ch/View/log/index.jsp#&search=true&sortorder=pubDateTime%20desc&formdata=%5B%7B%22name%22%3A%22SEARCH_mltid%22%2C%22value%22%3A%22%22%7D%2C%7B%22name%22%3A%22SEARCH_sc%22%2C%22value%22%3A%22swissdox%22%7D%2C%7B%22name%22%3A%22SEARCH_query%22%2C%22value%22%3A%22" + html_keyword + "%22%7D%2C%7B%22name%22%3A%22SEARCH_exact%22%2C%22value%22%3A%22true%22%7D%2C%7B%22name%22%3A%22dateDropdown%22%2C%22value%22%3A%22-1%22%7D%2C%7B%22name%22%3A%22SEARCH_pubDate_lower%22%2C%22value%22%3A%22" + startdate + "%22%7D%2C%7B%22name%22%3A%22SEARCH_pubDate_upper%22%2C%22value%22%3A%22" + enddate + "%22%7D%2C%7B%22name%22%3A%22SEARCH_tiall%22%2C%22value%22%3A%22%22%7D%2C%7B%22name%22%3A%22SEARCH_source%22%2C%22value%22%3A%22%22%7D%2C%7B%22name%22%3A%22SEARCH_author%22%2C%22value%22%3A%22%22%7D%5D"
    weburl = urllib.request.urlopen(URL)
    # Hit the url; the User-Agent must go in headers=, because the second
    # positional argument of requests.get is params
    ua = UserAgent()
    page = requests.get(URL, headers={"User-Agent": ua.random})
    soup = BeautifulSoup(page.content, "html.parser")
    results = soup.find('div', class_='documentlist')
    print(page.content)

1 Answer

User

#1 · Posted 2024-10-01 09:25:29

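It looks like you used "#" instead of "?" in the URL. Query parameters normally start with "?", each one is written as key=value, and successive pairs are separated by "&".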
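As a minimal sketch of the "?" form, the snippet below builds a query string with the standard library's urlencode. The endpoint example.com and the parameter names query, from, and to are hypothetical placeholders, not the real swissdox API.

```python
from urllib.parse import urlencode, urlsplit

# Hypothetical endpoint and parameter names, for illustration only.
base_url = "https://example.com/search"
params = {"query": "lissabon", "from": "2007-01-01", "to": "2007-01-01"}

# urlencode percent-encodes each key and value and joins the pairs with "&".
url = base_url + "?" + urlencode(params)
print(url)

# Everything after "?" is parsed as the query; there is no fragment.
parts = urlsplit(url)
print(parts.query)
print(repr(parts.fragment))
```

With requests, the same dictionary could instead be passed as requests.get(base_url, params=params), which appends the encoded query for you.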

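A "#" starts a fragment, which tells the browser to jump to a specific section of the page. The fragment is never sent to the server, so the request actually goes to https://essentials.swissdox.ch/View/log/index.jsp, and that page is the response you get. Simply changing "#" to "?" appears to raise an error about invalid characters in the original URL, so make sure the query parameters contain only valid characters once they are percent-encoded.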
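To see why the server only receives index.jsp, the following sketch splits a URL of the same shape with the standard library. The URL is a shortened stand-in for the one in the question (the formdata part is left out), so treat it as an assumption for illustration.

```python
from urllib.parse import urldefrag, urlsplit

# Shortened stand-in for the URL in the question; the formdata part is omitted.
url = (
    "https://essentials.swissdox.ch/View/log/index.jsp"
    "#&search=true&sortorder=pubDateTime%20desc"
)

parts = urlsplit(url)
print(repr(parts.query))     # empty: there is no "?" in the URL
print(parts.fragment)        # everything after "#"

# urldefrag separates the part that is sent to the server from the fragment.
requested, fragment = urldefrag(url)
print(requested)
```

Because the whole search expression sits in the fragment, it stays on the client; in a browser it would be interpreted by the page's own JavaScript rather than by the server.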

Wiki - URL Syntax
