urllib.urlopen() can't handle strings containing #?

    from bs4 import BeautifulSoup
    import urllib

    url = urllib.urlopen("https://www.google.com/")
    soup = BeautifulSoup(url)
    parseList1=[]
    for i in soup.stripped_strings:
        parseList1.append(i)
    parseList1 = list(parseList1[10:15])

    #Second URL
    url2 = urllib.urlopen("https://www.google.com/"+"#q=Kerbal Space Program")
    soup2 = BeautifulSoup(url2)
    parseList2=[]
    for i in soup2.stripped_strings:
        parseList2.append(i)
    parseList2 = list(parseList2[10:15])

    #Third URL
    url3 = urllib.urlopen("https://www.google.com/#q=Kerbal Space Program")
    soup3 = BeautifulSoup(url3)
    parseList3=[]
    for i in soup3.stripped_strings:
        parseList3.append(i)
    parseList3 = list(parseList3[10:15])

    print " 1 "
    for i in parseList1:
        print i
    print " 2 "
    for i in parseList2:
        print i
    print " 3 "
    for i in parseList3:
        print i

1 answer

User

Reply #1 · Posted 2024-09-30 08:16:23

A browser does not send the fragment part of a URL (everything from the "#" onward) to the server.

RFC 1808 (Relative Uniform Resource Locators): "Note that the fragment identifier (and the "#" that precedes it) is not considered part of the URL. However, since it is commonly used within the same string context as a URL, a parser must be able to recognize the fragment when it is present and set it aside as part of the parsing process."
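To see this from Python's side, here is a minimal sketch using the Python 3 standard library (the question itself uses Python 2's urllib). urlsplit sets the fragment aside exactly as the RFC describes, and urllib.request.Request exposes the path it would request as its selector attribute. request_target is a hypothetical helper written only for this illustration, and no network request is made.

```python
from urllib.parse import urlsplit
from urllib.request import Request


def request_target(url):
    """Hypothetical helper: split a URL into (request target, fragment).

    The request target is what goes on the HTTP request line
    (path plus optional query); the fragment stays on the client.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target, parts.fragment


# The three URLs used in the question.
urls = [
    "https://www.google.com/",
    "https://www.google.com/" + "#q=Kerbal Space Program",
    "https://www.google.com/#q=Kerbal Space Program",
]

for url in urls:
    target, fragment = request_target(url)
    # Request also splits off the fragment; selector is the path it would request.
    selector = Request(url).selector
    print(f"{url!r} -> GET {target} (selector {selector!r}), fragment {fragment!r}")
```

With Python 3's urllib.request, all three URLs map to the same request target, "/", so the search term never reaches the server.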

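You get the correct result in a browser because the browser first requests https://www.google.com, the page's JavaScript then reads the URL fragment (most websites do not do this), the browser sends a new AJAX request (https://www.google.com?q=xxxxx), and finally the page is rendered from the JSON data that comes back. urllib cannot execute JavaScript for you.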

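To solve your problem, simply replace https://www.google.com/#q=Kerbal Space Program with https://www.google.com/?q=Kerbal Space Program so that the search term is sent to the server as part of the query string.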
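A minimal sketch of that fix in Python 3, assuming Google still answers a plain ?q= query on its home page (the page layout, and therefore the [10:15] slice from the question, may have changed). urlencode percent-encodes the search term so the spaces become "+"; recent Python 3 releases refuse a raw space in the request path, so encoding explicitly is the safe route. fragment_to_query and fetch are hypothetical helpers for illustration only.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def fragment_to_query(url):
    """Hypothetical helper: move '#key=value' pairs into the real query string.

    Pairs from the fragment are appended to any existing query, e.g.
    'https://www.google.com/#q=Kerbal Space Program' becomes
    'https://www.google.com/?q=Kerbal+Space+Program'.
    """
    parts = urlsplit(url)
    # Parse the existing query and the fragment as key=value pairs,
    # then re-encode them so spaces and other unsafe characters are escaped.
    params = parse_qsl(parts.query) + parse_qsl(parts.fragment)
    query = urlencode(params)
    # Drop the fragment: it would not be sent to the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def fetch(url, timeout=10):
    """Hypothetical helper: download the raw HTML for `url` (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


search_url = fragment_to_query("https://www.google.com/#q=Kerbal Space Program")
print(search_url)  # prints https://www.google.com/?q=Kerbal+Space+Program
# html = fetch(search_url)  # then parse with BeautifulSoup as in the question
```

The fetch call is left commented out so the sketch runs offline; its result can be handed to BeautifulSoup exactly as urllib.urlopen's result is in the question.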
