Python urllib, urllib2: how to fetch a URL with a "#" (sharp) link

import urllib2, urllib, cookielib

# Note: this sets the User-Agent for urllib's FancyURLopener only,
# not for openers built with urllib2.
urllib.FancyURLopener.version = 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.3) Gecko/2008092814 (Debian-3.0.1-1)'

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        m = req.get_method()
        if (code in (301, 302, 303, 307) and m in ('GET', 'HEAD')
                or code in (301, 302, 303) and m == 'POST'):
            # Escape spaces and drop body-related headers before following the redirect.
            newurl = newurl.replace(' ', '%20')
            newheaders = dict((k, v) for k, v in req.headers.items()
                              if k.lower() not in ("content-length", "content-type"))
            return urllib2.Request(newurl,
                                   headers=newheaders,
                                   origin_req_host=req.get_origin_req_host(),
                                   unverifiable=True)
        else:
            raise urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)

# Keep cookies across redirects.
cj = cookielib.CookieJar()
opener = urllib2.build_opener(MyHTTPRedirectHandler, urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)

req = urllib2.Request('http://example.com/goto/#sharplink')
response = urllib2.urlopen(req)

# Save the response body to a file named 'bet'.
f = open('bet', 'w')
f.write(response.read())
f.close()

1 answer

Anonymous user

#1 · Posted 2024-09-30 04:37:17

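The fragment part of the URL ("sharplink") is not sent to the web server (it is normally used to identify a specific section of the page the link refers to), so it makes no difference whether you request http://example.com/goto/ or http://example.com/goto/#sharplink.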
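
To make the point about the fragment concrete, here is a minimal sketch that splits a URL into the part a client would request and the fragment that stays on the client side. It assumes Python 3, where `urldefrag` lives in `urllib.parse` (in Python 2, which the question uses, the same function is in the `urlparse` module). The helper name `request_target` is hypothetical and only used for illustration.

```python
from urllib.parse import urldefrag


def request_target(url):
    """Return (url_without_fragment, fragment) for the given URL.

    Hypothetical helper: the first element is what would be requested
    from the server, the second is the fragment kept by the client.
    """
    base, fragment = urldefrag(url)
    return base, fragment


base, fragment = request_target('http://example.com/goto/#sharplink')
print(base)      # http://example.com/goto/
print(fragment)  # sharplink
```

Both http://example.com/goto/ and http://example.com/goto/#sharplink give the same first element here, which is why the server cannot tell the two requests apart.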

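If you expect the two pages to differ, the site is most likely using an AJAX framework that encodes state in the fragment part of the URL. Since urllib and friends do not execute JavaScript, you would need a tool such as phantomjs to get the page content.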
