Answers to the question: Extracting links from an HTML page

Extracting links from an HTML page


I am trying to get all the movie/show Netflix links from <a href="http://netflixukvsusa.netflixable.com/2016/07/complete-alphabetical-list-k-sat-jul-9.html" rel="nofollow">http://netflixukvsusa.netflixable.com/2016/07/complete-alphabetical-list-k-sat-jul-9.html</a>, along with their country names. For example, from the page source I want <a href="http://www.netflix.com/WiMovie/80048948" rel="nofollow">http://www.netflix.com/WiMovie/80048948</a>, USA, and so on. I tried the following, but it returns every link on the page rather than only the Netflix links I want. I am fairly new to regex. What should I do?

<pre><code>from BeautifulSoup import BeautifulSoup
import urllib2
import re

html_page = urllib2.urlopen('http://netflixukvsusa.netflixable.com/2016/07/complete-alphabetical-list-k-sat-jul-9.html')
soup = BeautifulSoup(html_page)
for link in soup.findAll('a'):
    ##reqlink = re.search('netflix',link.get('href'))
    ##if reqlink:
    print link.get('href')
for link in soup.findAll('img'):
    if link.get('alt') == 'UK' or link.get('alt') == 'USA':
        print link.get('alt')
</code></pre>

If I uncomment the two commented-out lines above, I get the following error:

<blockquote> TypeError: expected string or buffer </blockquote>

What should I do?

<pre><code>from BeautifulSoup import BeautifulSoup
import urllib2
import re
import requests

url = 'http://netflixukvsusa.netflixable.com/2016/07/complete-alphabetical-list-k-sat-jul-9.html'
r = requests.get(url, stream=True)
count = 1
title=[]
country=[]
for line in r.iter_lines():
    if count == 746:
        urllib2.urlopen('http://netflixukvsusa.netflixable.com/2016/07/complete-alphabetical-list-k-sat-jul-9.html')
        soup = BeautifulSoup(line)
        for link in soup.findAll('a', href=re.compile('netflix')):
            title.append(link.get('href'))
        for link in soup.findAll('img'):
            print link.get('alt')
            country.append(link.get('alt'))
    count = count + 1
print len(title), len(country)
</code></pre>

The previous error has been resolved. The only thing left is handling movies that are available in more than one country: how do I group the countries together with their link? For example, for "10.0 Earthquake": link=<a href="http://www.netflix.com/WiMovie/80049286" rel="nofollow">http://www.netflix.com/WiMovie/80049286</a>, country=UK, USA
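For what it's worth, the `TypeError` in the first snippet is what `re.search` raises when it is handed `None`, and `link.get('href')` returns `None` for any `<a>` tag that has no `href` attribute. Below is a minimal sketch of one way to keep each link together with its countries. It is written for Python 3 and uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages. It assumes, without having checked the real page, that every title sits in its own `<tr>` with one Netflix link and one `<img alt="...">` per country. The `SAMPLE` markup, the `RowCollector` class, and the `extract_rows` helper are all hypothetical names made up for illustration.

```python
import re
from html.parser import HTMLParser

# Match links to netflix.com itself; a bare 'netflix' pattern would also
# match the host page's own domain (netflixable.com).
NETFLIX_RE = re.compile(r"netflix\.com/")
COUNTRIES = {"UK", "USA"}


class RowCollector(HTMLParser):
    """Collect (netflix_link, [countries]) pairs, one per <tr> (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished (link, countries) pairs
        self._link = None     # Netflix link seen in the current row
        self._countries = []  # country flags seen in the current row

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            # A new row starts: reset the per-row state.
            self._link, self._countries = None, []
        elif tag == "a":
            href = attrs.get("href")  # None when the <a> has no href
            if href and NETFLIX_RE.search(href):
                self._link = href
        elif tag == "img":
            alt = attrs.get("alt")
            if alt in COUNTRIES and alt not in self._countries:
                self._countries.append(alt)

    def handle_endtag(self, tag):
        # Only rows that contained a Netflix link are kept.
        if tag == "tr" and self._link:
            self.rows.append((self._link, self._countries))
            self._link = None


def extract_rows(html):
    """Return a list of (link, countries) tuples found in the HTML string."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical markup standing in for the real page.
SAMPLE = """
<table>
<tr><td><a name="top">anchor without href</a></td></tr>
<tr><td><a href="http://www.netflix.com/WiMovie/80049286">10.0 Earthquake</a></td>
    <td><img alt="UK" src="uk.png"><img alt="USA" src="usa.png"></td></tr>
<tr><td><a href="http://www.netflix.com/WiMovie/80048948">Another title</a></td>
    <td><img alt="USA" src="usa.png"></td></tr>
<tr><td><a href="http://example.com/other">unrelated link</a></td></tr>
</table>
"""

if __name__ == "__main__":
    for link, countries in extract_rows(SAMPLE):
        print(link, ", ".join(countries))
```

The same idea should carry over to BeautifulSoup: iterate over the rows first, then look for the link and the flag images inside each row, rather than collecting all links and all images into two separate lists whose lengths no longer line up. If the real page does not use one `<tr>` per title, the grouping element would need to change accordingly.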
