Tuple error when trying to count elements in a list?


I'm trying to count the number of contractions politicians use in certain speeches. I have a lot of speeches, but here are a few sample URLs:

<pre><code>every_link_test = ['http://www.millercenter.org/president/obama/speeches/speech-4427',
                   'http://www.millercenter.org/president/obama/speeches/speech-4424',
                   'http://www.millercenter.org/president/obama/speeches/speech-4453',
                   'http://www.millercenter.org/president/obama/speeches/speech-4612',
                   'http://www.millercenter.org/president/obama/speeches/speech-5502']
</code></pre>

Right now I have a very crude counter: it only keeps a running total of the contractions used across all of these links. For example, the code below returns <code>79,101,101,182,224</code> for the five links above. However, I want to tie each count to <code>filename</code>, a variable I create below, so that I end up with something like <code>(speech_1, 79),(speech_2, 22),(speech_3,0),(speech_4,81),(speech_5,42)</code>. That way I can track the number of contractions used in each individual speech. My code raises the following error: <code>AttributeError: 'tuple' object has no attribute 'split'</code>

Here is my code:

<pre><code>import urllib2,sys,os
from bs4 import BeautifulSoup,NavigableString
from string import punctuation as p
from multiprocessing import Pool
import re, nltk
import requests
reload(sys)

url = 'http://www.millercenter.org/president/speeches'
url2 = 'http://www.millercenter.org'

conn = urllib2.urlopen(url)
html = conn.read()

miller_center_soup = BeautifulSoup(html)
links = miller_center_soup.find_all('a')

linklist = [tag.get('href') for tag in links if tag.get('href') is not None]

# remove all items in list that don't contain 'speeches'
linkslist = [_ for _ in linklist if re.search('speeches',_)]
del linkslist[0:2]

# concatenate 'http://www.millercenter.org' with each speech's URL ending
every_link_dups = [url2 + end_link for end_link in linkslist]

# remove duplicates
seen = set()
every_link = [] # no duplicates array
for l in every_link_dups:
    if l not in seen:
        every_link.append(l)
        seen.add(l)

def processURL_short_2(l):
    open_url = urllib2.urlopen(l).read()
    item_soup = BeautifulSoup(open_url)
    item_div = item_soup.find('div',{'id':'transcript'},{'class':'displaytext'})
    item_str = item_div.text.lower()

    splitlink = l.split("/")
    president = splitlink[4]
    speech_num = splitlink[-1]
    filename = "{0}_{1}".format(president, speech_num)
    return item_str, filename

every_link_test = every_link[0:5]
print every_link_test
count = 0
for l in every_link_test:
    content_1 = processURL_short_2(l)
    for word in content_1.split():
        word = word.strip(p)
        if word in contractions:
            count = count + 1
    print count, filename
</code></pre>
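The error follows from `processURL_short_2` returning two values (`return item_str, filename`), which Python packs into a tuple, so `content_1` is a tuple and has no `.split()` method. The `filename` in the final `print` is also only a local variable inside the function, and `count` is never reset between speeches, which would explain the running totals. A minimal sketch of one possible fix is below, written for Python 3 and without any network access. The sample texts, the file names, the `contractions` set, and the helper `count_contractions` are all hypothetical stand-ins, since the question does not show how `contractions` is defined.

```python
from string import punctuation

# Hypothetical sample set; the question never shows how `contractions` is defined.
contractions = {"don't", "can't", "it's", "we're"}


def count_contractions(text, known=contractions):
    """Count the words in `text` that appear in `known`, ignoring surrounding punctuation."""
    count = 0  # starts at zero on every call, so each speech gets its own total
    for word in text.lower().split():
        if word.strip(punctuation) in known:
            count += 1
    return count


# Stand-ins for the (item_str, filename) tuples that processURL_short_2 returns.
speeches = [
    ("We can't wait, and we don't have to.", "obama_speech-0001"),
    ("It's time. We're ready; it's ours.", "obama_speech-0002"),
    ("No contractions appear here.", "obama_speech-0003"),
]

results = []
for item_str, filename in speeches:  # unpack the tuple instead of calling .split() on it
    results.append((filename, count_contractions(item_str)))

print(results)
```

In the original loop, the equivalent change would be `item_str, filename = processURL_short_2(l)`, followed by iterating over `item_str.split()` and resetting `count` to zero at the start of each speech.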
