I'm trying to scrape the title links from speeches-usa.com. Here is my Python code:
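    import urllib2
    from cookielib import CookieJar
    from bs4 import BeautifulSoup

    SPEECH_SOURCE = 'http://www.speeches-usa.com/'
    PARSER_TYPE = 'html.parser'  # assumed; the original snippet does not show this value

    def get_speeches():
        # Open the page with a cookie-aware opener
        cj = CookieJar()
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        p = opener.open(SPEECH_SOURCE)
        soup = BeautifulSoup(p.read(), PARSER_TYPE)
        # Collect every <a> tag with class "ListText"
        info = soup.find_all('a', class_='ListText')
        elements = []
        for element in info:
            elements.append(element)
        # Print the first five matches
        for i in xrange(0, min(len(elements), 5)):
            print elements[i]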
(1) I'm not sure what arguments to pass to soup.find_all() to get the links. I tried elements.append(element.get_text()) instead, but that produces the following output, which drops the links:
John Adams - Inaugural
Address
Samuel Adams - American
Independence
Spiro Agnew - Television
News Coverage
Susan B. Anthony - Women's
Right to Vote
(2) The results seem incomplete; for example, Jane Adams is missing from the output below:
<a class="ListText" href="Transcripts/john_adams-inaugural.html">John Adams - Inaugural
Address<br/>
</a>
0
<a class="ListText" href="Transcripts/samuel_adams-independence.html">Samuel Adams - American
Independence<br/>
</a>
1
<a class="ListText" href="Transcripts/spiro_agnew-networknews.html">Spiro Agnew - Television
News Coverage<br/>
</a>
2
<a class="ListText" href="Transcripts/susan_b_anthony-vote.html">Susan B. Anthony - Women's
Right to Vote</a>
3
<a class="ListText" href="Transcripts/spiro_agnew-networknews.html"></a>
4
Any help and guidance would be much appreciated.
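Try using each link's href attribute instead of its text. Joining that href with SPEECH_SOURCE should give the full URL.
element.get_text() does exactly what it says: it gets the text of the element. If you need an attribute, use square brackets, as in element['href'].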
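As a minimal sketch of that idea (not the answer's original code, which is not preserved here), the relative href values can be turned into full URLs with urljoin. The helper name full_urls and the sample list are hypothetical. Plain dicts stand in for BeautifulSoup tags so the sketch runs without network access, since both support element['href']. It is written for Python 3; in Python 2, urljoin lives in the urlparse module.

```python
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

SPEECH_SOURCE = 'http://www.speeches-usa.com/'


def full_urls(elements, base=SPEECH_SOURCE):
    """Return the absolute URL for each element's href (hypothetical helper)."""
    # element['href'] works the same way on a BeautifulSoup Tag and on a dict.
    return [urljoin(base, element['href']) for element in elements]


# Stand-in data: hrefs copied from the question's output. With BeautifulSoup
# this would be the result of soup.find_all('a', class_='ListText').
sample = [
    {'href': 'Transcripts/john_adams-inaugural.html'},
    {'href': 'Transcripts/samuel_adams-independence.html'},
]

for url in full_urls(sample):
    print(url)
```

Note that element['href'] raises KeyError when a tag has no href attribute; element.get('href') returns None instead.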
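EDIT: A comment below points out that this misses some elements, because not all of the links have the ListText class. A more robust approach is to find all links, check whether 'Transcripts' appears in each link's href (I'm assuming what you want are the links to the transcripts), and if so append it to a list. This can produce duplicates, so set() is used to print only the unique entries.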
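The approach described in the edit could look roughly like the sketch below. This is a reconstruction under stated assumptions, not the answer's original code: the helper name transcript_links and the sample list are hypothetical, and plain dicts stand in for the tags that soup.find_all('a') would return (both support .get('href')). The result is sorted only to make the output order predictable, since set() itself is unordered.

```python
def transcript_links(links):
    """Return the unique hrefs that contain 'Transcripts' (hypothetical helper)."""
    found = []
    for link in links:
        href = link.get('href')  # None when the tag has no href attribute
        if href and 'Transcripts' in href:
            found.append(href)
    # set() drops duplicates; sorted() gives a stable order for printing.
    return sorted(set(found))


# Stand-in data. With BeautifulSoup this would be soup.find_all('a'),
# i.e. every link on the page regardless of its class.
sample = [
    {'href': 'Transcripts/john_adams-inaugural.html'},
    {'href': 'Transcripts/spiro_agnew-networknews.html'},
    {'href': 'Transcripts/spiro_agnew-networknews.html'},  # duplicate entry
    {'href': 'index.html'},  # hypothetical non-transcript link
    {},  # anchor without an href
]

for href in transcript_links(sample):
    print(href)
```

Matching on the substring 'Transcripts' is a heuristic based on the hrefs shown in the question; if the site uses other paths for speeches, the filter would need adjusting.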