擅长:python、mysql、java
<p>请尝试以下代码:</p>
<pre><code>import mechanize
from bs4 import BeautifulSoup
url = "http://www.thehindu.com/archive/web/2010/06/19/"
br = mechanize.Browser()
htmltext = br.open(url).read()
articletext = ""
for tag_li in soup.findAll('li', attrs={"data-section":"Op-Ed"}):
for link in tag_li.findAll('a'):
urlnew = urlnew = link.get('href')
brnew = mechanize.Browser()
htmltextnew = brnew.open(urlnew).read()
articletext = ""
soupnew = BeautifulSoup(htmltextnew)
for tag in soupnew.findAll('p'):
articletext += tag.text
print re.sub('\s+', ' ', articletext, flags=re.M)
driver.close()
</code></pre>
<p>对于<code>re</code>,您可能需要导入<code>re</code>模块。在</p>