擅长:python、mysql、java
<p>我认为你没有正确初始化HTMLParser。也许你根本不需要初始化它。这对我有用:</p>
<pre><code># -*- coding: utf-8 -*-
from HTMLParser import HTMLParser
import urllib2
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
class MyHTMLParser(HTMLParser):
def handle_starttag(self, tag, attrs):
print "Encountered a start tag:", tag
# retrive the terms
if tag == 'div':
for attribute, value in attrs:
if value == 'word':
self.terms.append(attrs[1][1])
# retrive the definitions
if value == 'desc':
if attrs[1][1]:
self.definitions.append(attrs[1][1])
else:
self.definitions.append(None)
parser = MyHTMLParser()
# open page and retrive source page
response = urllib2.urlopen('http://localhost/')
html = response.read().decode('utf-8')
response.close()
# extract the terms and definitions
parser.feed(html)
</code></pre>
<p><strong>更新</strong></p>
^{pr2}$
<p>输出:</p>
<p>['center','left']</p>
<p>['center','left']</p>