<p>Python has an <a href="http://docs.python.org/library/htmlparser.html" rel="nofollow noreferrer">HTMLParser</a> module for this.</p>
<p>Here is some code that does what you need:</p>
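<pre><code>try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2

class MyHTMLParser(HTMLParser):
    # Called for every opening tag, e.g. <html>
    def handle_starttag(self, tag, attrs):
        print("<%s>" % tag)

    # Called for every closing tag, even one with no matching opening tag
    def handle_endtag(self, tag):
        print("</%s>" % tag)

parser = MyHTMLParser()
parser.feed("""<html><head>Headline<html><head>more words
</script>even more words</script>
<html><head>Headline<html><head>more words
</script>even more words</script>
""")
</code></pre>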
<p>Pass your own string to <code>parser.feed</code>.</p>
<p>Output:</p>
<pre><code>$ python htmlparser.py
<html>
<head>
<html>
<head>
</script>
</script>
<html>
<head>
<html>
<head>
</script>
</script>
</code></pre>
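<p>If you need the tags as data rather than printed output, the same handlers can append to a list. The following is only a minimal sketch: <code>TagCollector</code> and <code>extract_tags</code> are hypothetical names made up for this example and are not part of the standard library. It relies only on <code>feed</code>, <code>handle_starttag</code> and <code>handle_endtag</code>.</p>

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class TagCollector(HTMLParser):
    """Hypothetical helper: records tags in document order instead of printing them."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored here; only the tag name is kept.
        self.tags.append("<%s>" % tag)

    def handle_endtag(self, tag):
        self.tags.append("</%s>" % tag)


def extract_tags(text):
    """Return every complete opening and closing tag found in text, in order."""
    collector = TagCollector()
    collector.feed(text)
    return collector.tags


if __name__ == "__main__":
    print(extract_tags("<html><head>Headline</script>even more words</script>"))
```

<p>Since <code>feed</code> can be called more than once, you could also push the input through in chunks. Tag names come back lowercased, so compare against lowercase strings.</p>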
<p>This discussion on SO should also help: <a href="https://stackoverflow.com/questions/4182521/using-htmlparser-in-python-efficiently">Using HTMLParser in Python efficiently</a></p>