<p>When you parse complex, badly-formed HTML, <a href="http://www.crummy.com/software/BeautifulSoup/bs4/doc/#specifying-the-parser-to-use" rel="nofollow">the parser choice</a> matters a great deal:</p>
<blockquote>
<p>There are also differences between HTML parsers. If you give Beautiful
Soup a perfectly-formed HTML document, these differences won’t matter.
One parser will be faster than another, but they’ll all give you a
data structure that looks exactly like the original HTML document.</p>
<p>But if the document is not perfectly-formed, different parsers will
give different results.</p>
</blockquote>
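<p>The behaviour described in the quote can be seen on a tiny malformed fragment. The sketch below is only an illustration: the helper name <code>parse_with_each</code> and the constant <code>BROKEN_HTML</code> are made up for this example, and it assumes that <code>lxml</code> and <code>html5lib</code> may or may not be installed, so any parser that is missing is skipped.</p>

```python
from bs4 import BeautifulSoup, FeatureNotFound

# A deliberately malformed fragment: an unclosed <a> followed by a stray </p>.
BROKEN_HTML = "<a></p>"


def parse_with_each(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: repaired markup} for every parser that is installed.

    Hypothetical helper written for this illustration only.
    """
    results = {}
    for name in parsers:
        try:
            results[name] = str(BeautifulSoup(markup, name))
        except FeatureNotFound:
            # lxml and html5lib are optional third-party packages; skip if absent.
            continue
    return results


if __name__ == "__main__":
    for name, repaired in parse_with_each(BROKEN_HTML).items():
        print(f"{name:12} {repaired}")
```

<p>With <code>html.parser</code> the stray <code>&lt;/p&gt;</code> is dropped and the fragment comes back as <code>&lt;a&gt;&lt;/a&gt;</code>; the other parsers, when available, repair the same input differently, which is why the choice matters for a messy page.</p>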
<p><code>html.parser</code> worked for me:</p>
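<pre><code>from bs4 import BeautifulSoup
import requests

# Fetch the raw bytes and let Beautiful Soup handle the decoding.
document = requests.get('http://www.wvdnr.gov/').content
# html.parser is the parser from Python's standard library.
soup = BeautifulSoup(document, "html.parser")
print(soup.find_all('a'))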
</code></pre>
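<p>If only the link targets are needed, the same approach can be wrapped in a small function. This is a minimal sketch, not part of the original answer: <code>extract_links</code> and the inline <code>sample</code> markup (including its paths) are hypothetical and stand in for the downloaded page so the example runs without network access.</p>

```python
from bs4 import BeautifulSoup


def extract_links(html):
    """Return the href of every <a> tag that has one, in document order.

    Hypothetical helper; `html` may be bytes (e.g. requests' .content) or str.
    """
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only anchors that actually carry an href attribute.
    return [a["href"] for a in soup.find_all("a", href=True)]


# Made-up, deliberately sloppy markup: unclosed <td>, <tr>, and <a> tags.
sample = (
    b'<table><tr><td><a href="/hunting">Hunting</a>'
    b'<td><a name="top">Top</a></table>'
    b'<a href="/fishing">Fishing'
)

if __name__ == "__main__":
    print(extract_links(sample))
```

<p>Passing <code>href=True</code> to <code>find_all</code> filters out named anchors that have no target, so the list comprehension never hits a missing attribute.</p>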
<p>See also:</p>
<ul>
<li><a href="http://www.crummy.com/software/BeautifulSoup/bs4/doc/#differences-between-parsers" rel="nofollow">Differences between parsers</a></li>
</ul>