<p><a href="http://www.crummy.com/software/BeautifulSoup/" rel="noreferrer">Beautiful Soup</a> copes well with invalid or broken HTML:</p>
<pre><code>>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup("<htm@)($*><body><table <tr><td>hi</tr></td></body><html")
>>> print soup.prettify()
<htm>
<body>
<table>
<tr>
<td>
hi
</td>
</tr>
</table>
</body>
</htm>
</code></pre>
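<p>The session above uses the Python 2 era <code>BeautifulSoup</code> import. As a rough sketch only, assuming the newer <code>bs4</code> package (Beautiful Soup 4) is installed and Python's built-in <code>html.parser</code> backend is used, the same idea might look like the following. The input string is a simpler, made-up example, and the exact tree produced for badly mangled markup can differ between parser backends.</p>

```python
from bs4 import BeautifulSoup  # assumes: pip install beautifulsoup4

# Hypothetical broken input: <td> and <body> are never closed.
broken_html = "<body><table><tr><td>hi</tr></table>"

# "html.parser" is the standard-library backend; lxml or html5lib
# are alternatives and may repair the markup differently.
soup = BeautifulSoup(broken_html, "html.parser")

# The cell is still reachable even though its closing tag is missing.
cell_text = soup.find("td").get_text(strip=True)
print(cell_text)

# prettify() prints the repaired tree with every tag closed.
print(soup.prettify())
```

<p>Naming the parser explicitly keeps the result the same across machines, since Beautiful Soup 4 otherwise picks whichever backend happens to be installed.</p>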