<p>This removes the specified tags and comments cleanly. Thanks to <a href="https://stackoverflow.com/users/7016945/kim-hyesung">Kim Hyesung</a> for <a href="https://stackoverflow.com/questions/40529848/python-beautifulsoup-how-to-write-the-output-to-html-file">this code</a>.</p>
<pre><code>from bs4 import BeautifulSoup
from bs4 import Comment

def cleanMe(html):
    # Parse with html5lib (the html5lib package must be installed)
    soup = BeautifulSoup(html, "html5lib")
    # Remove script, style, meta and noscript tags together with their contents
    for tag in soup.find_all(['script', 'style', 'meta', 'noscript']):
        tag.extract()
    # Remove HTML comments
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        comment.extract()
    return soup
</code></pre>
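<p>A minimal usage sketch, in case it helps to see the effect on a small document. The name <code>clean_html</code>, its <code>parser</code> parameter, and the sample markup are hypothetical and not part of the original answer; the built-in <code>html.parser</code> is swapped in as the default here only so the sketch runs without html5lib installed.</p>

```python
from bs4 import BeautifulSoup, Comment


def clean_html(html, parser="html.parser"):
    """Hypothetical variant of cleanMe with a selectable parser."""
    soup = BeautifulSoup(html, parser)
    # Drop the unwanted tags together with everything inside them
    for tag in soup.find_all(["script", "style", "meta", "noscript"]):
        tag.extract()
    # Drop HTML comments, which are NavigableString subclasses of type Comment
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    return soup


# Made-up sample page containing each kind of element to be removed
sample = """<html><head><meta charset="utf-8"><style>p {color: red}</style></head>
<body><!-- tracking note --><p>Hello</p><script>alert(1)</script>
<noscript>Enable JavaScript</noscript></body></html>"""

cleaned = clean_html(sample)
text = cleaned.get_text(strip=True)
print(text)
```

<p>The function returns a <code>BeautifulSoup</code> object rather than a string, so call <code>str(cleaned)</code> or <code>cleaned.prettify()</code> if the cleaned markup needs to be written to a file.</p>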