<p>You can do this with htql, for example:</p>
<pre><code>html = """
<title>Moving Average Filters</title>
<link href="new/css/default.css" rel="stylesheet" type="text/css" />
<script type='text/javascript' src='new/js/jquery-1.5.js'></script>
<script type='text/javascript' src='new/js/jquery.droppy.js'></script>
<link rel="stylesheet" href="new/css/droppy.css" type="text/css" />
<div id="footer">
<a href="index.html">Home</a> | <a href="pdfbook.htm">The Book by Chapters</a> | <a href="about.htm">About the Book</a> | <a href="swsmith.htm">Steven W. Smith</a> | <a href="http://www.dsprelated.com/blogs-1/nf/Steve_Smith.php">Blog</a> | <a href="http://www.dspguide.com/contact.htm">Contact</a>
<br />
Copyright 1997-2011 by California Technical Publishing
</div>
"""
import htql
x=htql.query(html, "<script> &delete <div norecur (id='footer')>&delete")[0][0]
</code></pre>
<p>The result is:</p>
<pre><code>>>> x
'\n<title>Moving Average Filters</title>\n<link href="new/css/default.css" rel="stylesheet" type="text/css" />\n\n\n\n<link rel="stylesheet" href="new/css/droppy.css" type="text/css" />\n\n\n'
</code></pre>
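<p>If htql is not installed, roughly the same cleanup can be sketched with the standard library's <code>html.parser</code> module. This swaps in a different technique from htql's query language. The hypothetical <code>strip_elements</code> function below echoes the markup back while dropping every <code>script</code> element and the <code>div</code> whose <code>id</code> is <code>footer</code>, together with everything nested inside it. It assumes reasonably well-formed input (every non-void element inside a removed region is closed), and the whitespace it leaves behind may not match htql's output exactly.</p>

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class ElementStripper(HTMLParser):
    """Echo the input HTML, dropping <script> elements and <div id="footer">."""

    def __init__(self):
        # Keep entity references such as &amp; exactly as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside an element that is being removed

    def _should_drop(self, tag, attrs):
        return tag == "script" or (tag == "div" and ("id", "footer") in attrs)

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID:
                self.skip_depth += 1
        elif self._should_drop(tag, attrs):
            self.skip_depth = 1
        else:
            # Re-emit the original start-tag text, preserving its quoting.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <link ... /> open and close in one step.
        if not self.skip_depth and not self._should_drop(tag, attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1
        else:
            # The parser lowercases tag names, so end tags come back lowercase.
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.skip_depth:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        if not self.skip_depth:
            self.out.append(f"<!{decl}>")


def strip_elements(html):
    """Return html with the script elements and the footer div removed."""
    parser = ElementStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

<p>On the sample above, <code>strip_elements(html)</code> keeps the <code>title</code> and both <code>link</code> tags and drops the two scripts and the whole footer.</p>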
<p>To convert the HTML files in directory dir1 and save the results to directory dir2, you can define a function like this:</p>
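<pre><code>import os
import htql
def convert(filename, dir1, dir2):
    # Read the page, delete all script elements and the footer div, then write the result to dir2.
    with open(os.path.join(dir1, filename), 'r') as f:
        html = f.read()
    x = htql.query(html, "<script> &delete <div norecur (id='footer')>&delete")[0][0]
    with open(os.path.join(dir2, filename), 'w') as f:
        f.write(x)
</code></pre>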
<p>Then, to convert all the HTML files in dir1, use a loop:</p>
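<pre><code>import os
# dir1 and dir2 are the source and destination directory paths.
os.makedirs(dir2, exist_ok=True)  # create dir2 if it does not exist yet
for filename in os.listdir(dir1):
    if filename.endswith('.html'):
        convert(filename, dir1, dir2)
</code></pre>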