<p>I want to run a Python script that parses HTML files and collects a list of all links that have a <code>target="_blank"</code> attribute.</p>
<p>I tried the following, but I'm not getting anything back from bs4. The SoupStrainer documentation says it takes arguments the same way as findAll and friends, so shouldn't this work? Am I missing some silly mistake?</p>
<pre><code>import os
import sys

from bs4 import BeautifulSoup, SoupStrainer
from unipath import Path


def main():
    ROOT = Path(os.path.realpath(__file__)).ancestor(3)
    src = ROOT.child("src")
    templatedir = src.child("templates")
    for (dirpath, dirs, files) in os.walk(templatedir):
        for path in (Path(dirpath, f) for f in files):
            if path.endswith(".html"):
                for link in BeautifulSoup(path, parse_only=SoupStrainer(target="_blank")):
                    print link

if __name__ == "__main__":
    sys.exit(main())
</code></pre>
<p>I think you need something like this. BeautifulSoup expects markup (a string or an open file object), not a filesystem path, so you have to open the file first:</p>
<pre><code>if path.endswith(".html"):
    htmlfile = open(path)
    for link in BeautifulSoup(htmlfile, parse_only=SoupStrainer(target="_blank")):
        print link
</code></pre>
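<p>To illustrate, here is a minimal, self-contained sketch of the same idea (Python 3 syntax, with a hypothetical inline HTML string instead of files on disk): <code>SoupStrainer(target="_blank")</code> tells the parser to keep only tags carrying that attribute, and <code>find_all("a")</code> then collects them.</p>
<pre><code>from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical sample markup: two links open in a new tab, one does not.
html = """
&lt;html&gt;&lt;body&gt;
&lt;a href="/internal"&gt;Internal&lt;/a&gt;
&lt;a href="https://example.com" target="_blank"&gt;External&lt;/a&gt;
&lt;a href="https://example.org" target="_blank"&gt;Another&lt;/a&gt;
&lt;/body&gt;&lt;/html&gt;
"""

# Only tags with target="_blank" survive parsing; everything else is skipped.
only_blank = SoupStrainer(target="_blank")
soup = BeautifulSoup(html, "html.parser", parse_only=only_blank)

links = [tag["href"] for tag in soup.find_all("a")]
print(links)  # the two target="_blank" hrefs
</code></pre>
<p>With a real file you would pass <code>open(path)</code> as the first argument instead of the string.</p>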