How to concatenate two HTML file bodies with BeautifulSoup?

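I need to concatenate the bodies of two HTML files into a single HTML file, with a bit of arbitrary HTML between them as a separator. I had code that did this, but it stopped working when I upgraded from Xubuntu 11.10 (or was it 11.04?) to 12.10, probably because of a BeautifulSoup update (I'm currently using 3.2.1; I don't know which version I had before) or a vim update (I use vim to auto-generate the HTML files from plain-text files). This is a stripped-down version of the code:

<pre><code>from BeautifulSoup import BeautifulSoup

# Parse both input files.
soup_original_1 = BeautifulSoup(''.join(open('test1.html')))
soup_original_2 = BeautifulSoup(''.join(open('test2.html')))

# Render each body to a string and join them around the separator.
contents_1 = soup_original_1.body.renderContents()
contents_2 = soup_original_2.body.renderContents()
contents_both = contents_1 + "\nSEPARATOR\n" + contents_2

# Reuse the first file as the output document and empty its body.
soup_new = BeautifulSoup(''.join(open('test1.html')))
while len(soup_new.body.contents):
    soup_new.body.contents[0].extract()

# Insert the combined markup (as a plain string) into the empty body.
soup_new.body.insert(0, contents_both)
</code></pre>

The bodies of the two input files used for the test case are very simple: <code>contents_1</code> is <code>'\n<pre>\nFile 1\n</pre>\n'</code> and <code>contents_2</code> is <code>'\n<pre>\nFile 2\n</pre>\n'</code>.

I would like <code>soup_new.body.renderContents()</code> to be the concatenation of those two strings with the separator text between them, but instead every <code><</code> is turned into <code>&lt;</code> and so on. The desired result is <code>'\n<pre>\nFile 1\n</pre>\n\nSEPARATOR\n\n<pre>\nFile 2\n</pre>\n'</code>, which is what I got before the OS update. The current result is <code>'\n&lt;pre&gt;\nFile 1\n&lt;/pre&gt;\n\n&lt;b&gt;SEPARATOR\n&lt;/b&gt;\n&lt;pre&gt;\nFile 2\n&lt;/pre&gt;\n'</code>, which is pretty useless.

How do I make BeautifulSoup stop converting <code><</code> to <code>&lt;</code> and so on when inserting HTML as a string into the body of a soup object? Or should I be doing this in an entirely different way? (This is my only experience with BeautifulSoup and with most other HTML parsing, so I suspect that may well be the case.)

The HTML files are auto-generated from plain-text files with vim (the real cases I use are obviously more complicated and involve custom syntax highlighting, which is why I'm doing it this way at all). The full test1.html file looks like this, and test2.html is identical except for the contents and the title:

<pre><code><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>~/programs/lab_notebook_and_printing/concatenate-html_problem_2013/test1.txt.html</title>
<meta name="Generator" content="Vim/7.3" />
<meta name="plugin-version" content="vim7.3_v10" />
<meta name="syntax" content="none" />
<meta name="settings" content="ignore_folding,use_css,pre_wrap,expand_tabs,ignore_conceal" />
<style type="text/css">
pre { white-space: pre-wrap; font-family: monospace; color: #000000; background-color: #ffffff; white-space: pre-wrap; word-wrap: break-word }
body { font-family: monospace; color: #000000; background-color: #ffffff; font-size: 0.875em }
</style>
</head>
<body>
<pre>
File 1
</pre>
</body>
</html>
</code></pre>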
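One possible direction, offered as a sketch rather than a confirmed fix for the 3.2.1 setup above: a plain string inserted into a parse tree is generally treated as text, so its markup characters are escaped on output. Moving parsed nodes instead of strings avoids that. The example below assumes the newer `bs4` package (BeautifulSoup 4) with Python's built-in `html.parser`, not the BeautifulSoup 3 import used in the question; the function name `concatenate_bodies` and the `<b>SEPARATOR</b>` markup are hypothetical choices for illustration.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4, not 3.2.1


def concatenate_bodies(html_1, html_2, separator_html):
    """Return the first document with the separator and the second body appended.

    Hypothetical helper: all three arguments are HTML strings.
    """
    soup_1 = BeautifulSoup(html_1, "html.parser")
    soup_2 = BeautifulSoup(html_2, "html.parser")
    # Parse the separator too, so it is inserted as nodes, not as text.
    separator = BeautifulSoup(separator_html, "html.parser")

    # Copy each child list first: appending a node detaches it from its
    # old parent, which would otherwise change the list while iterating.
    for node in list(separator.contents):
        soup_1.body.append(node)
    for node in list(soup_2.body.contents):
        soup_1.body.append(node)
    return soup_1


# Usage with inline stand-ins for test1.html and test2.html.
page_1 = "<html><head><title>one</title></head><body>\n<pre>\nFile 1\n</pre>\n</body></html>"
page_2 = "<html><head><title>two</title></head><body>\n<pre>\nFile 2\n</pre>\n</body></html>"
merged = concatenate_bodies(page_1, page_2, "\n<b>SEPARATOR</b>\n")
merged_body = merged.body.decode_contents()
```

The head of the first document (title, styles) is kept and the second document's head is discarded, which matches the original code's choice of reusing test1.html as the output shell.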
