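<p>Versions of BeautifulSoup before 4.4 (released July 2015) had no native clone function; you had to create a deep copy yourself, which is tricky because each element maintains links to the rest of the tree.</p>
<p>To clone an element and all of its descendants, you have to copy all attributes and reset the parent-child relationships, and this has to happen recursively. The best way to do this is to not copy the relationship attributes at all, and instead re-seat each recursively cloned element by appending it to its new parent:</p>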
<pre><code>from bs4 import Tag, NavigableString

def clone(el):
    # Strings have no children; rebuild them from their own text
    if isinstance(el, NavigableString):
        return type(el)(el)

    copy = Tag(None, el.builder, el.name, el.namespace, el.nsprefix)
    # work around bug where there is no builder set
    # https://bugs.launchpad.net/beautifulsoup/+bug/1307471
    copy.attrs = dict(el.attrs)
    for attr in ('can_be_empty_element', 'hidden'):
        setattr(copy, attr, getattr(el, attr))
    # append() sets parent and sibling links on each cloned child
    for child in el.contents:
        copy.append(clone(child))
    return copy
</code></pre>
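<p>This method is somewhat sensitive to the BeautifulSoup version in use; I tested it with 4.3, and future versions may add attributes that need to be copied too.</p>
<p>You could also monkeypatch this functionality into BeautifulSoup:</p>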
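<p>The re-seating idea does not depend on BeautifulSoup itself. The following is a minimal, library-free sketch of it; <code>Node</code> and <code>clone_node</code> are hypothetical stand-ins invented for illustration, not part of the BeautifulSoup API. The clone copies only the element's own data, and the relationship attributes are rebuilt by <code>append()</code>:</p>

```python
class Node:
    """Hypothetical stand-in for a tree element; not a BeautifulSoup class."""

    def __init__(self, name, attrs=None):
        self.name = name
        self.attrs = dict(attrs or {})
        self.parent = None    # relationship attribute: never copied
        self.children = []    # relationship attribute: rebuilt via append()

    def append(self, child):
        # Detach the child from any previous parent, then re-seat it here.
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)


def clone_node(node):
    # Copy only the node's own data (name and attributes) ...
    copy = Node(node.name, node.attrs)
    # ... and let append() establish fresh parent/child links.
    for child in node.children:
        copy.append(clone_node(child))
    return copy


root = Node("div", {"id": "someid"})
root.append(Node("p"))
duplicate = clone_node(root)

# The clone is detached, and its children point at the clone, not the original.
print(duplicate.parent is None)
print(duplicate.children[0].parent is duplicate)
```

<p>Because the parent link is never copied, the clone starts out detached and can be appended to any other tree without disturbing the original.</p>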
<pre><code>from bs4 import Tag, NavigableString

def tag_clone(self):
    copy = type(self)(None, self.builder, self.name, self.namespace,
                      self.nsprefix)
    # work around bug where there is no builder set
    # https://bugs.launchpad.net/beautifulsoup/+bug/1307471
    copy.attrs = dict(self.attrs)
    for attr in ('can_be_empty_element', 'hidden'):
        setattr(copy, attr, getattr(self, attr))
    for child in self.contents:
        copy.append(child.clone())
    return copy

Tag.clone = tag_clone
NavigableString.clone = lambda self: type(self)(self)
</code></pre>
<p>This allows you to call <code>.clone()</code> on elements directly:</p>
<pre><code>document2.body.append(document1.find('div', id='someid').clone())
</code></pre>
<p>My <a href="https://bugs.launchpad.net/beautifulsoup/+bug/1307490">feature request</a> to the BeautifulSoup project <a href="http://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/revision/380">was accepted and tweaked</a> to use the <a href="https://docs.python.org/2/library/copy.html#copy.copy"><code>copy.copy()</code> function</a>; now that BeautifulSoup 4.4 has been released, you can use that version (or newer) and do:</p>
<pre><code>import copy
document2.body.append(copy.copy(document1.find('div', id='someid')))
</code></pre>
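<p>As a self-contained illustration, assuming BeautifulSoup 4.4 or newer is installed, the sketch below builds two small documents and copies a <code>div</code> from one into the other. The markup and the <code>html.parser</code> choice are assumptions made for the example:</p>

```python
import copy

from bs4 import BeautifulSoup

# Two small example documents; the markup is made up for illustration.
document1 = BeautifulSoup(
    '<body><div id="someid"><p>Hello</p></div></body>', "html.parser"
)
document2 = BeautifulSoup("<body></body>", "html.parser")

original = document1.find("div", id="someid")
duplicate = copy.copy(original)

# The copy starts out detached from any tree.
print(duplicate.parent)

# Appending the copy leaves the original where it was in document1.
document2.body.append(duplicate)
print(original.parent.name)
print(document2.body.div.p.string)
```

<p>Appending <code>original</code> itself instead of the copy would move it out of <code>document1</code>, which is why the copy is needed when both documents should keep the element.</p>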