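<p>The code below works for me. It builds a dictionary (a mapping) from each reference's <code>rid</code> to the text that follows the citation.</p>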
<pre><code>from bs4 import BeautifulSoup
from collections import defaultdict
import re
d = defaultdict(str)  # rid -> text that follows the citation
html ='''
<body>
<p>The prognosis of patients with rectal cancer has improved since the introduction of total mesorectal excision (TME) surgery [
<xref ref-type="bibr" rid="CR1">1</xref>&#x02013;
<xref ref-type="bibr" rid="CR3">3</xref>]. Using this surgical technique the mesorectal compartment including the rectum and perirectal fat is completely excised by sharp dissection along the mesorectal fascia (MRF) [
<xref ref-type="bibr" rid="CR1">1</xref>]. Additionally, large randomized trials have shown that neo-adjuvant therapy improves local tumor control even further, regardless of optimized surgical techniques [
<xref ref-type="bibr" rid="CR3">3</xref>,
<xref ref-type="bibr" rid="CR4">4</xref>]. The advances in rectal cancer treatment have provoked differentiated neo-adjuvant treatment strategies based on anatomical preoperative identifiable risk factors for local tumor recurrence as can be visualized with magnetic resonance imaging (MRI) [
<xref ref-type="bibr" rid="CR5">5</xref>]. One of the most important risk factors is the tumor relationship to the MRF, which actually defines the surgical circumferential resection margin (CRM) in TME surgery [
<xref ref-type="bibr" rid="CR6">6</xref>,
<xref ref-type="bibr" rid="CR7">7</xref>]. Long courses of neo-adjuvant chemoradiation have emerged as the preferential treatment of patients with anticipated tumor invasion of the MRF on MRI in order to downstage/downsize the tumor and to obtain tumor free resection margins [
<xref ref-type="bibr" rid="CR5">5</xref>].
</p>
</body>
'''
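soup = BeautifulSoup(html, 'html.parser')
# Map each citation's rid to the text that follows its closing bracket.
for xref in soup.find_all('xref'):
    label = xref.next_element   # the string inside the tag, e.g. "1"
    txt = label.next_element    # the text node right after the closing tag
    # Keep only text that closes one citation group ("]") and opens the next ("[");
    # this skips the dash and "," separators inside a group.
    if re.match(r'\].+\[', txt) is not None:
        d[xref.attrs['rid'].strip()] = txt.strip()
for k, v in d.items():
    print("The value of {0} is>>>>> {1} ".format(k, v))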
</code></pre>
<p>It prints:</p>
^{pr2}$
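<p>To see why only some citations end up in the dictionary, here is a minimal sketch of the filter on its own, using only <code>re</code>. The <code>samples</code> list and the helper name <code>is_between_citations</code> are my own illustrations (shortened stand-ins for the text nodes in the HTML above), not part of the original code.</p>

```python
import re

# Same pattern as in the answer: the text must start with "]"
# and contain a later "[" on the same line.
PATTERN = re.compile(r'\].+\[')

# Hypothetical, shortened stand-ins for the text that follows an <xref> tag.
samples = [
    '\u2013\n',                                        # en dash inside a range such as [1-3]
    ',\n',                                             # comma inside a group such as [3, 4]
    ']. Using this surgical technique ... (MRF) [\n',  # sentence between two groups
    '].\n',                                            # end of paragraph, no "[" follows
]


def is_between_citations(text):
    """Return True when text closes one citation group and opens the next."""
    return PATTERN.match(text) is not None


# Only the sentence that sits between two bracketed groups survives.
kept = [s.strip() for s in samples if is_between_citations(s)]
print(kept)
```

<p>Because the dictionary is keyed by <code>rid</code>, a reference cited more than once keeps only the last text that passes this filter.</p>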