I want to get all of the text content and the markup from the XML below:
<title-group><article-title xml:lang="en">Correction to: Effective adsorptive performance of Fe<sub>3</sub>O<sub>4</sub>@SiO<sub>2</sub>core shell spheres for methylene blue: kinetics, isotherm and mechanism</article-title></title-group>
The output for the above should be:
Correction to: Effective adsorptive performance of Fe<sub>3</sub>O<sub>4</sub>@SiO<sub>2</sub>core shell spheres for methylene blue: kinetics, isotherm and mechanism
I tried the following, but it gives me an incomplete value:
from lxml import etree

s = '<title-group><article-title xml:lang="en">Correction to: Effective adsorptive performance of Fe<sub>3</sub>O<sub>4</sub>@SiO<sub>2</sub>core shell spheres for methylene blue: kinetics, isotherm and mechanism</article-title></title-group>'
d = etree.fromstring(s)
title_xpath = '/title-group/article-title'
title = ""
if not d.xpath(title_xpath)[0].getchildren():
    title = d.xpath(title_xpath)[0].text
else:
    for title_elem in d.xpath(title_xpath):
        title_parts = title_elem.getchildren()
        title = ''.join(etree.tostring(part, encoding="unicode") for part in title_parts)
print(title)
The code above gives me:
<sub>3</sub>O<sub>4</sub>@SiO<sub>2</sub>core shell spheres for methylene blue: kinetics, isotherm and mechanism
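You could get the element and extract its text with text_content(). Note that text_content() is a method of lxml.html elements and returns the text only, without the <sub> markup.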
Starting from the XML tree "d" (this is just my idea and it is not very pretty, but let me know if it does what you need):
You can try BeautifulSoup.
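The answer gives no code for this suggestion, so the following is only a sketch of how it might look. It assumes the bs4 package is installed and uses Python's built-in "html.parser" backend, which is lenient enough for this snippet; a stricter XML parse would need a different backend. Tag.decode_contents() returns everything inside a tag as a string, with the child markup kept.

```python
from bs4 import BeautifulSoup

s = '<title-group><article-title xml:lang="en">Correction to: Effective adsorptive performance of Fe<sub>3</sub>O<sub>4</sub>@SiO<sub>2</sub>core shell spheres for methylene blue: kinetics, isotherm and mechanism</article-title></title-group>'

# Assumption: html.parser is good enough here; it lowercases tag names,
# which does not matter for this input.
soup = BeautifulSoup(s, "html.parser")

# Find the title element by name and serialize only its contents,
# leaving out the <article-title> tags themselves.
title = soup.find("article-title").decode_contents()
print(title)
```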