lxml: get the full content of a tag, including its child nodes and text

<title-group><article-title xml:lang="en">Correction to: Effective adsorptive performance of Fe<sub>3</sub>O<sub>4</sub>@SiO<sub>2</sub>core shell spheres for methylene blue: kinetics, isotherm and mechanism</article-title></title-group>

from lxml import etree
s = '<title-group><article-title xml:lang="en">Correction to: Effective adsorptive performance of Fe<sub>3</sub>O<sub>4</sub>@SiO<sub>2</sub>core shell spheres for methylene blue: kinetics, isotherm and mechanism</article-title></title-group>'
d = etree.fromstring(s)
title_xpath = '/title-group/article-title'
title = ""
if not d.xpath(title_xpath)[0].getchildren():
    # no child elements: the whole title is in .text
    title = d.xpath(title_xpath)[0].text
else:
    for title_elem in d.xpath(title_xpath):
        title_parts = title_elem.getchildren()
        # serializes only the <sub> children (each with its tail), so the
        # leading "Correction to: ... Fe" text in title_elem.text is lost
        title = ''.join(etree.tostring(part, encoding="unicode") for part in title_parts)
print(title)

2 answers

User

Answer 1 · edited 2024-09-30 01:30:40

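You can get the element and extract its text with text_content(). Note that text_content() is only available on lxml.html elements, not on plain lxml.etree elements.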

Starting from the XML tree d (this is just my own idea and not very pretty, but let me know if it does what you need):

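import lxml.html

# build d with lxml.html so that its elements have text_content()
d = lxml.html.fromstring(s)
text = ""
for element in d.iter("title-group"):  # iter() also visits d itself; iterchildren() would skip it
    try:
        text += element.text_content()  # all text, including text inside and after the <sub> children
    except AttributeError:  # in case the element has no text_content() method
        continue
print(text)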
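If you would rather keep parsing with lxml.etree, as the question does, there are etree-side ways to get the same plain text. A minimal sketch, assuming the string s from the question:

```python
from lxml import etree

s = ('<title-group><article-title xml:lang="en">Correction to: Effective adsorptive '
     'performance of Fe<sub>3</sub>O<sub>4</sub>@SiO<sub>2</sub>core shell spheres for '
     'methylene blue: kinetics, isotherm and mechanism</article-title></title-group>')
d = etree.fromstring(s)

# 1) itertext(): yields every text and tail string below d, in document order
via_itertext = "".join(d.itertext())

# 2) XPath string(): the string-value of a node is all its descendant text concatenated
via_xpath = d.xpath("string(/title-group/article-title)")

# 3) method="text" serializes only text content; with_tail=False skips d's own tail
via_tostring = etree.tostring(d, method="text", encoding="unicode", with_tail=False)

print(via_itertext)
```

All three produce the same string here; itertext() is the most direct when you already hold the element.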

User

Answer 2 · edited 2024-09-30 01:30:40

You could try BeautifulSoup:

>>> s= '<title-group><article-title xml:lang="en">Correction to: Effective adsorptive performance of Fe<sub>3</sub>O<sub>4</sub>@SiO<sub>2</sub>core shell spheres for methylene blue: kinetics, isotherm and mechanism</article-title></title-group>'

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(s, 'lxml')
>>> soup.getText()
'Correction to: Effective adsorptive performance of Fe3O4@SiO2core shell spheres for methylene blue: kinetics, isotherm and mechanism'
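With BeautifulSoup you can also target the article-title tag and choose between plain text and the inner markup. A minimal sketch, assuming bs4 with the lxml parser as in the answer above:

```python
from bs4 import BeautifulSoup

s = ('<title-group><article-title xml:lang="en">Correction to: Effective adsorptive '
     'performance of Fe<sub>3</sub>O<sub>4</sub>@SiO<sub>2</sub>core shell spheres for '
     'methylene blue: kinetics, isotherm and mechanism</article-title></title-group>')
soup = BeautifulSoup(s, "lxml")
title = soup.find("article-title")

# get_text(): only the text, with the <sub> tags dropped
plain = title.get_text()
# decode_contents(): the tag's inner markup as a string, with the <sub> tags kept
markup = title.decode_contents()

print(plain)
print(markup)
```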
