How to get the CDATA inside an XML tag from a response using Scrapy?

2 answers

User

#1 · edited 2024-09-19 23:34:58

By default, lxml strips CDATA sections, and unfortunately parsel.Selector, which Scrapy uses, does not expose that option.

So you need to build the lxml tree yourself and then re-create the selector from it:

$ scrapy shell "https://www.nhaccuatui.com/flash/xml?html5=true&key1=59f0ae8a89cea4a0eb2c3b7e40208f26"
from lxml import etree
from parsel import Selector

# build the lxml tree ourselves with a parser that keeps CDATA sections
parser = etree.XMLParser(strip_cdata=False)
root = etree.fromstring(response.body, parser=parser, base_url=response.url)
selector = Selector(root=root)

# now the CDATA values can be selected
selector.xpath('//lyric/text()').extract()
[OUT]: ['https://lrc-nct.nixcdn.com/2018/02/07/a/a/e/f/1517979335534.lrc']

User

#2 · edited 2024-09-19 23:34:58

Do you want to get the link ('https://lrc-nct.nixcdn.com/2018/02/21/f/b/1/1/1519207822262.lrc')? You can take the XML content as a string and extract the link with a regular expression:

import re
links = re.findall(r'<lyric><!\[CDATA\[(.*?)\]\]></lyric>', XMLstring)  # XMLstring: the XML body as a str, e.g. response.text
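A slightly more complete, stdlib-only sketch of the regex approach. The helper name extract_lyric_links and the sample XML are hypothetical; in Scrapy you would pass response.text instead of the sample.

```python
import re

# Non-greedy (.*?) so that two <lyric> elements on the same line are not
# merged into one match; re.DOTALL lets a CDATA payload span line breaks.
LYRIC_CDATA = re.compile(r"<lyric><!\[CDATA\[(.*?)\]\]></lyric>", re.DOTALL)


def extract_lyric_links(xml_string: str) -> list:
    """Return the stripped CDATA payload of every <lyric> element."""
    return [m.strip() for m in LYRIC_CDATA.findall(xml_string)]


# Hypothetical sample input.
sample = (
    "<tracklist><track>"
    "<lyric><![CDATA[https://example.com/lyrics/1.lrc]]></lyric>"
    "</track><track>"
    "<lyric><![CDATA[ https://example.com/lyrics/2.lrc ]]></lyric>"
    "</track></tracklist>"
)

print(extract_lyric_links(sample))
# prints ['https://example.com/lyrics/1.lrc', 'https://example.com/lyrics/2.lrc']
```

The pattern only matches when <lyric> is immediately followed by <![CDATA[; whitespace or attributes on the tag would need a looser pattern, which is why a real XML parser is usually the more robust choice.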
