Scrapy response.css/xpath with broken HTML. Any tips? - Q&A - Python中文网

Posted 2024-09-30 08:16:03

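I'm still learning Scrapy and am trying to pull some information from this page: Schlotzskys store

However, after loading the page in scrapy shell, I ran into some problems, specifically when parsing the address on the site.

First, I run the following in the shell: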

pipenv run scrapy shell https://www.schlotzskys.com/find-your-schlotzskys/arkansas/fayetteville/2146/

That all works fine. Then I tried to scrape the address. I tried the following:

^{pr2}$

The first two inputs above return:

[]

The second input returns:

<Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' locations-address ')]/text()" data='\n\t\t\t\t\t131 N. McPherson Church Rd.\t\t\t\t'>

or some variation of it.

Now, when I look at the HTML with:

print(response.text)

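the HTML I'm interested in does appear, but it doesn't seem to be parsed into separate elements. It looks like it may be broken HTML, and I'm wondering whether there is a way to work around this?

I'd really appreciate anyone's help!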

1 Answer

User

Reply 1 · Posted 2024-09-30 08:16:03

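I couldn't find any element on the page with the CSS selector given in your first expression. Also, all of your expressions are missing an extract() or extract_first() call, so what you get back are Selector objects rather than plain strings.

Try this: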

address = [
    response.xpath('normalize-space(//div[@class="locations-address"])').extract_first(),
    response.xpath('normalize-space(//div[@class="locations-address-secondary"])').extract_first(),
    response.xpath('normalize-space(//div[@class="locations-state-city-zip"])').extract_first()
]

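The normalize-space() XPath function removes the annoying whitespace: it strips leading and trailing whitespace and collapses each internal run of whitespace into a single space.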
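If you already have the raw text node (like the data value in the Selector output from the question), roughly the same cleanup can be done in plain Python. This is only a sketch: normalize_space and join_address are hypothetical helper names, not part of Scrapy, and Python's str.split() treats a slightly broader set of characters as whitespace than XPath does.

```python
def normalize_space(text: str) -> str:
    """Approximate XPath normalize-space(): trim both ends, collapse whitespace runs."""
    # str.split() with no arguments splits on runs of whitespace and drops empty pieces
    return " ".join(text.split())


def join_address(parts):
    """Hypothetical helper: skip missing or empty parts, join the rest with ', '."""
    cleaned = [normalize_space(p) for p in parts if p]
    return ", ".join(p for p in cleaned if p)


# Raw text node copied from the Selector output shown in the question
raw = "\n\t\t\t\t\t131 N. McPherson Church Rd.\t\t\t\t"
print(repr(normalize_space(raw)))  # '131 N. McPherson Church Rd.'
```

join_address could then be applied to the address list built above, so that an empty secondary address line does not leave a stray separator in the result.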