Unable to get the text of a <p> element when scraping a website with Python Scrapy

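2 answers

User

Answer 1 · edited 2024-10-03 09:09:11

I don't know XPath very well, but a regular expression can help here.

It isn't very elegant, but it will work for this markup: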

>>> import re
>>> html = """
    <p class="fc-gray">
       hello
    <span class="">2010-10</span>
    <em class="shuxian">|</em>
         4.2
    </p>
"""
>>> # raw string, so \s and \d reach the regex engine without invalid-escape warnings
>>> search = re.search(r'em>\s*(?P<result>[\d.]+)', html)
>>> if search:
...     print(search.group('result'))
...     
4.2
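
The one-off search above can be wrapped in a small reusable helper. This is only a minimal sketch: the names `SAMPLE_HTML`, `TRAILING_NUMBER` and `extract_trailing_number` are illustrative and not from the original answer, and it assumes the number always follows the closing `</em>` tag with nothing but whitespace in between.

```python
import re

# Sample markup copied from the snippet above.
SAMPLE_HTML = """
    <p class="fc-gray">
       hello
    <span class="">2010-10</span>
    <em class="shuxian">|</em>
         4.2
    </p>
"""

# Assumption: the value sits right after </em>, separated only by whitespace.
# Compiled once so the pattern can be reused across many pages.
TRAILING_NUMBER = re.compile(r"</em>\s*(?P<result>\d+(?:\.\d+)?)")


def extract_trailing_number(html):
    """Return the number that follows </em> as a string, or None if absent."""
    match = TRAILING_NUMBER.search(html)
    return match.group("result") if match else None


print(extract_trailing_number(SAMPLE_HTML))  # prints 4.2
```

Returning `None` instead of raising keeps the caller in control when a page does not contain the expected markup.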

User

Answer 2 · edited 2024-10-03 09:09:11

You are after the last text child node of the <p> element, so you can add a [last()] predicate to the XPath expression:

>>> import scrapy
>>> s = scrapy.Selector(text="""      <p class="fc-gray">
...            hello
...        <span class="">2010-10</span> 
...        <em class="shuxian">|</em>
...              4.2                 
...       </p>""")
>>> s.xpath('.//p[@class="fc-gray"]/text()[last()]')
[<Selector xpath='.//p[@class="fc-gray"]/text()[last()]' data='\n             4.2                 \n     '>]
>>> s.xpath('.//p[@class="fc-gray"]/text()[last()]').extract_first()
'\n             4.2                 \n      '
>>> s.xpath('.//p[@class="fc-gray"]/text()[last()]').extract_first().strip()
'4.2'
>>> # alternative using XPath's normalize-space() to do the whitespace stripping
>>> s.xpath('normalize-space(.//p[@class="fc-gray"]/text()[last()])').extract_first()
'4.2'
