XPath: why do I get an empty result for this expression?

Posted 2024-05-18 20:54:10


I tried this XPath expression:

.//div[@class='owl-wrapper']

on this website:

http://www.justproperty.com/search/uae/apartments/filter__cid/0/sort/score__desc/per_page/20/page/1

But I get an empty result, even though I can see the element in Chrome's F12 developer tools.

You might think the element is produced by a JavaScript call, but I don't believe it is: I am using Scrapy, and I can view the response like this:

scrapy shell "http://www.justproperty.com/search/uae/apartments/filter__cid/0/sort/score__desc/per_page/20/page/1"
view(response)

and the class is there.

Please help.

[Screenshot: the page opened with view(response), as shown in Chrome]


1 answer

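The problem is that the search results, which contain the div element with the owl-wrapper class, are loaded asynchronously through an additional GET request. The HTML that Scrapy downloads for the search page therefore does not contain that element yet.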
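The effect can be reproduced offline. The sketch below is only an illustration: both markup snippets are hypothetical, simplified stand-ins for the real page, and it uses the standard library's ElementTree rather than Scrapy's selectors. It runs the question's XPath against markup as the server might send it and against markup as it might look after the browser has filled in the results. This would also explain why view(response) seems to show the class: the saved response is opened in a browser, which can still run the page's scripts.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: roughly what a server might send before any JavaScript runs.
RAW_HTML = """
<html><body>
  <div id="results"></div>
</body></html>
"""

# Hypothetical markup: the same page after the browser has loaded the results.
RENDERED_HTML = """
<html><body>
  <div id="results">
    <div class="owl-wrapper"><div class="owl-item">listing</div></div>
  </div>
</body></html>
"""

# The expression from the question.
XPATH = ".//div[@class='owl-wrapper']"


def count_matches(markup):
    """Parse well-formed markup and count elements matching XPATH."""
    root = ET.fromstring(markup.strip())
    return len(root.findall(XPATH))


raw_count = count_matches(RAW_HTML)
rendered_count = count_matches(RENDERED_HTML)
print(raw_count, rendered_count)
```

In a real session, the equivalent check would be to search the downloaded response body for "owl-wrapper" before trusting what the browser displays.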

You need to simulate this request in your code, for example with requests:

import requests

# A session sends any cookies set by the search page along with the follow-up request.
with requests.Session() as session:
    # Visit the search page first, as a browser would.
    session.get('http://www.justproperty.com/search/uae/apartments/filter__cid/0/sort/score__desc/per_page/20/page/1')

    # Query parameters of the extra GET request that loads the results.
    params = {
        'url': 'filter__cid/0/sort/score__desc/per_page/20/page/1',
        'ajax': 'true'
    }
    response = session.get('http://www.justproperty.com/search/featured-properties/', params=params)
    results = response.json()

    for result in results:
        print(result['description'])

This prints:

2 bedroom unit on high floor. Full Fountain View,It comes with different amenities, facilities and hotel services. It is located in a prime location, The Address Hotel Lake Downtown. This property is...
Large Upgraded 1 Bedroom For Sale In Index Tower DIFC With DIFC ViewSize: 840 square feet - 78 square metersBedroom: 1 Bathroom: 1 plus guest washroomKitchen: Fully Equipped modern style kitchen with...
Spacious and nice 1-bedroom apartment for
...
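The loop above assumes every item in the response has a description key. As a hedged sketch, the helper below decodes a JSON body and keeps only the items that actually carry that key. The sample payload and the name extract_descriptions are hypothetical; the only thing taken from the answer is that the endpoint returns a list of objects with a description field.

```python
import json

# Hypothetical sample body, shaped like the response described in the answer.
SAMPLE_BODY = (
    '[{"description": "2 bedroom unit on high floor."},'
    ' {"title": "an item without a description"}]'
)


def extract_descriptions(body):
    """Return the description of every result that has one."""
    results = json.loads(body)
    if not isinstance(results, list):
        raise ValueError("expected a JSON list of results")
    return [
        item["description"]
        for item in results
        if isinstance(item, dict) and "description" in item
    ]


descriptions = extract_descriptions(SAMPLE_BODY)
print(descriptions)
```

With requests, the same function could be called as extract_descriptions(response.text).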

A sample Scrapy spider based on the solution above:

import json

import scrapy


class JustPropertySpider(scrapy.Spider):
    name = "justproperty"
    allowed_domains = ["justproperty.com"]
    start_urls = [
        "http://www.justproperty.com/search/uae/apartments/filter__cid/0/sort/score__desc/per_page/20/page/1"
    ]

    def parse(self, response):
        yield scrapy.Request('http://www.justproperty.com/search/featured-properties/?url=filter__cid/0/sort/score__desc/per_page/20/page/1&ajax=true',
                             callback=self.parse_results,
                             headers={'X-Requested-With': 'XMLHttpRequest'})

    def parse_results(self, response):
        # The AJAX endpoint returns JSON, so decode it instead of using XPath.
        results = json.loads(response.body)

        for result in results:
            print(result['description'])
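The spider hard-codes the AJAX URL for page 1. Judging only from the two URLs shown above, the url parameter appears to be the part of the search path that follows /search/uae/apartments/. Under that assumption, which is not confirmed by the site, a small helper (build_ajax_url is a hypothetical name) could derive the AJAX URL from any search page URL:

```python
from urllib.parse import urlencode, urlsplit

# Assumption: the path prefix seen in the question's search URL.
SEARCH_PREFIX = "/search/uae/apartments/"
# Endpoint taken from the answer's code.
AJAX_ENDPOINT = "http://www.justproperty.com/search/featured-properties/"


def build_ajax_url(search_url):
    """Build the AJAX results URL for a search page URL (sketch only)."""
    path = urlsplit(search_url).path
    if not path.startswith(SEARCH_PREFIX):
        raise ValueError("unexpected search URL: %s" % search_url)
    filter_part = path[len(SEARCH_PREFIX):]
    # safe="/" keeps the slashes readable, matching the URL used in the spider.
    query = urlencode({"url": filter_part, "ajax": "true"}, safe="/")
    return AJAX_ENDPOINT + "?" + query


ajax_url = build_ajax_url(
    "http://www.justproperty.com/search/uae/apartments/"
    "filter__cid/0/sort/score__desc/per_page/20/page/1"
)
print(ajax_url)
```

Inside parse, the spider could then pass build_ajax_url(response.url) to scrapy.Request instead of the literal string.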
