Having trouble extracting specific data with a CSS query

import scrapy


class Brandon251Spider(scrapy.Spider):
    name = "Brandon251"

    def start_requests(self):
        urls = [
            "https://www.251brandon.com/floorplans"
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        price = response.css('.fp-price').extract()
        yield {
            'test': price
        }

2 Answers

User

Answer 1 · edited 2024-09-21 07:54:20

You could try an XPath expression instead of a CSS selector: response.xpath('//*[@id="floorplan"]/text()')

See also: https://doc.scrapy.org/en/latest/topics/selectors.html

If @Casper is right and that element is loaded by JavaScript, take a look at scrapy-splash (https://github.com/scrapy-plugins/scrapy-splash), which lets you render the JavaScript first and scrape the page afterwards. Good luck.
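As a rough sketch of what enabling scrapy-splash involves, the project settings below follow the scrapy-splash README. The Splash URL assumes a Splash instance running locally on port 8050 (for example, in Docker); that address is an assumption, not something stated in this thread.

```python
# settings.py (sketch): enable scrapy-splash in a Scrapy project.
# Assumption: a Splash server is reachable at this address.
SPLASH_URL = "http://localhost:8050"

# Middlewares and priorities as listed in the scrapy-splash README.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Makes duplicate filtering aware of Splash arguments.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

With these in place, the spider would yield SplashRequest(url, self.parse, args={'wait': 0.5}) from scrapy_splash instead of scrapy.Request; the wait value here is only illustrative.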

User

Answer 2 · edited 2024-09-21 07:54:20

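@Casper is right: the page is generated with JavaScript. If you load the page in a browser with JavaScript disabled, the content is not visible. However, when a page is built by JavaScript, the data you need is usually available somewhere as JSON. I searched the network responses for one of the sqr-ft values and found that all of the data is loaded through the pageData variable.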

If you search the page source, you will find a JSON-like object that holds the page's data and is used to build the page:

var pageData = {
  filters: {
    beds: [],
    baths: 0,
    priceRange: {
      low: 0,
      high: 9999
    },
    sqftRange: {
      low: 0,
      high: 9999
    },
    availableDate: "all",
    amenities: []
  },
  hasImages: true,
  amenities: {
    am_0: "Built in USB Ports",
    am_1: "Designer Carpeting and Two-Tone Paint",
    am_2: "Dishwasher",
    am_3: "Double Stainless Steel Sinks",
    am_4: "Gas Range",
    am_5: "Granite Countertops",
    am_6: "Large Patio Or Balcony",
    am_7: "Linen Closet",
    am_8: "Platinum Silver Kitchen Appliances",
    am_9: "Pre-Wired For Technology",
    am_10: "Spacious Closets",
    am_11: "Stackable Washer/Dryer",
    am_12: "Wood Blinds"
  },
  floorplans: [
    {
      id: 2029996,
      name: "1 Bed 1 Bath | 1B",
      amenities: [],
      sqft: 594,
      beds: 1,
      baths: 1.0,
      lowPrice: 2392,
      highPrice: 4208,
      availableCount: 1,
      availableDate: "10/8/2018",
      special: false,
      images: [
        {
          src: "/dmslivecafe/3/234323/1B.png?quality=85",
          alt: "",
          title: "1 Bed 1 Bath | 1B",
          caption: ""
        }
      ],
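
Since pageData is a JavaScript object literal with unquoted keys, json.loads will not accept it as-is. One possible way to pull the prices out, sketched below, is a regular expression over the raw page text. The helper name extract_floorplans and the output keys are hypothetical, and the pattern assumes each floorplan lists name, sqft, lowPrice and highPrice in the order shown in the snippet above.

```python
import re

# Matches one floorplan record. Assumes the field order seen in the
# pageData snippet: name, then sqft, then lowPrice, then highPrice.
FLOORPLAN_RE = re.compile(
    r'name:\s*"(?P<name>[^"]*)".*?'
    r'sqft:\s*(?P<sqft>\d+).*?'
    r'lowPrice:\s*(?P<low>\d+).*?'
    r'highPrice:\s*(?P<high>\d+)',
    re.DOTALL,
)


def extract_floorplans(html):
    """Return a list of dicts with name, sqft and price range per floorplan."""
    # Start at the floorplans array so earlier keys (filters, amenities)
    # are never considered.
    start = html.find("floorplans: [")
    if start == -1:
        return []
    return [
        {
            "name": m["name"],
            "sqft": int(m["sqft"]),
            "low_price": int(m["low"]),
            "high_price": int(m["high"]),
        }
        for m in FLOORPLAN_RE.finditer(html, start)
    ]
```

In the spider's parse method this could be used as: for plan in extract_floorplans(response.text): yield plan. A regex like this is brittle if the site changes the field order; a proper JavaScript-object parser would be more robust.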
