Scrapy spider returns duplicate values

# -*- coding: utf-8 -*-
import scrapy


class MainSpider(scrapy.Spider):
    name = 'main'
    start_urls = ["https://www.compass.com/agents"]

    def parse(self, response):
        regions = response.xpath('//ul[@class="geoLinks-list textIntent-caption1--strong"]/li')
        for each in regions:
            region_link = each.xpath('.//a/@href').get()
            region_name = each.xpath('.//a/text()').get()
            yield response.follow(url=region_link, callback=self.parse_data, meta={"region_text": region_name})

    def parse_data(self, response):
        region = response.request.meta["region_text"]
        agents = response.xpath('//div[@class="agentCard-contact"]')
        for agent in agents:
            name = agent.xpath('normalize-space(//div[@class="agentCard-contact"]/a/text())').get()
            profile_link = agent.xpath('//div[@class="agentCard-contact"]/a/@href').get()
            email = agent.xpath('//a[@class="textIntent-body agentCard-email"]/@href').get()
            mobile = agent.xpath('//a[@class="textIntent-body agentCard-phone"]/@href').get()
            yield {
                "Name": name,
                "Profile_link": profile_link,
                "Email": email,
                "Mobile": mobile,
                "Region": region,
            }

1 answer

User

#1 · Posted on 2024-10-04 01:36:31

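I think the problem is in your XPath expressions. Inside the loop, each expression starts with //, which searches the whole document instead of the current agent element, so every iteration returns the first match on the page. Change the expressions to relative ones (starting with .//) as follows, then try again:

    name = agent.xpath('normalize-space(.//a[@class="textIntent-headline1 agentCard-name"]/text())').get()
    profile_link = agent.xpath('.//a[@class="textIntent-headline1 agentCard-name"]/@href').get()
    email = agent.xpath('.//a[@class="textIntent-body agentCard-email"]/@href').get()
    mobile = agent.xpath('.//a[@class="textIntent-body agentCard-phone"]/@href').get()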
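To see why the leading dot matters, here is a minimal sketch of the same effect outside Scrapy. It is an illustration under stated assumptions, not the spider itself: it uses the standard library's xml.etree.ElementTree instead of Scrapy selectors, and the markup, the simplified class names, and the agent names "Alice" and "Bob" are all made up. Searching from the document root on every iteration mimics what a leading // does in a Scrapy XPath, while searching from the current card mimics .//.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two agent cards with made-up names.
PAGE = """
<html><body>
  <div class="agentCard-contact"><a class="agentCard-name" href="/agents/alice">Alice</a></div>
  <div class="agentCard-contact"><a class="agentCard-name" href="/agents/bob">Bob</a></div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)
cards = root.findall(".//div[@class='agentCard-contact']")

# Mimics a leading "//": every iteration searches from the document root,
# so the first matching link on the page is returned each time.
from_root = [root.find(".//a[@class='agentCard-name']").text for _ in cards]

# Mimics a leading ".//": each iteration searches only inside the current card.
from_card = [card.find(".//a[@class='agentCard-name']").text for card in cards]

print(from_root)  # the same name repeated once per card
print(from_card)  # one distinct name per card
```

In the spider, the equivalent change is to call agent.xpath with expressions that start with .// so that each lookup is scoped to the agent element being iterated.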
