How do I scrape all the pages from this link?

2024-07-02 12:27:18 发布


I want to scrape all the pages of this link: http://www.jobisjob.co.uk/search?directUserSearch=true&whatInSearchBox=&whereInSearchBox=london

I have tried different approaches, but I couldn't find a solution.

Below is my code:

    import scrapy

    # Assumption: JobgoItem lives in this project's items module; adjust the
    # import path to match your project layout
    from jobgo.items import JobgoItem


    class jobisjobSpider(scrapy.Spider):
        name = 'jobisjob'
        allowed_domains = ['jobisjob.co.uk']
        start_urls = ['http://www.jobisjob.co.uk/search?directUserSearch=true&whatInSearchBox=&whereInSearchBox=london']

        def parse(self, response):
            # Each job offer sits inside the ajax-results container
            for sel in response.xpath('//div[@id="ajax-results"]/div[@class="offer_list "]/div[@class="box_offer"]/div[@class="offer"]'):
                item = JobgoItem()
                item['title'] = sel.xpath('strong[@class="title"]/a/text()').extract()
                item['description'] = sel.xpath('p[@class="description"]/text()').extract()
                item['company'] = sel.xpath('p[@class="company"]/span[@itemprop="hiringOrganization"]/a[@itemprop="name"]/text()').extract()
                item['location'] = sel.xpath('p[@class="company"]/span/span[@class="location"]/span/text()').extract()
                yield item

            # Follow the next-page link, if one exists
            next_page = response.css("div.wrap paginator results > ul > li > a[rel='nofollow']::attr('href')")
            if next_page:
                url = response.urljoin(next_page[0].extract())
                print("next page: " + str(url))
                yield scrapy.Request(url)

Can anyone help me solve this? I am completely new to Python.


Tags: text, div, response, page, extract, item, xpath, class
1 Answer

Forum user
#1 · Posted 2024-07-02 12:27:18

There is an error in your next-page selector. Your current selector finds a div with class `wrap`, then searches for a tag named `paginator` nested inside it, and a tag named `results` inside that, rather than a single div carrying all three classes.

The correct selector is:

div.wrap.paginator.results > ul > li > a:last-child[rel='nofollow']::attr('href')
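The difference comes down to CSS syntax: a space between tokens means "descendant element", while a chained `.class.class` means "one element with all of these classes". A minimal sketch of the same distinction, using the standard library's ElementTree (as an assumption, standing in for Scrapy's CSS selectors) with XPath equivalents of the two readings:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment mimicking the page's pagination markup
html = """
<html><body>
  <div class="wrap paginator results">
    <ul>
      <li><a rel="nofollow" href="/search?page=2">Next</a></li>
    </ul>
  </div>
</body></html>
"""

root = ET.fromstring(html)

# Wrong reading ("div.wrap paginator"): a <paginator> TAG inside the div.
# No such tag exists, so nothing matches.
wrong = root.find('.//div/paginator')
print(wrong)  # -> None

# Right reading ("div.wrap.paginator.results"): one div whose class
# attribute carries all three classes, then the link underneath it.
link = root.find('.//div[@class="wrap paginator results"]/ul/li/a')
print(link.get('href'))  # -> /search?page=2
```

Note that ElementTree only does exact attribute matching, so this check requires the classes in their original order; Scrapy's `div.wrap.paginator.results` is more robust, since real CSS class selectors match regardless of order or extra classes.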
