How to get text with Python Scrapy

Posted 2024-09-30 08:31:00


import scrapy


class WanikaniSpider(scrapy.Spider):
    name = 'japandict'
    allowed_domains = ['www.japandict.com']
    start_urls = ['https://www.japandict.com/lists/jlpt5k']
    
           
    def parse(self, response):
        kanjiler = response.xpath("//div[@class='row']/div/div/div")
        for kanji in kanjiler:
            kanjiicon= kanji.xpath("//div[@class='row']/div/div/div/a/div/span")
            yield{
                'kanjiicon': kanjiicon
            }

This is the spider I wrote. I want to get kanjiicon as text, but when I call .get() or .extract() on it, the returned value is empty.
How can I fix this?


2 Answers

I got the output with a relative XPath that ends in text().

Code:

import scrapy


class WanikaniSpider(scrapy.Spider):
    name = 'japandict'
    allowed_domains = ['www.japandict.com']
    start_urls = ['https://www.japandict.com/lists/jlpt5k']
    
           
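    def parse(self, response):
        # Select the per-entry container elements by their class attribute.
        kanjiler = response.xpath('//*[@class="d-inline-block w-100 text-muted"]')
        for kanji in kanjiler:
            # ".//" searches inside the current entry only; "/text()" selects the text node.
            # default='' avoids an AttributeError when an entry has no matching node.
            kanjiicon = kanji.xpath('.//*[@class="xlarge text-normal me-4"]/text()').get(default='').replace('\n', '').strip()

            yield {
                'kanjiicon': kanjiicon
            }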

Output:

{'kanjiicon': '右'}
2021-08-22 05:58:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.japandict.com/lists/jlpt5k>
{'kanjiicon': '雨'}
2021-08-22 05:58:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.japandict.com/lists/jlpt5k>
{'kanjiicon': '円'}
2021-08-22 05:58:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.japandict.com/lists/jlpt5k>
{'kanjiicon': '下'}
2021-08-22 05:58:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.japandict.com/lists/jlpt5k>
{'kanjiicon': '何'}
2021-08-22 05:58:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.japandict.com/lists/jlpt5k>
{'kanjiicon': '火'}
2021-08-22 05:58:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.japandict.com/lists/jlpt5k>
{'kanjiicon': '外'}
2021-08-22 05:58:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.japandict.com/lists/jlpt5k>
{'kanjiicon': '学'}
2021-08-22 05:58:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.japandict.com/lists/jlpt5k>
{'kanjiicon': '間'}

You need to decode the string as UTF-8; ASCII does not contain Japanese characters.

Try the following:

kanjiicon = kanjiicon.decode('utf-8')
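
Whether this applies depends on the type of kanjiicon. In Python 3, decode() exists on bytes, not on str, and the strings returned by a Scrapy selector's .get() are already str. A minimal sketch of the distinction, using a literal value rather than anything fetched from the site:

```python
# decode() turns UTF-8 bytes into a str; it only applies to bytes objects.
raw = '右'.encode('utf-8')      # bytes, as they would arrive over the network
text = raw.decode('utf-8')      # str

# A str has no decode() method in Python 3.
str_has_decode = hasattr(text, 'decode')

print(type(raw).__name__, type(text).__name__, str_has_decode)
```

If the aim is readable Japanese in an exported JSON file rather than \uXXXX escapes, setting FEED_EXPORT_ENCODING = 'utf-8' in the project settings is likely the more relevant change.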
