Unsupported URL scheme: no Scrapy handler available

Posted 2024-10-04 01:37:15


I'm running into this error with the Scrapy framework. Here is my dmoz.py in the spiders directory:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

from dirbot.items import Website


class DmozSpider(BaseSpider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    f = open("links.csv")
    start_urls = [url.strip() for url in f.readlines()]
    f.close()
    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//ul/li')
        items = []

        for site in sites:
            item = Website()
            item['name'] = site.select('a/text()').extract()
            item['url'] = site.select('a/@href').extract()
            item['description'] = site.select('text()').extract()
            items.append(item)

        return items

When I run this code, I get this error:

Unsupported URL scheme: no Scrapy handler available

Here are the contents of my links.csv:

http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/

There are 80 URLs in links.csv. How can I fix this error?


Tags: csv, from, import, http, url, www, error, site
1 Answer
Answered by a forum user
#1 · Posted 2024-10-04 01:37:15

The URL in your error message is your URL with the quotation marks urlencoded into it. Your CSV file probably has lines like:

"http://example.com/"
  1. use the csv module to read the file, or
  2. strip the " characters.

Edit: as requested:

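A minimal sketch of the quote-stripping approach (option 2 above), assuming the same class-level links.csv handling as in the question:

f = open("links.csv")
# strip() drops the trailing newline, strip('"') drops the surrounding quotes
start_urls = [url.strip().strip('"') for url in f.readlines()]
f.close()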

Edit 2:

import csv
from StringIO import StringIO

c = '"foo"\n"bar"\n"baz"\n'      # Since csv.reader needs a file-like-object,
reader = csv.reader(StringIO(c)) # wrap c into a StringIO.
for line in reader:
    print line[0]

Last edit:

import csv

with open("links.csv") as f:
    r = csv.reader(f)
    # csv.reader removes the surrounding quotes; each row's first field is the URL
    start_urls = [l[0] for l in r]
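
Putting the pieces together, a minimal sketch of how this csv-based reading could slot into the spider from the question (the helper name load_start_urls is just for illustration; the old-style Scrapy import and class names follow the question):

import csv

from scrapy.spider import BaseSpider


def load_start_urls(path="links.csv"):
    # Hypothetical helper: csv.reader handles the surrounding quotes,
    # so each row's first field is already a clean URL.
    with open(path) as f:
        return [row[0] for row in csv.reader(f) if row]


class DmozSpider(BaseSpider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = load_start_urls()

    # parse() stays the same as in the question.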
