Here is my test project tree:
├── test11
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       ├── basic.py
│       └── easy.py
├── scrapy.cfg
In items.py, I have:

    from scrapy import Item, Field
In easy.py, I have:

    [the easy.py code block was lost from the post]
In basic.py, I have:
    import scrapy
    import urlparse
    from scrapy.loader import ItemLoader
    from scrapy.loader.processors import MapCompose, Join
    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import CrawlSpider, Rule
    from test11.items import Test11Item

    class BasicSpider(scrapy.Spider):
        name = 'basic'
        allowed_domains = ['web']
        start_urls = ['https://www.amazon.cn/b?ie=UTF8&node=2127529051']

        def parse(self, response):
            l = ItemLoader(item=Test11Item(), response=response)
            l.add_xpath('name', '//*[@id="productTitle"]/text()',
                        MapCompose(unicode.strip))
            l.add_xpath('price', '//*[@id="priceblock_ourprice"]/text()',
                        MapCompose(lambda i: i.replace(',', ''), float),
                        re='[,.0-9]+')
            return l.load_item()
When I run the basic spider (scrapy crawl basic), I get the results I want. But when I run the easy spider (scrapy crawl easy), I get no results at all! What am I missing?
You just need to set allowed_domains appropriately. With allowed_domains = ['web'], Scrapy's offsite middleware drops every link the spider extracts, because no request to amazon.cn matches the allowed domain 'web'. The basic spider still works because it only parses the pages in start_urls and never follows links, whereas a CrawlSpider relies on following extracted links, so with the wrong allowed_domains it produces nothing.