<p>Here is the working code:</p>
<pre><code>import scrapy


class CollegesItem(scrapy.Item):
    # Fields collected for each college.
    name = scrapy.Field()
    location = scrapy.Field()


class CollegesSpider(scrapy.Spider):
    name = 'colleges'
    allowed_domains = ["4icu.org"]
    start_urls = ('http://www.4icu.org/in/',)

    def parse(self, response):
        # Select every row of the table inside the fifth "section group" div.
        rows = response.xpath(
            '//div[@class="section group"][5]'
            '/div[@class="col span_2_of_2"][1]/table//tr'
        )
        for tr in rows:
            # Skip rows that lack a td with class "i".
            if tr.xpath(".//td[@class='i']"):
                item = CollegesItem()
                # First cell holds the linked name, second cell the location.
                item['name'] = tr.xpath('./td[1]/a/text()').extract()[0]
                item['location'] = tr.xpath('./td[2]//text()').extract()[0]
                yield item
</code></pre>
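<p>To see what the row filter in <code>parse</code> does without hitting the site, here is a minimal sketch that applies the same three relative paths to a made-up table fragment. It uses the standard library's <code>xml.etree.ElementTree</code> instead of Scrapy selectors, so it only approximates their behaviour. The markup in <code>SAMPLE_TABLE</code> and the helper name <code>parse_rows</code> are assumptions for illustration, and the real page may be structured differently.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical table fragment for illustration; not copied from the real page.
SAMPLE_TABLE = """
<table>
  <tr><th>University</th><th>Location</th></tr>
  <tr><td><a href="#">Example University A</a></td><td class="i">City A</td></tr>
  <tr><td><a href="#">Example University B</a></td><td class="i"><span>City B</span></td></tr>
  <tr><td colspan="2">row without a marker cell</td></tr>
</table>
"""


def parse_rows(table_xml):
    """Return name/location dicts for rows that contain a td with class "i"."""
    rows = []
    root = ET.fromstring(table_xml)
    for tr in root.iter("tr"):
        # Same test as tr.xpath(".//td[@class='i']") in the spider.
        if tr.find(".//td[@class='i']") is None:
            continue
        link = tr.find("./td[1]/a")   # counterpart of ./td[1]/a/text()
        place = tr.find("./td[2]")    # counterpart of ./td[2]//text()
        if link is None or place is None:
            continue
        # Take the first text node, with "" as a fallback instead of an IndexError.
        first_text = next(place.itertext(), "")
        rows.append({
            "name": (link.text or "").strip(),
            "location": first_text.strip(),
        })
    return rows


ROWS = parse_rows(SAMPLE_TABLE)
```

<p>Unlike <code>extract()[0]</code> in the spider, which raises <code>IndexError</code> when a cell has no text, this sketch falls back to an empty string. Whether that is preferable depends on whether you want incomplete rows to fail loudly.</p>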
<p>After running the spider with the command</p>
^{pr2}$
<p>here is a snippet of the results:</p>
<pre><code> [[[[[[[{"name": "Indian Institute of Technology Bombay", "location": "Mumbai"},
{"name": "Indian Institute of Technology Madras", "location": "Chennai"},
{"name": "University of Delhi", "location": "Delhi"},
{"name": "Indian Institute of Technology Kanpur", "location": "Kanpur"},
{"name": "Anna University", "location": "Chennai"},
{"name": "Indian Institute of Technology Delhi", "location": "New Delhi"},
{"name": "Manipal University", "location": "Manipal ..."},
{"name": "Indian Institute of Technology Kharagpur", "location": "Kharagpur"},
{"name": "Indian Institute of Science", "location": "Bangalore"},
{"name": "Panjab University", "location": "Chandigarh"},
{"name": "National Institute of Technology, Tiruchirappalli", "location": "Tiruchirappalli"}, .........
</code></pre>