Scrapy: How do I crawl a website and store the data in a Microsoft SQL Server database? - Q&A

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
        'http://quotes.toscrape.com/page/2/',
    ]

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').extract_first(),
                'author': quote.css('small.author::text').extract_first(),
                'tags': quote.css('div.tags a.tag::text').extract(),
            }

2 Answers

User

#1 · Edited 2024-05-18 05:12:45

You can use the pymssql module to send the data to SQL Server, like this:

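import pymssql

class DataPipeline(object):
    def __init__(self):
        # Replace the placeholder connection details with your own
        self.conn = pymssql.connect(host='host', user='user', password='passwd', database='db')
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        try:
            # 'tags' is a list, so join it into a single string before inserting
            self.cursor.execute(
                "INSERT INTO MYTABLE(text, author, tags) VALUES (%s, %s, %s)",
                (item['text'], item['author'], ','.join(item['tags'])))
            self.conn.commit()
        except pymssql.Error as e:
            spider.logger.error("Insert failed: %s", e)

        return item

    def close_spider(self, spider):
        # Scrapy calls this when the spider finishes; release the connection
        self.conn.close()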

You also need to add 'spider_name.pipelines.DataPipeline': 300 to the ITEM_PIPELINES dict in your settings.
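As a minimal sketch, the entry in the project's settings.py could look like the block below. The module path 'spider_name.pipelines.DataPipeline' is taken from the answer and must match your own project name. The integer sets the order in which pipelines run (lower values run first, conventionally in the 0-1000 range).

```python
# settings.py (sketch): enable the pipeline from the answer above.
# 'spider_name' is a placeholder for your actual Scrapy project package.
ITEM_PIPELINES = {
    'spider_name.pipelines.DataPipeline': 300,
}
```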

User

#2 · Edited 2024-05-18 05:12:45

I think the best approach is to save the data to a CSV file and then load that CSV into a SQL Server table.

import csv
import requests
import bs4

res = requests.get('http://www.ebay.com/sch/i.html?LH_Complete=1&LH_Sold=1&_from=R40&_sacat=0&_nkw=gerald%20ford%20autograph&rt=nc&LH_Auction=1&_trksid=p2045573.m1684')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')

# grab all the links and store its href destinations in a list
links = [e['href'] for e in soup.find_all(class_="vip")]

# grab all the bid spans and split its contents in order to get the number only
bids = [e.span.contents[0].split(' ')[0] for e in soup.find_all("li", "lvformat")]

# grab all the prices and store those in a list
prices = [e.contents[0] for e in soup.find_all("span", "bold bidsold")]

# zip each entry out of the lists we generated before in order to combine the entries
# belonging to each other and write the zipped elements to a list
l = list(zip(links, prices, bids))

# write each entry of the rowlist `l` to the csv output file
with open('ebay.csv', 'w', newline='') as csvfile:
    w = csv.writer(csvfile)
    for e in l:
        w.writerow(e)
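The script above only covers writing the CSV. For the second step, loading the file into SQL Server, here is a rough sketch. It assumes a hypothetical table named EBAY_SALES with columns link, price, and bids (none of these names come from the answer), and a DB-API connection that uses %s placeholders, such as the one returned by pymssql.connect().

```python
import csv


def read_rows(csv_path):
    """Return the CSV rows as a list of tuples, skipping blank lines."""
    with open(csv_path, newline='') as csvfile:
        return [tuple(row) for row in csv.reader(csvfile) if row]


def load_csv_into_sql_server(csv_path, conn):
    """Insert every CSV row into the hypothetical EBAY_SALES table.

    `conn` is a DB-API connection using %s placeholders, for example
    the object returned by pymssql.connect(...).
    """
    rows = read_rows(csv_path)
    cursor = conn.cursor()
    # Table and column names are assumptions for illustration only
    cursor.executemany(
        "INSERT INTO EBAY_SALES (link, price, bids) VALUES (%s, %s, %s)",
        rows,
    )
    conn.commit()
    return len(rows)
```

It could be called as load_csv_into_sql_server('ebay.csv', pymssql.connect(host='host', user='user', password='passwd', database='db')). For large files, SQL Server's own bulk-loading tools may be a better fit than row-by-row inserts.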

