Python/Django: saving scraped data to my database automatically

from django.template.loader import get_template
from django.shortcuts import render_to_response
from bs4 import BeautifulSoup
import urllib2, sys
import urlparse
import re
from listing.models import jobLinks

#this function extract the links
def businessghana():
    site = "http://www.businessghana.com/portal/jobs"
    hdr = {'User-Agent' : 'Mozilla/5.0'}
    req = urllib2.Request(site, headers=hdr)
    jobpass = urllib2.urlopen(req)
    soup = BeautifulSoup(jobpass)
    for tag in soup.find_all('a', href = True):
        tag['href'] = urlparse.urljoin('http://www.businessghana.com/portal/', tag['href'])
    return map(str, soup.find_all('a', href = re.compile('.getJobInfo')))

# result from businssghana() saved to a variable to make them iterable as a list
all_links = businessghana()

#this function should be saving the links to the database unless the link already exist
def save_new_links(all_links):
    current_links = jobLinks.objects.all()
    for i in all_links:
        if i not in current_links:
            jobLinks.objects.create(url=i)

# I called the above function here hoping that it will save to database
save_new_links(all_links)

# return my httpResponse with this function
def display_links(request):
    name = all_links()
    return render_to_response('jobs.html', {'name' : name})

1 answer

User

#1 · Posted on 2024-09-30 05:32:24

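There are a few problems with your code. First, the jobLinks model has no url field; you should use links=i in the create() call instead. Second, you are checking whether a URL string is in a QuerySet. That test will never be true, because the elements of a QuerySet are instances of the model class (here, jobLinks objects), not strings. Instead, you could do something like this: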

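def save_new_links(all_links):
    # Collect unsaved model instances, then insert them in a single query.
    new_links = []
    for i in all_links:
        # Query by field value; exists() avoids fetching the matching rows.
        if not jobLinks.objects.filter(links=i).exists():
            new_links.append(jobLinks(links=i))
    jobLinks.objects.bulk_create(new_links)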

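I would also recommend that, in the future, you use singular names for your models. In this case, for example, I would call the model jobLink and give its field a singular name as well.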
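The membership problem described in the answer can be reproduced without Django. The following is only a minimal sketch: the `JobLinks` class is a hypothetical plain-Python stand-in for the model (it models just the `links` field), the list `current_links` stands in for the QuerySet, and the `example.com` URLs are made up for illustration.

```python
class JobLinks:
    """Hypothetical stand-in for the Django model; only `links` is modeled."""

    def __init__(self, links):
        self.links = links


# Stand-in for jobLinks.objects.all(): a collection of objects, not strings.
current_links = [
    JobLinks("http://example.com/a"),
    JobLinks("http://example.com/b"),
]

# Made-up scraped URLs: one already stored, one new and repeated twice.
scraped = [
    "http://example.com/a",
    "http://example.com/c",
    "http://example.com/c",
]

# A string never compares equal to a model object, so nothing ever matches
# and every scraped URL would be treated as new.
naive_hits = [url for url in scraped if url in current_links]

# Comparing against the stored field values behaves as intended.
known = {obj.links for obj in current_links}
new_links = []
for url in scraped:
    if url not in known:
        known.add(url)  # also skips duplicates within the same batch
        new_links.append(url)

print(naive_hits)
print(new_links)
```

In a real Django project, the set of stored values could presumably be built with `jobLinks.objects.values_list('links', flat=True)`, which costs one query up front instead of one `filter()` per scraped link. Tracking `known` inside the loop also guards against the same URL appearing twice in a single scrape.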
