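I'm having trouble indexing a product list (about 280k items) with Haystack and Whoosh. Running an index update seems to take more than 28 hours, which I don't think is a reasonable amount of time at all.
I have a model: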
class SupplierSkus(models.Model):
    sku = models.CharField(max_length=20)
    link = models.CharField(max_length=4096)
    price = models.FloatField()
    last_updated = models.DateTimeField("Date Updated", null=True, auto_now=True)
    status = models.ForeignKey(Status, on_delete=models.PROTECT, default=1)
    category = models.CharField(max_length=1024)
    family = models.CharField(max_length=20)
    family_desc = models.TextField(null=True)
    family_name = models.CharField(max_length=250)
    product_name = models.CharField(max_length=250)
    was_price = models.FloatField(null=True)
    vat_rate = models.FloatField(null=True)
    lead_from = models.IntegerField(null=True)
    lead_to = models.IntegerField(null=True)
    deliv_cost = models.FloatField(null=True)
    prod_desc = models.TextField(null=True)
    attributes = models.TextField(null=True)
    brand = models.TextField(null=True)
    mpn = models.CharField(max_length=50, null=True)
    ean = models.CharField(max_length=15, null=True)
    supplier = models.ForeignKey(Suppliers, on_delete=models.PROTECT)
And I have a search_index.py:
from haystack import indexes
from products.models import SupplierSkus

class ProductIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    sku = indexes.CharField(model_attr='sku')
    category = indexes.CharField(model_attr='category')
    product_name = indexes.CharField(model_attr='product_name')
    family_name = indexes.CharField(model_attr='family_name')
    prod_desc = indexes.CharField(model_attr='prod_desc')
    family_desc = indexes.CharField(model_attr='family_desc')
    brand = indexes.CharField(model_attr='brand')
    mpn = indexes.CharField(model_attr='mpn')
    ean = indexes.CharField(model_attr='ean')
    attributes = indexes.CharField(model_attr='attributes')

    def get_model(self):
        return SupplierSkus

    def index_queryset(self, using=None):
        return SupplierSkus.objects.filter(status_id=1)
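I've noticed that Django versions after 2 show a big performance drop when iterating over large querysets. I'm not sure why, but these days I usually have to use .iterator() when working with large datasets. Or pagination. Or raw SQL, which seems to be the fastest way to handle large datasets.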
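To show what I mean by pagination, here is a minimal sketch of the batching pattern in plain Python. The names `Row`, `fetch_page` and `iter_pages` are hypothetical and only for illustration; `fetch_page` stands in for a database query along the lines of `filter(pk__gt=last_pk).order_by("pk")[:page_size]`, and the sketch assumes primary keys are positive integers.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical stand-in for a model row: (primary key, sku).
Row = Tuple[int, str]


def fetch_page(rows: Sequence[Row], last_pk: int, page_size: int) -> List[Row]:
    """Return up to page_size rows whose primary key is greater than last_pk.

    In a real project this would be a database query; here it just
    filters and sorts an in-memory sequence.
    """
    return sorted(row for row in rows if row[0] > last_pk)[:page_size]


def iter_pages(rows: Sequence[Row], page_size: int) -> Iterator[List[Row]]:
    """Yield consecutive pages, resuming after the last primary key seen."""
    last_pk = 0  # assumes primary keys start above 0
    while True:
        page = fetch_page(rows, last_pk, page_size)
        if not page:
            return
        yield page
        last_pk = page[-1][0]
```

Each page is a plain list, so only one batch is held in memory at a time.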
But I can't pass a list to Haystack:
class must return a 'QuerySet' in the 'index_queryset' method.
Given that I need to return a QuerySet, how can I get this done in a reasonable amount of time?