Is there any way to make this code faster?

2 answers

User

#1 · Edited 2024-09-28 22:23:29

First, you can use get instead of the if conditions.


from string import digits, punctuation

def countTokens(token):
    if token is None:
        return []
    # Lowercase, then turn digits and punctuation into spaces.
    token = str(token).lower()
    tokens = token.translate(str.maketrans(digits + punctuation, " " * len(digits + punctuation)))
    return tokens.split(" ")

def normalOrder(recipes, queries):
    for r in recipes:
        parts, scores = [[], [], [], []], 0
        # get() returns None for a missing key, which countTokens turns into [].
        parts[0] = countTokens(r["title"])
        parts[1] = countTokens(r.get("categories", None))
        parts[2] = countTokens(r.get("ingredients", None))
        parts[3] = countTokens(r.get("directions", None))
        for q in queries:
            scores += 8 * parts[0].count(q) + 4 * parts[1].count(q) + 2 * parts[2].count(q) + 1 * parts[3].count(q)
        r["score"] = scores + r.get("rating", 0)
    return recipes
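
To make the get suggestion concrete, here is a minimal sketch comparing an explicit key check with dict.get. The recipe record and the variable names are made up for illustration and are not taken from the question.

```python
# Hypothetical recipe record, used only for illustration; it has no "categories" key.
recipe = {"title": "Lemon Cake", "rating": 4}

# Explicit check: a membership test, a branch, and a second lookup.
if "categories" in recipe:
    categories_checked = recipe["categories"]
else:
    categories_checked = None

# dict.get: one call; returns the default (None unless given) when the key is missing.
categories_get = recipe.get("categories")
rating = recipe.get("rating", 0)
missing_rating = {}.get("rating", 0)

print(categories_checked, categories_get, rating, missing_rating)
```

Both forms give the same result; get is simply shorter and avoids the branch.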

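User

#2 · Edited 2024-09-28 22:23:29

I noticed a few things:

Every time countTokens is called, the same translation table is generated again (the maketrans call). I guess this won't be optimized away, so you are probably losing performance there.
tokens.split(" ") creates a list of all the words in the string, which is fairly expensive, for example when the string is 100,000 words long. You don't need that.
Overall, it looks like you are simply counting how often a word is contained in a string. Using str.count(), you can reduce the overhead by a lot.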
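
As a rough sketch of the first point, the translation table can be built once at module level and reused by every call. The names TRANSTAB and count_tokens are assumptions for this example, and it uses split() without an argument, which drops empty strings, unlike the split(" ") in the question.

```python
from string import digits, punctuation

# Built once at import time instead of on every call.
TRANSTAB = str.maketrans(digits + punctuation, " " * len(digits + punctuation))

def count_tokens(text):
    """Lowercase text, blank out digits and punctuation, and split into words."""
    if text is None:
        return []
    return str(text).lower().translate(TRANSTAB).split()

print(count_tokens("2 cups Flour, sifted!"))
```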

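If you apply the str.count() suggestion, you no longer need the countTokens function, and with a little more refactoring you end up with this: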

def normalOrder(recipes, queries):
    for recipe in recipes:
        recipe["score"] = recipe.get("rating", 0)

        for query in queries:
            recipe["score"] += (
                8 * recipe["title"].lower().count(query)
                + 4 * recipe["categories"].lower().count(query)
                + 2 * recipe["ingredients"].lower().count(query)
                + 1 * recipe["directions"].lower().count(query)
            )

    return recipes

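Does this work for you? Is it fast enough?

Edit: In your original code, you wrapped the access to recipe["title"] and the other strings in another str() call. I guess they are already strings? If not, you need to add that here.

You said in the comments that punctuation is a problem. As I said there, I don't think you need to worry about it: the count call only cares about punctuation if the query word itself contains punctuation, and in that case it only counts occurrences whose surrounding punctuation matches the query. Take a look at these examples: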

>>> "Some text, that...".count("text")
1
>>> "Some text, that...".count("text.")
0
>>> "Some text, that...".count("text,")
1

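If you want it to behave differently, you can do what you did in your original question: create a translation table and apply it. Keep in mind that applying this translation to the recipe text (as you did in the question) doesn't make much sense, because from then on no query word containing punctuation will ever match. You could get the same effect much more easily by ignoring all query words that contain punctuation. You probably want to translate the query words instead, so that if someone enters "potato," you find all occurrences of "potato". That would look like this: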

from string import digits, punctuation

def normalOrder(recipes, queries):
    translation_table = str.maketrans(digits + punctuation, " " * len(digits + punctuation))
    for recipe in recipes:
        recipe["score"] = recipe.get("rating", 0)

        for query in queries:
            # Punctuation becomes spaces; strip them so "potato," matches "potato".
            replaced_query = query.translate(translation_table).strip()
            recipe["score"] += (
                8 * recipe["title"].lower().count(replaced_query)
                + 4 * recipe["categories"].lower().count(replaced_query)
                + 2 * recipe["ingredients"].lower().count(replaced_query)
                + 1 * recipe["directions"].lower().count(replaced_query)
            )

    return recipes
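
One detail worth checking when translating queries: the table maps punctuation to spaces, so a translated query can carry a leading or trailing space. A small sketch, with a made-up sample text, of how that affects count:

```python
from string import digits, punctuation

table = str.maketrans(digits + punctuation, " " * len(digits + punctuation))

raw_query = "potato,"
cleaned = raw_query.translate(table)  # the comma becomes a space: "potato "
stripped = cleaned.strip()            # "potato"

# Sample text invented for this example.
text = "mashed potato. potato, salad"
print(text.count(raw_query), text.count(cleaned), text.count(stripped))
# prints: 1 0 2
```

The raw query matches only where the comma follows, the translated query with its trailing space matches nothing here, and the stripped query matches both occurrences.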

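Edit3: In the comments, you said you want a search for ["honey", "lemon"] to match "honey-lemon", but you don't want "butter" to match "butterfingers". For that, your initial approach is probably the best solution, but keep in mind that searching for the singular "potato" will no longer match the plural ("potatoes") or any other derived form.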

from string import digits, punctuation

def normalOrder(recipes, queries):
    transtab = str.maketrans(digits + punctuation, " " * len(digits + punctuation))
    for recipe in recipes:
        recipe["score"] = recipe.get("rating", 0)

        title_words = recipe["title"].lower().translate(transtab).split()
        category_words = recipe["categories"].lower().translate(transtab).split()
        ingredient_words = recipe["ingredients"].lower().translate(transtab).split()
        direction_words = recipe["directions"].lower().translate(transtab).split()

        for query in queries:
            recipe["score"] += (
                8 * title_words.count(query)
                + 4 * category_words.count(query)
                + 2 * ingredient_words.count(query)
                + 1 * direction_words.count(query)
            )

    return recipes
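
The difference between substring counting and whole-word counting can be seen in a short sketch. The sample text is invented for illustration.

```python
from string import digits, punctuation

transtab = str.maketrans(digits + punctuation, " " * len(digits + punctuation))

# Sample text invented for this example.
text = "butterfingers with honey-lemon butter"

# Substring counting: "butter" also matches inside "butterfingers".
substring_hits = text.count("butter")

# Word-level counting: punctuation becomes a space, then whole words are compared.
words = text.lower().translate(transtab).split()
word_hits = words.count("butter")

print(substring_hits, word_hits, words.count("honey"), words.count("lemon"))
# prints: 2 1 1 1
```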

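If you call this function more often with the same recipes, you can get more performance by storing the result of .lower().translate().split() in the recipes, so the lists don't have to be rebuilt on every call.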
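
One possible way to store the word lists, as a sketch only: the helper cached_words and the "_words" key are hypothetical names, and the cache would go stale if a recipe's text were changed afterwards.

```python
from string import digits, punctuation

TRANSTAB = str.maketrans(digits + punctuation, " " * len(digits + punctuation))

def cached_words(recipe, field):
    """Return the word list for a field, tokenizing only on first use."""
    cache = recipe.setdefault("_words", {})  # "_words" is a hypothetical cache key
    if field not in cache:
        text = str(recipe.get(field) or "")  # missing or None fields become ""
        cache[field] = text.lower().translate(TRANSTAB).split()
    return cache[field]

recipe = {"title": "Honey-Lemon Chicken", "ingredients": "honey, lemon, chicken"}
first = cached_words(recipe, "title")
second = cached_words(recipe, "title")  # served from the cache, no re-tokenizing
print(first, first is second)
```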

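Depending on your input data (how many queries are there on average?), it might also make sense to go through the split() result just once and add up the count of each word. That makes looking up a single word much faster and the result can also be kept between function calls, but it is more expensive to build: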

from collections import Counter
from string import digits, punctuation

transtab = str.maketrans(digits + punctuation, " " * len(digits + punctuation))

def counterFromString(string):
    words = string.lower().translate(transtab).split()
    return Counter(words)

def normalOrder(recipes, queries):
    for recipe in recipes:
        recipe["score"] = recipe.get("rating", 0)

        title_counter = counterFromString(recipe["title"])
        category_counter = counterFromString(recipe["categories"])
        ingredient_counter = counterFromString(recipe["ingredients"])
        direction_counter = counterFromString(recipe["directions"])

        for query in queries:
            recipe["score"] += (
                8 * title_counter[query]
                + 4 * category_counter[query]
                + 2 * ingredient_counter[query]
                + 1 * direction_counter[query]
            )

    return recipes

Edit4: I've replaced the defaultdict with a Counter; I wasn't aware that class existed.
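
To keep the counters between calls, building them can be separated from scoring. The following is a sketch under assumptions: build_counters, score, and WEIGHTS are names invented here, and the weights are copied from the scoring formula above.

```python
from collections import Counter
from string import digits, punctuation

TRANSTAB = str.maketrans(digits + punctuation, " " * len(digits + punctuation))
# Field weights copied from the scoring formula in the answer.
WEIGHTS = {"title": 8, "categories": 4, "ingredients": 2, "directions": 1}

def build_counters(recipe):
    """Tokenize every weighted field once and count its words."""
    return {
        field: Counter(str(recipe.get(field) or "").lower().translate(TRANSTAB).split())
        for field in WEIGHTS
    }

def score(recipe, counters, queries):
    """Rating plus the weighted number of whole-word query hits."""
    total = recipe.get("rating", 0)
    for query in queries:
        # A Counter returns 0 for words it has never seen.
        total += sum(weight * counters[field][query] for field, weight in WEIGHTS.items())
    return total

# Sample recipe invented for this example.
recipe = {"title": "Honey-Lemon Chicken", "ingredients": "honey, lemon, butter", "rating": 4}
counters = build_counters(recipe)  # can be kept and reused across searches
print(score(recipe, counters, ["honey", "lemon"]))  # prints: 24
print(score(recipe, counters, ["butter"]))          # prints: 6
```

The expensive tokenizing happens once per recipe in build_counters; each later search only does dictionary lookups.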
