Is there any way to make this code faster? (Answers)

Anonymous, 1 day ago

Expertise: python, mysql, java

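I noticed a few things:
<ul>
<li>Every time you call <code>countTokens</code>, the same translation table is generated again (the <code>maketrans</code> call). I doubt this gets optimized away, so you are probably losing performance there.</li>
<li><code>tokens.split(" ")</code> creates a list of all the words in the string, which is fairly expensive when, for example, the string contains 100,000 words. You don't need that list.</li>
<li>Overall, it looks like you are simply counting how often a word occurs in a string. With <a href="https://docs.python.org/3/library/stdtypes.html#str.count" rel="nofollow noreferrer"><code>str.count</code></a>, you can do that with much less overhead.</li>
</ul>
If you apply these changes, you no longer need the <code>countTokens</code> function at all, and after a little refactoring you end up with this:
<pre><code>def normalOrder(recipes, queries):
    # The recipe text is lowercased below, so lowercase the queries once as well.
    queries = [query.lower() for query in queries]
    for recipe in recipes:
        recipe["score"] = recipe.get("rating", 0)
        for query in queries:
            # str.count counts non-overlapping substring occurrences; no word list is built.
            recipe["score"] += (
                8 * recipe["title"].lower().count(query)
                + 4 * recipe["categories"].lower().count(query)
                + 2 * recipe["ingredients"].lower().count(query)
                + 1 * recipe["directions"].lower().count(query)
            )
    return recipes
</code></pre>
Does this work for you? Is it fast enough?
Edit: In your original code, you wrapped the access to <code>recipe["title"]</code> and the other strings in an additional <code>str()</code> call. I assume they are already strings; if they are not, you need to add the <code>str()</code> calls back here.
<hr/>
You said in the comments that punctuation is a problem. As I said there, I don't think you need to worry about it: punctuation only matters to <code>count</code> when the query word itself contains punctuation, and in that case <code>count</code> only counts the occurrences whose surrounding punctuation matches the query. Look at these examples:
<pre><code>>>> "Some text, that...".count("text")
1
>>> "Some text, that...".count("text.")
0
>>> "Some text, that...".count("text,")
1
</code></pre>
If you want different behavior, you can do what you did in your original question: create a translation table and apply it. Keep in mind that applying this translation to the recipe text (as you did in your question) does not make much sense, because from then on no query word containing punctuation can ever match; you could get the same effect more simply by ignoring every query word that contains punctuation. What you probably want is to translate the query words instead, so that if someone enters "potato" with punctuation attached (for example, "potato,"), you still find every occurrence of "potato". That could look like this:
<pre><code>from string import digits, punctuation

def normalOrder(recipes, queries):
    # Delete digits and punctuation (replacing them with spaces would turn
    # "potato," into "potato ", which misses "potato." and "potato" at the end).
    translation_table = str.maketrans("", "", digits + punctuation)
    # Clean each query once instead of once per recipe.
    cleaned_queries = [query.lower().translate(translation_table).strip() for query in queries]
    # An empty query would match everywhere ("abc".count("") == 4), so drop it.
    cleaned_queries = [query for query in cleaned_queries if query]
    for recipe in recipes:
        recipe["score"] = recipe.get("rating", 0)
        for query in cleaned_queries:
            recipe["score"] += (
                8 * recipe["title"].lower().count(query)
                + 4 * recipe["categories"].lower().count(query)
                + 2 * recipe["ingredients"].lower().count(query)
                + 1 * recipe["directions"].lower().count(query)
            )
    return recipes
</code></pre>
<hr/>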
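The difference between substring counting with `str.count` and counting whole words after `split()` can be seen in a minimal sketch. `count_by_split` and `count_by_substring` are hypothetical helpers written for illustration (the original `countTokens` is not shown here), the sample sentence is made up, and the translation table is built once at module level, as the first bullet suggests.

```python
import timeit
from string import digits, punctuation

# Built once at import time instead of on every call.
# Digits and punctuation are mapped to spaces so split() yields clean words.
TRANSTAB = str.maketrans(digits + punctuation, " " * len(digits + punctuation))


def count_by_split(text, word):
    """Whole-word count: normalize, build a word list, count list items.

    Assumes `word` is already lowercase.
    """
    return text.lower().translate(TRANSTAB).split().count(word)


def count_by_substring(text, word):
    """Substring count via str.count: no intermediate list is built.

    Assumes `word` is already lowercase.
    """
    return text.lower().count(word)


text = "Butter the pan, then add butterfinger pieces and more butter."
print(count_by_split(text, "butter"))      # 2
print(count_by_substring(text, "butter"))  # 3 ("butterfinger" also matches)

# Rough timing on a longer text; absolute numbers depend on the machine.
long_text = text * 1_000
print("split:    ", timeit.timeit(lambda: count_by_split(long_text, "butter"), number=100))
print("str.count:", timeit.timeit(lambda: count_by_substring(long_text, "butter"), number=100))
```

`str.count` avoids building the word list, but it also counts "butter" inside "butterfinger"; whether that is acceptable depends on which matches you want.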
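Edit 3: In the comments you said you want a search for ["honey", "lemon"] to match "honey lemon", but you don't want "butter" to match "butterfingers". For that, your initial approach is probably the best solution, but keep in mind that searching for the singular "potato" will then no longer match the plural ("potatoes") or any other derived form.
<pre><code>from string import digits, punctuation

def normalOrder(recipes, queries):
    # Replace digits and punctuation with spaces so split() yields clean words.
    transtab = str.maketrans(digits + punctuation, " " * len(digits + punctuation))
    queries = [query.lower() for query in queries]
    for recipe in recipes:
        recipe["score"] = recipe.get("rating", 0)
        # Split each field into whole words once per recipe, not once per query.
        title_words = recipe["title"].lower().translate(transtab).split()
        category_words = recipe["categories"].lower().translate(transtab).split()
        ingredient_words = recipe["ingredients"].lower().translate(transtab).split()
        direction_words = recipe["directions"].lower().translate(transtab).split()
        for query in queries:
            # list.count matches whole words only, so "butter" does not match "butterfingers".
            recipe["score"] += (
                8 * title_words.count(query)
                + 4 * category_words.count(query)
                + 2 * ingredient_words.count(query)
                + 1 * direction_words.count(query)
            )
    return recipes
</code></pre>
If you call this function repeatedly with the same recipes, you can gain more performance by storing the result of <code>.lower().translate().split()</code> in the recipes, so that these lists don't have to be rebuilt on every call.
Depending on your input data (how many queries are there on average?), it may also make sense to iterate over the <code>split()</code> result only once and tally the count of each word. That makes looking up a single word much faster, and the tallies can also be kept between function calls, but they are more expensive to build:
<pre><code>from collections import Counter
from string import digits, punctuation

transtab = str.maketrans(digits + punctuation, " " * len(digits + punctuation))

def counterFromString(string):
    # Map each whole word to the number of times it occurs in the string.
    words = string.lower().translate(transtab).split()
    return Counter(words)

def normalOrder(recipes, queries):
    queries = [query.lower() for query in queries]
    for recipe in recipes:
        recipe["score"] = recipe.get("rating", 0)
        title_counter = counterFromString(recipe["title"])
        category_counter = counterFromString(recipe["categories"])
        ingredient_counter = counterFromString(recipe["ingredients"])
        direction_counter = counterFromString(recipe["directions"])
        for query in queries:
            # Looking up a missing word in a Counter returns 0 instead of raising KeyError.
            recipe["score"] += (
                8 * title_counter[query]
                + 4 * category_counter[query]
                + 2 * ingredient_counter[query]
                + 1 * direction_counter[query]
            )
    return recipes
</code></pre>
Edit 4: I replaced the <code>defaultdict</code> with a <code>Counter</code>, because I didn't know that class existed.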
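The idea of keeping the per-recipe word data between calls can be sketched as follows, assuming recipes are plain dicts whose text fields do not change once the cache is built. `normal_order_cached`, `counter_from_string`, `WEIGHTS`, and the cache key `"_word_counters"` are hypothetical names introduced for illustration; the weights are the 8/4/2/1 values used above, and the two sample recipes are made up.

```python
from collections import Counter
from string import digits, punctuation

# Replace digits and punctuation with spaces, as in the answer's code.
TRANSTAB = str.maketrans(digits + punctuation, " " * len(digits + punctuation))

# Hypothetical mapping of recipe field to score weight (the 8/4/2/1 weights above).
WEIGHTS = {"title": 8, "categories": 4, "ingredients": 2, "directions": 1}

# Hypothetical key under which the precomputed Counters are stored on each recipe.
CACHE_KEY = "_word_counters"


def counter_from_string(string):
    """Count whole words after lowercasing and blanking out digits/punctuation."""
    return Counter(string.lower().translate(TRANSTAB).split())


def normal_order_cached(recipes, queries):
    """Score recipes, building each recipe's Counters only on the first call.

    Assumes the text fields are not modified after the cache has been built;
    delete recipe[CACHE_KEY] if they are.
    """
    queries = [query.lower() for query in queries]
    for recipe in recipes:
        counters = recipe.get(CACHE_KEY)
        if counters is None:
            counters = {field: counter_from_string(recipe[field]) for field in WEIGHTS}
            recipe[CACHE_KEY] = counters
        recipe["score"] = recipe.get("rating", 0) + sum(
            weight * counters[field][query]
            for query in queries
            for field, weight in WEIGHTS.items()
        )
    return recipes


recipes = [
    {"title": "Honey Lemon Tea", "categories": "Drinks",
     "ingredients": "honey, lemon, water",
     "directions": "Stir the honey into hot water. Add lemon.", "rating": 4},
    {"title": "Butterfinger Pie", "categories": "Dessert",
     "ingredients": "butterfinger bars, cream",
     "directions": "Crush the butterfinger bars.", "rating": 3},
]

print([r["score"] for r in normal_order_cached(recipes, ["honey", "lemon"])])  # [26, 3]
# The second call reuses the cached Counters instead of re-splitting the text.
print([r["score"] for r in normal_order_cached(recipes, ["butter"])])  # [4, 3]
```

The cost of splitting and counting is paid once per recipe; every later call only does dictionary lookups, which pays off when the same recipe list is searched many times.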