How can I simplify these dictionary comprehensions? (Answers)



1 answer

Anonymous, 1 day ago


Comprehensions are a great feature of Python, but they are not always the best tool. Rather than creating lots of intermediate variables and merging them, it is simpler to process one dictionary at a time. I'm sure more performance could be squeezed out of this code, but it should strike a good balance between readability and speed. I checked that its output matches the output of your code.

<pre><code>dicTfAll = {
    1: {'c1': ['aa', 'bb', 'cc']},
    2: {'c1': ['dd', 'ee', 'ff']}
}
dicTf = {
    1: {'c2': ['aax', 'bbx', 'cc']},
    2: {'c2': ['ddy', 'eey', 'ff']},
    3: {'c2': ['xx', '11']}
}

# Build {company: {item: idx}} by walking each source dictionary in turn.
outputCompanies = {}
for d in [dicTfAll, dicTf]:
    for idx, records in d.items():
        for company, items in records.items():
            if company not in outputCompanies.keys():
                outputCompanies[company] = {}
            for item in items:
                outputCompanies[company][item] = idx

print(outputCompanies)
# {
#   'c2': {'11': 3, 'ddy': 2, 'eey': 2, 'cc': 1, 'xx': 3, 'ff': 2, 'bbx': 1, 'aax': 1},
#   'c1': {'aa': 1, 'bb': 1, 'cc': 1, 'dd': 2, 'ee': 2, 'ff': 2}
# }
</code></pre>

Since you are looking for faster code, here is a runtime comparison made with <code>%%timeit</code> in JupyterLab:

<pre><code># My version
2.99 µs ± 30 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

# Original version
6.39 µs ± 25.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
</code></pre>

I also tried a slightly more compact version of the code, but it ended up running slower:

<pre><code>%%timeit
# Requires `from collections import defaultdict` to have been run in an earlier cell.
outputCompanies = defaultdict(dict)
for d in [dicTfAll, dicTf]:
    for idx, records in d.items():
        for company, items in records.items():
            outputCompanies[company].update({item: idx for item in items})

# 4.88 µs ± 22.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
</code></pre>

Another test, this time one that actually uses a comprehension:

<pre><code>%%timeit
outputCompanies = {}
for d in [dicTfAll, dicTf]:
    for idx, records in d.items():
        for company, items in records.items():
            if company not in outputCompanies.keys():
                outputCompanies[company] = {}
            outputCompanies[company].update({
                item: idx for item in items
            })

# 4.99 µs ± 23.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
</code></pre>

A few notes on your code. In Python 3, <code>dict.keys()</code> returns a view object (in Python 2 it returned a <code>list</code>). Either way it can be iterated and tested for membership directly, so there is no need to call <code>list(dict.keys())</code>. There is also no need to create the variable <code>allKeys</code>, because you can call <code>dict.keys()</code> inside the dictionary comprehension. The companies are hard-coded, which is fine for a one-off script but not ideal if you expect the dataset to grow over time. If you do want to hard-code them, you can skip the variable declaration and simply write <code>for company in ['c1','c2']:</code>. Next, you can drop more variables by setting <code>dicTfAllP</code> to the first comprehension and then updating it with the second. Putting all of this together gives the code below. It is more readable and easier to follow, but it is barely faster than the original.

<pre><code>%%timeit
# Entries whose key appears only in dicTf are copied as they are.
dicTfAllP = {
    item[0]: item[1] for item in dicTf.items()
    if item[0] not in dicTfAll.keys()
}
# Entries whose key appears in both dictionaries are merged.
dicTfAllP.update({
    item[0]: dict(dicTfAll[item[0]], **item[1]) for item in dicTf.items()
    if item[0] in dicTfAll.keys()
})

outputCompanies = {}
for company in ['c1','c2']:
    theKeys = [key for key in dicTfAllP.keys() if company in dicTfAllP[key]]
    outputCompanies[company] = {
        token: key for key in theKeys for token in dicTfAllP[key][company]
    }

# 6.11 µs ± 58.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
</code></pre>
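As a further illustration, the loop-based approach can be wrapped in a reusable function so that the companies never need to be hard-coded. This is only a minimal sketch: the name `build_company_index` is hypothetical and not part of the question or the answer. It uses `dict.setdefault` in place of the explicit membership check, and it was not timed, so no performance claim is made for it.

```python
def build_company_index(*sources):
    """Return {company: {item: idx}} built from any number of source dicts.

    Each source is expected to look like {idx: {company: [item, ...]}}.
    Later sources overwrite earlier ones when the same company/item pair repeats.
    """
    output = {}
    for source in sources:
        for idx, records in source.items():
            for company, items in records.items():
                # setdefault returns the existing inner dict or creates an empty one.
                bucket = output.setdefault(company, {})
                for item in items:
                    bucket[item] = idx
    return output


dicTfAll = {
    1: {'c1': ['aa', 'bb', 'cc']},
    2: {'c1': ['dd', 'ee', 'ff']},
}
dicTf = {
    1: {'c2': ['aax', 'bbx', 'cc']},
    2: {'c2': ['ddy', 'eey', 'ff']},
    3: {'c2': ['xx', '11']},
}

result = build_company_index(dicTfAll, dicTf)
print(result)
```

Because the company names are discovered from the data, adding a third company or a third source dictionary needs no change to the function.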
