Cache clearing in Python's re module

User

Post #1 · edited 2024-10-01 11:28:20

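If I had to guess, I'd say this is done to avoid tracking when, and for how long, each individual value has been stored in the cache, which would add both memory and processing overhead. The cache object is a dictionary, and dictionaries (before Python 3.7) did not preserve insertion order, so without some additional bookkeeping object there is no good way to know the order in which items were added. If you are on Python >= 2.7, you could solve this by using an OrderedDict instead of a standard dictionary; otherwise the cache implementation would have to be redesigned to eliminate the need for clear().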

User

Post #2 · edited 2024-10-01 11:28:20

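The purpose of the cache is to reduce the average call time of the function. Keeping more information in _cache so that it can be pruned rather than cleared would add overhead and increase that average call time. A _cache.clear() call completes quickly, and even though the cached entries are lost, this is better than paying to maintain per-entry cache state and to remove individual elements whenever the limit is reached.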

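There are a few things to consider when calculating the efficiency of the cache:

1. Average call time on a cache hit (very short)
2. Average call time on a cache miss (longer)
3. Frequency of cache hits (fairly uncommon)
4. Call time when the cache is cleared or pruned (fairly uncommon)

The question is whether increasing #3 makes sense if it also means increasing #2 and #4. My guess is that it doesn't, or that the difference is negligible, so keeping the code simple is the better choice.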
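To see how factor #3 changes with the eviction policy, hit counts can be simulated. This is a sketch under assumptions: `count_hits` is a hypothetical helper, the access pattern (three "hot" patterns reused every round plus one pattern never seen again) is invented to make the difference visible, and the simulation counts hits only, so it says nothing about the timings in #1, #2 and #4:

```python
from collections import OrderedDict


def count_hits(keys, capacity, policy):
    """Count cache hits for a key sequence under one eviction policy.

    "clear": empty the whole cache when it is full (the simple approach).
    "fifo":  evict only the oldest inserted entry when full.
    "lru":   like "fifo", but a hit moves the key to the newest position.
    """
    if policy not in ("clear", "fifo", "lru"):
        raise ValueError(f"unknown policy: {policy!r}")
    cache = OrderedDict()
    hits = 0
    for key in keys:
        if key in cache:
            hits += 1
            if policy == "lru":
                cache.move_to_end(key)  # mark as most recently used
            continue
        if len(cache) >= capacity:
            if policy == "clear":
                cache.clear()
            else:
                cache.popitem(last=False)  # drop the oldest entry
        cache[key] = True  # value is irrelevant; only membership matters
    return hits


# Hypothetical workload: three hot patterns reused each round, plus one
# pattern that never repeats.
keys = [k for i in range(4) for k in ("h1", "h2", "h3", f"u{i}")]
for policy in ("clear", "fifo", "lru"):
    print(policy, count_hits(keys, capacity=4, policy=policy))
```

On this contrived workload the policies score 3 ("clear"), 6 ("fifo") and 9 ("lru") hits out of 16 lookups. Whether such a gain survives the extra bookkeeping on real workloads is exactly what the answer doubts.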

User

Post #3 · edited 2024-10-01 11:28:20

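Here is a passage about caching from the developer of the new regex module proposed for Python 3.3. It is part of a list of features that set the new module apart from the current re module.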

7) Modify the re compiled expression cache to better handle the thrashing condition. Currently, when regular expressions are compiled, the result is cached so that if the same expression is compiled again, it is retrieved from the cache and no extra work has to be done. This cache supports up to 100 entries. Once the 100th entry is reached, the cache is cleared and a new compile must occur. The danger, albeit rare, is that one may compile the 100th expression only to find that one recompiles it and has to do the same work all over again when it may have been done 3 expressions ago. By modifying this logic slightly, it is possible to establish an arbitrary counter that gives a time stamp to each compiled entry and instead of clearing the entire cache when it reaches capacity, only eliminate the oldest half of the cache, keeping the half that is more recent. This should limit the possibility of thrashing to cases where a very large number of Regular Expressions are continually recompiled. In addition to this, I will update the limit to 256 entries, meaning that the 128 most recent are kept.

http://bugs.python.org/issue2636
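A sketch of the scheme described in the quote, under stated assumptions: `HalvingCache` is a hypothetical name, the counter stamp is refreshed on every lookup (the quote only says each compiled entry gets a time stamp), and eviction sorts entries by stamp and deletes the older half. With the proposed limit of 256, a full cache keeps the 128 most recent entries:

```python
import re

MAXCACHE = 256  # limit proposed in the quoted feature list


class HalvingCache:
    """Compiled-pattern cache that drops the oldest half when full.

    Hypothetical sketch: each entry carries a stamp from a monotonically
    increasing counter; at capacity, the entries with the smallest stamps
    (the older half) are removed.
    """

    def __init__(self, maxsize=MAXCACHE):
        self.maxsize = maxsize
        self._counter = 0
        self._entries = {}  # (pattern, flags) -> (stamp, compiled pattern)

    def compile(self, pattern, flags=0):
        key = (pattern, flags)
        self._counter += 1
        entry = self._entries.get(key)
        if entry is not None:
            # Assumption: a repeated compile refreshes the stamp, so
            # frequently used patterns count as "recent".
            self._entries[key] = (self._counter, entry[1])
            return entry[1]
        if len(self._entries) >= self.maxsize:
            self._evict_oldest_half()
        compiled = re.compile(pattern, flags)
        self._entries[key] = (self._counter, compiled)
        return compiled

    def _evict_oldest_half(self):
        # Sort keys by stamp (oldest first) and delete the first half.
        by_age = sorted(self._entries, key=lambda k: self._entries[k][0])
        for key in by_age[: len(by_age) // 2]:
            del self._entries[key]
```

Sorting on eviction costs O(n log n), but with a limit of 256 it runs only once every 128 misses after the first fill, so the amortized cost per call stays small.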

This seems to suggest that the current caching behavior is more likely explained by developer laziness or an "emphasis on readability".
