概率PAR的内存使用问题的回答

概率PAR的内存使用

回答此问题可获得 20 贡献值，回答如果被采纳可获得 50 分。

我正在为范围连接语法编写一个CKY解析器。我想用树库作为语法，这样语法会很大。我用Python编写了一个原型<a href="http://www.nltk.org/download" rel="nofollow">1</a>，当我模拟一个由几十个句子组成的树库时，它似乎可以很好地工作，但是内存的使用是不可接受的。我试着用C++编写，但到目前为止，我一直没有用C++，这是非常令人沮丧的。以下是一些数据（n是语法基于的句子数）： <pre><code>n mem 9 173M 18 486M 36 836M </code></pre> 这种增长模式是最好的优先算法所期望的，但是开销的大小是我关心的。根据heapy的数据，内存使用量比这些数字小10倍，valgrind报告了类似的情况。是什么导致了这种差异？我能用Python（或Cython）对此做些什么吗？也许是因为碎片化？或者是python字典的开销？在 背景知识：两种重要的数据结构是将边缘映射到概率的议程和将非终结点和位置映射到边缘的字典图表。这个议程是用heapdict（内部使用dict和heapq列表）实现的，这个图表有一个字典，将非终端和位置映射到边缘。议程经常被插入和删除，图表只得到插入和查找。我用这样的元组表示边： ^{pr2}$ 字符串是语法中的非终结符标签，位置被编码为位掩码。当一个成分不连续时，可能有多个位置。所以这个边可以代表对“is Mary happy”的分析，其中“is”和“happy”都属于VP。在本例中，图表字典按此边的第一个元素（“S”，111）编制索引。在一个新版本中，我尝试将此表示转换为可重用的内存： <pre><code>(("S", "NP", "VP), (111, 100, 011)) </code></pre> 我认为，如果第一部分与不同的位置结合使用，Python只会存储一次，尽管我不确定这是否正确。不管是哪种情况，似乎都没什么区别。在 基本上我想知道的是，是否值得进一步追问我的Python实现，包括用Cython和不同的数据结构来做事情，或者从C++中直接编写它是唯一可行的选择。在 更新：经过一些改进，我不再有内存使用问题。我正在研究一个优化的Cython版本。对于提高代码效率的最有用的建议，我将给予奖励。在<a href="http://student.science.uva.nl/~acranenb/plcfrs_cython.html" rel="nofollow">http://student.science.uva.nl/~acranenb/plcfrs_cython.html</a>处有一个带注释的版本 <a href="http://www.nltk.org/download" rel="nofollow">1</a><a href="https://github.com/andreasvc/disco-dop/" rel="nofollow">https://github.com/andreasvc/disco-dop/</a> --跑测试.py分析一些句子。需要Python2.6、<a href="http://www.nltk.org/download" rel="nofollow">nltk</a>和<a href="http://pypi.python.org/pypi/HeapDict" rel="nofollow">heapdict</a>

0 条评论
分类：Python问答

默认排序时间排序

1 个回答

匿名 1天前

　擅长：python、mysql、java

在这些情况下，首先要做的是概述： <pre><code>15147/297 0.032 0.000 0.041 0.000 tree.py:102(__eq__) 15400/200 0.031 0.000 0.106 0.001 tree.py:399(convert) 1 0.023 0.023 0.129 0.129 plcfrs_cython.pyx:52(parse) 6701/1143 0.022 0.000 0.043 0.000 heapdict.py:45(_min_heapify) 18212 0.017 0.000 0.023 0.000 plcfrs_cython.pyx:38(__richcmp__) 10975/10875 0.017 0.000 0.035 0.000 tree.py:75(__init__) 5772 0.016 0.000 0.050 0.000 tree.py:665(__init__) 960 0.016 0.000 0.025 0.000 plcfrs_cython.pyx:118(deduced_from) 46938 0.014 0.000 0.014 0.000 tree.py:708(_get_node) 25220/2190 0.014 0.000 0.016 0.000 tree.py:231(subtrees) 10975 0.013 0.000 0.023 0.000 tree.py:60(__new__) 49441 0.013 0.000 0.013 0.000 {isinstance} 16748 0.008 0.000 0.015 0.000 {hasattr} </code></pre> 我注意到的第一件事是很少有函数来自cython模块本身。他们大多来自树.py模块，也许这就是瓶颈。在 集中在cython方面，我看到了richcmp函数： 我们可以通过在方法声明中添加值的类型来优化它 ^{pr2}$ 这就降低了价值 <pre><code>ncalls tottime percall cumtime percall filename:lineno(function) .... 18212 0.011 0.000 0.015 0.000 plcfrs_cython.pyx:38(__richcmp__) </code></pre> 添加elif语法而不是单个if将启用<a href="http://wiki.cython.org/enhancements/switch" rel="nofollow">the switch optimization of cython</a> <pre><code> if op == 0: return self.label < other.label or self.vec < other.vec elif op == 1: return self.label <= other.label or self.vec <= other.vec elif op == 2: return self.label == other.label and self.vec == other.vec elif op == 3: return self.label != other.label or self.vec != other.vec elif op == 4: return self.label > other.label or self.vec > other.vec elif op == 5: return self.label >= other.label or self.vec >= other.vec </code></pre> 获得： <pre><code>17963 0.002 0.000 0.002 0.000 plcfrs_cython.pyx:38(__richcmp__) </code></pre> 想弄清楚在哪里树。py:399转换来自我发现这个函数里面兴奋剂花了那么多时间 <pre><code> def removeids(tree): """ remove unique IDs introduced by the Goodman reduction """ result = Tree.convert(tree) for a in result.subtrees(lambda t: '@' in t.node): a.node = a.node.rsplit('@', 1)[0] if isinstance(tree, ImmutableTree): return result.freeze() return result </code></pre> 现在我不确定树中的每个节点是否都是ChartItem，以及getitem值正在其他地方使用，但添加了以下更改： <pre><code>cdef class ChartItem: cdef public str label cdef public str root cdef public long vec cdef int _hash __slots__ = ("label", "vec", "_hash") def __init__(ChartItem self, label, int vec): self.label = intern(label) #.rsplit('@', 1)[0]) self.root = intern(label.rsplit('@', 1)[0]) self.vec = vec self._hash = hash((self.label, self.vec)) def __hash__(self): return self._hash def __richcmp__(ChartItem self, ChartItem other, int op): if op == 0: return self.label < other.label or self.vec < other.vec elif op == 1: return self.label <= other.label or self.vec <= other.vec elif op == 2: return self.label == other.label and self.vec == other.vec elif op == 3: return self.label != other.label or self.vec != other.vec elif op == 4: return self.label > other.label or self.vec > other.vec elif op == 5: return self.label >= other.label or self.vec >= other.vec def __getitem__(ChartItem self, int n): if n == 0: return self.root elif n == 1: return self.vec def __repr__(self): #would need bitlen for proper padding return "%s[%s]" % (self.label, bin(self.vec)[2:][::-1]) </code></pre> 以及最具潜力的Parse内部： <pre><code>from libc cimport pow def mostprobableparse... ... cdef dict parsetrees = <dict>defaultdict(float) cdef float prob m = 0 for n,(a,prob) in enumerate(derivations): parsetrees[a] += pow(e,prob) m += 1 </code></pre> 我得到： <pre><code> 189345 function calls (173785 primitive calls) in 0.162 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 6701/1143 0.025 0.000 0.037 0.000 heapdict.py:45(_min_heapify) 1 0.023 0.023 0.120 0.120 plcfrs_cython.pyx:54(parse) 960 0.018 0.000 0.030 0.000 plcfrs_cython.pyx:122(deduced_from) 5190/198 0.011 0.000 0.015 0.000 tree.py:102(__eq__) 6619 0.006 0.000 0.006 0.000 heapdict.py:67(_swap) 9678 0.006 0.000 0.008 0.000 plcfrs_cython.pyx:137(concat) </code></pre> 因此，下一步是优化heapify并从 推导公式可以进一步优化： <pre><code>cdef inline deduced_from(ChartItem Ih, double x, pyCx, pyunary, pylbinary, pyrbinary, int bitlen): cdef str I = Ih.label cdef int Ir = Ih.vec cdef list result = [] cdef dict Cx = <dict>pyCx cdef dict unary = <dict>pyunary cdef dict lbinary = <dict>pylbinary cdef dict rbinary = <dict>pyrbinary cdef ChartItem Ilh cdef double z cdef double y cdef ChartItem I1h for rule, z in unary[I]: result.append((ChartItem(rule[0][0], Ir), ((x+z,z), (Ih,)))) for rule, z in lbinary[I]: for I1h, y in Cx[rule[0][2]].items(): if concat(rule[1], Ir, I1h.vec, bitlen): result.append((ChartItem(rule[0][0], Ir ^ I1h.vec), ((x+y+z, z), (Ih, I1h)))) for rule, z in rbinary[I]: for I1h, y in Cx[rule[0][1]].items(): if concat(rule[1], I1h.vec, Ir, bitlen): result.append((ChartItem(rule[0][0], I1h.vec ^ Ir), ((x+y+z, z), (I1h, Ih)))) return result </code></pre> 我将到此为止，尽管我有信心随着对问题的深入了解，我们可以继续优化。在 一系列的unittest将有助于断言每个优化不会引入任何细微的错误。在 注意，尽量用空格代替制表符。在

概率PAR的内存使用

1 个回答

相关Python问题