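<p>I have a fairly simple set of requirements. I have a list of objects (length 2 million), and each object has two attributes that need a regex applied to them (the other attributes are left unchanged).</p>
<p>The values ZERO, ONE, TWO ... TEN need to be changed to their numeric values: 0, 1, 2 ... 10.</p>
<p>Examples:</p>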
<pre><code>ONE MAIN STREET -> 1 MAIN STREET
BONE ROAD -> BONE ROAD
BUILDING TWO, THREE MAIN ROAD -> BUILDING 2, 3 MAIN ROAD
ELEVEN MAIN ST -> ELEVEN MAIN STREET
ONE HUNDRED FUNTOWN -> 1 HUNDRED FUNTOWN
</code></pre>
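<p>Obviously, some numbers are left unchanged and some are converted oddly. <strong>This is entirely expected.</strong></p>
<p>I can do all of this with what I have below. My question is: is there a clever way to make it run faster? I thought about building a <code>list</code> of <code>dictionaries</code> where the key is the number word and the value is the digit, but I don't think that would help performance. Or should I <code>re.compile</code> each regex and pass them into the function? Any good ideas for making this run faster?</p>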
<pre><code>import re

def update_word_to_numeric(entrylist):
    updated_entrylist = []
    for theentry in entrylist:
        # First address line: one substitution pass per number word
        theentry.addr_ln_1 = re.sub(r"\bZERO\b", "0", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bONE\b", "1", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bTWO\b", "2", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bTHREE\b", "3", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bFOUR\b", "4", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bFIVE\b", "5", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bSIX\b", "6", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bSEVEN\b", "7", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bEIGHT\b", "8", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bNINE\b", "9", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bTEN\b", "10", theentry.addr_ln_1)
        # Second address line: same substitutions
        theentry.addr_ln_2 = re.sub(r"\bZERO\b", "0", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bONE\b", "1", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bTWO\b", "2", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bTHREE\b", "3", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bFOUR\b", "4", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bFIVE\b", "5", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bSIX\b", "6", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bSEVEN\b", "7", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bEIGHT\b", "8", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bNINE\b", "9", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bTEN\b", "10", theentry.addr_ln_2)
        updated_entrylist.append(theentry)
    return updated_entrylist
</code></pre>
<p>Maybe what I have is already a perfectly fine approach. A "this is good enough" comment would be fine with me too :)</p>
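<p>For reference, the dictionary and <code>re.compile</code> ideas mentioned above could be combined roughly as follows. This is only a sketch: the names <code>WORD_TO_DIGIT</code>, <code>NUMBER_WORD_RE</code> and <code>words_to_digits</code> are made up for illustration, and it assumes the address lines are already uppercase, as in the examples. It scans each string once instead of eleven times; whether that is actually faster on the real data would need to be measured (for example with <code>timeit</code>).</p>

```python
import re

# Hypothetical lookup table: number word -> digit string (illustrative name).
WORD_TO_DIGIT = {
    "ZERO": "0", "ONE": "1", "TWO": "2", "THREE": "3", "FOUR": "4",
    "FIVE": "5", "SIX": "6", "SEVEN": "7", "EIGHT": "8", "NINE": "9",
    "TEN": "10",
}

# A single alternation, compiled once at import time. The \b anchors keep
# words such as BONE or ELEVEN from matching.
NUMBER_WORD_RE = re.compile(r"\b(?:" + "|".join(WORD_TO_DIGIT) + r")\b")


def words_to_digits(text):
    """Replace every whole-word ZERO..TEN in text with its digits, in one pass."""
    return NUMBER_WORD_RE.sub(lambda m: WORD_TO_DIGIT[m.group(0)], text)


def update_word_to_numeric(entrylist):
    """Update addr_ln_1 and addr_ln_2 on each entry in place.

    Unlike the original function, this returns the same list object
    rather than building a new list.
    """
    for entry in entrylist:
        entry.addr_ln_1 = words_to_digits(entry.addr_ln_1)
        entry.addr_ln_2 = words_to_digits(entry.addr_ln_2)
    return entrylist
```

<p>The replacement callback looks up the matched word in the dictionary, so adding another number word only means adding one dictionary entry.</p>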