<p>You can apply a filtering lambda function to every row; it looks at each character and keeps only the digits:</p>
<pre class="lang-py prettyprint-override"><code>lists_combined['CustomerVoicePhone'] = (lists_combined.CustomerVoicePhone
                                          .map(lambda x: ''.join(filter(str.isdigit, x))))
</code></pre>
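<p>As a minimal sketch of what the lambda does to a single value, here is the same filter as a named function in plain Python. The name <code>keep_digits</code> and the sample strings are hypothetical, introduced only for illustration. The non-string guard is an assumption about how you might want to treat missing values: <code>filter</code> raises a <code>TypeError</code> on <code>None</code> or a float <code>NaN</code>, because neither is iterable.</p>

```python
def keep_digits(value):
    """Return only the digit characters of a string; pass non-strings through."""
    if not isinstance(value, str):
        # Missing values (None, float NaN) are not iterable, so leave them unchanged.
        return value
    return ''.join(filter(str.isdigit, value))


# Hypothetical sample values, including one missing entry
samples = ['(555) 123-4567', '+1 555-123-4567', None]
cleaned = [keep_digits(s) for s in samples]
print(cleaned)  # ['5551234567', '15551234567', None]
```

<p>If the column may contain missing values, passing a function like this to <code>.map()</code> in place of the lambda should avoid the error.</p>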
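<p>In terms of performance, we can compare it with the other answer using the code below; for a large DataFrame (100k phone numbers) it turns out to be slightly faster:</p>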
<pre class="lang-py prettyprint-override"><code>import random
import pandas as pd

def gen_phone():
    # Build a random phone number in the form NNN-NNN-NNNN
    first = str(random.randint(100, 999))
    second = str(random.randint(1, 888)).zfill(3)
    last = str(random.randint(1, 9998)).zfill(4)
    while last in ['1111', '2222', '3333', '4444', '5555', '6666', '7777', '8888']:
        last = str(random.randint(1, 9998)).zfill(4)
    return '{}-{}-{}'.format(first, second, last)

# DataFrame.append was removed in pandas 2.0, so build the column in one step
df = pd.DataFrame({'p': [gen_phone() for _ in range(100000)]})

def method1():
    regex = r'\)|\(|-|\+|\s'  # or regex = r'[\(\)\+\-\s]' using a character class
    # regex=True is needed in pandas 2.0+, where str.replace matches literally by default
    df['p_1'] = (df['p'].str.replace(regex, '', regex=True)
                        .fillna(df['p']))

%time method1()
# Wall time: 166 ms

def method2():
    df['p_2'] = df.p.map(lambda x: ''.join(filter(str.isdigit, x)))

%time method2()
# Wall time: 151 ms
</code></pre>
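<p>Speed aside, the two methods only agree when the input contains nothing beyond the characters listed in the regex. The sketch below, in plain Python with the standard <code>re</code> module, shows this on two hypothetical inputs. The function names and sample strings are assumptions made for illustration, not part of the benchmark above.</p>

```python
import re

# Same character set as the benchmark's regex: parentheses, plus, hyphen, whitespace
PUNCTUATION = re.compile(r'[()+\-\s]')


def strip_punctuation(text):
    """Remove only the listed punctuation and whitespace (method 1 style)."""
    return PUNCTUATION.sub('', text)


def keep_digits(text):
    """Keep only digit characters (method 2 style)."""
    return ''.join(filter(str.isdigit, text))


plain = '(555) 123-4567'
with_extras = '555.123.4567 ext 89'

# Identical results when only the listed characters appear
print(strip_punctuation(plain), keep_digits(plain))

# Different results once dots or letters show up
print(strip_punctuation(with_extras))  # dots and letters survive
print(keep_digits(with_extras))        # digits only
```

<p>Which behaviour is preferable depends on the data: the digit filter is stricter, while the regex leaves unexpected characters in place where they can be noticed.</p>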