replace function not working in a DataFrame

import pandas as pd
import xlwings as xw

filename = 'file.xlsx'
wb = xw.Book(filename)
sheet1 = wb.sheets['sheet1']
df1 = sheet1.used_range.options(pd.DataFrame, index=False, header=True).value
sheet2 = wb.sheets['sheet2']
df2 = sheet2.used_range.options(pd.DataFrame, index=False, header=True).value
wb.close()

lists_combined = pd.concat([df1, df2])
lists_combined['filename'] = filename
# The chained replace calls below are the part that does not work
lists_combined['CustomerVoicePhone'] = (lists_combined['CustomerVoicePhone']
                                        .replace('-', '').replace('(', '').replace(')', '')
                                        .replace('+', '').replace(' ', ''))
lists_combined = lists_combined.filter(items=['filename', 'CustomerEmail', 'CustomerVoicePhone', 'CustomerTextPhone'])

3 Answers

User

Answer 1 · edited 2024-09-30 02:34:25

You can map a filtering lambda over every row. It walks through each character of the value and keeps only the digits:

lists_combined['CustomerVoicePhone'] = (lists_combined.CustomerVoicePhone
                                                      .map(lambda x: ''.join(filter(str.isdigit, x))))
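
One caveat: the lambda above raises a TypeError on cells that are not strings, and an empty Excel cell typically arrives as None or NaN. The sketch below is one possible way to guard against that. The helper name digits_only and the sample values are made up for illustration and are not part of the original answer.

```python
def digits_only(value):
    """Keep only the digit characters of a string; pass other values through."""
    if not isinstance(value, str):
        # Missing cells (None, NaN) are returned unchanged instead of raising.
        return value
    return ''.join(filter(str.isdigit, value))


# Made-up sample values, including one missing entry.
samples = ['(555) 123-4567', '+1 555 000 1111', None]
cleaned = [digits_only(s) for s in samples]
print(cleaned)
```

With pandas, the same function could be passed to .map in place of the lambda, for example lists_combined['CustomerVoicePhone'].map(digits_only).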

As for performance, we can compare it with the other answer using the code below. On a large DataFrame (100k phone numbers) it comes out slightly faster:

import random
import pandas as pd

def gen_phone():
    """Generate a random phone number string of the form XXX-XXX-XXXX."""
    first = str(random.randint(100, 999))
    second = str(random.randint(1, 888)).zfill(3)
    last = str(random.randint(1, 9998)).zfill(4)
    while last in ['1111', '2222', '3333', '4444', '5555', '6666', '7777', '8888']:
        last = str(random.randint(1, 9998)).zfill(4)
    return '{}-{}-{}'.format(first, second, last)

# Build the column in one go; DataFrame.append was removed in pandas 2.0
df = pd.DataFrame({'p': [gen_phone() for _ in range(100000)]})

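def method1():
    regex = r'\)|\(|-|\+|\s'  # or regex = r'[\(\)\+\-\s]' using a character class
    # regex=True is needed in pandas 2.0+, where str.replace matches literally by default
    df['p_1'] = (df['p'].str.replace(regex, '', regex=True)
                        .fillna(df['p']))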

%time method1()
# Wall time: 166 ms

def method2():
    df['p_2'] = (df.p.map(lambda x: ''.join(filter(str.isdigit, x))))

%time method2()
# Wall time: 151 ms

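User

Answer 2 · edited 2024-09-30 02:34:25

First, you should avoid chaining a series of replace calls, since it hurts the performance of your code. You can pass a list to the replace function containing the elements you want to replace with an empty string.

But the main point is this: you should use .str.replace() on the column, not just .replace(). Without regex=True, Series.replace only replaces cells whose entire value equals the target, so a '-' inside a phone number is never matched.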
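
The difference between the two calls can be seen on a tiny Series. This is a minimal sketch, assuming pandas is installed. The sample phone numbers and variable names are made up for illustration, and the list-based call is only one way to read the suggestion about passing a list to replace.

```python
import pandas as pd

# Made-up sample data, not taken from the original question.
phones = pd.Series(['(555) 123-4567', '+1 555-000-1111', '-'])

# Series.replace compares whole cell values by default, so only the
# cell that is exactly '-' changes; the other cells keep their dashes.
whole_cell = phones.replace('-', '')

# Series.str.replace works on substrings inside each cell.
# regex=False treats '-' as a literal character.
substrings = phones.str.replace('-', '', regex=False)

# Series.replace can also replace substrings when regex=True, and it
# accepts a list of patterns (regex special characters are escaped).
patterns = [r'\(', r'\)', '-', r'\+', ' ']
cleaned = phones.replace(patterns, '', regex=True)

print(whole_cell.tolist())
print(substrings.tolist())
print(cleaned.tolist())
```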

Cheers

User

Answer 3 · edited 2024-09-30 02:34:25

Let's use the .str accessor with replace and a regular expression:

regex = r'\)|\(|-|\+|\s'  # or regex = r'[\(\)\+\-\s]' using a character class
# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
lists_combined['CustomerVoicePhone'] = (lists_combined['CustomerVoicePhone'].str.replace(regex, '', regex=True)
                                 .fillna(lists_combined['CustomerVoicePhone']))
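
To check what the two patterns mentioned in the comment actually strip, they can be tried with Python's re module on a few strings. This is only an illustrative sketch: the sample numbers are invented, and it uses re.sub directly rather than pandas.

```python
import re

# The two spellings from the comment: alternation and a character class.
alternation = r'\)|\(|-|\+|\s'
char_class = r'[()+\-\s]'

# Made-up sample phone numbers.
samples = ['(555) 123-4567', '+1 555-000-1111', '555 000 2222']

results = []
for s in samples:
    a = re.sub(alternation, '', s)  # remove every match of the alternation
    b = re.sub(char_class, '', s)   # remove every character in the class
    results.append((a, b))
    print(s, '->', a, b)
```

Both patterns remove parentheses, hyphens, plus signs, and whitespace. The character class is usually the shorter of the two to read and to extend.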
