How can I clean this text data effectively in Pandas?

I have a DataFrame like this:

id  descriptions
0   kartu debit 20 10 indomaretcipete r
1   tarikan atm 20 10
2   tarikan atm 19 10
3   biaya adm
4   trsf 18 10 wsid 23881 indah lestari

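My current approach drops the numbers with a hard-coded set of stop words:

def cleaning(text):
    stops = {'10', '18', '19', '20', '23881'}
    # split into words first: iterating over a string directly yields single characters
    words = [word for word in text.split() if word not in stops]
    return " ".join(words)

df['description_clean'] = df['descriptions'].apply(cleaning)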

Expected output:

id  descriptions
0   kartu debit indomaretcipete r
1   tarikan atm
2   tarikan atm
3   biaya adm
4   trsf wsid indah lestari
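The stop-word set above has to list every number by hand. A minimal sketch, assuming every whitespace-separated token made only of digits should be dropped, filters with str.isdigit instead; the function name clean_description is hypothetical.

```python
def clean_description(text: str) -> str:
    """Drop every whitespace-separated token that consists only of digits."""
    # split() with no argument breaks on runs of whitespace, so no empty tokens appear
    tokens = text.split()
    return " ".join(tok for tok in tokens if not tok.isdigit())


samples = [
    "kartu debit 20 10 indomaretcipete r",
    "tarikan atm 20 10",
    "trsf 18 10 wsid 23881 indah lestari",
]
cleaned = [clean_description(s) for s in samples]
print(cleaned)
```

With pandas it would be applied the same way as the original function: df['description_clean'] = df['descriptions'].apply(clean_description). Tokens that mix letters and digits (e.g. "wsid23881") are kept, since isdigit is False for them.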

3 answers

User

Answer 1 · edited 2024-10-05 13:21:44

Use str.extractall with groupby:

df['descriptions'] = (df['descriptions'].str.extractall(r'([a-zA-Z]+)')[0]
                                        .groupby(level=0).agg(' '.join))
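To see what the extractall chain does, here is a small sketch on the sample data plus one hypothetical row that contains no letters. extractall returns one row per match under a (row, match) MultiIndex, so a row without any match disappears from the result; the reindex step with fill_value="" is an assumption about how such rows should be handled, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5],
    "descriptions": [
        "kartu debit 20 10 indomaretcipete r",
        "tarikan atm 20 10",
        "tarikan atm 19 10",
        "biaya adm",
        "trsf 18 10 wsid 23881 indah lestari",
        "12345",  # hypothetical row with no letters at all
    ],
})

# One row per regex match; the index has two levels: (original row, match number)
matches = df["descriptions"].str.extractall(r"([a-zA-Z]+)")[0]

# Rejoin the matches of each original row; row 5 had no match, so it is absent here
joined = matches.groupby(level=0).agg(" ".join)

# Restore every original row, using "" where nothing was extracted
df["descriptions"] = joined.reindex(df.index, fill_value="")
print(df)
```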

Or:

df['descriptions'] = (df['descriptions'].str.replace(r'\d+', '', regex=True)
                                        .str.replace(r'\s+', ' ', regex=True)
                                        .str.strip())
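In pandas 2.0 and later, Series.str.replace treats the pattern as a literal string unless regex=True is passed, so '\d+' on its own would not match digits. The snippet below uses plain Python and the standard re module to show why the second step should collapse whitespace runs instead of deleting double spaces: on the last sample row, deleting "  " glues two words together.

```python
import re

s = "trsf 18 10 wsid 23881 indah lestari"

# Removing the digits leaves runs of 3 and 2 spaces behind
no_digits = re.sub(r"\d+", "", s)

# Deleting "  " shrinks the 3-space run to 1 but removes the 2-space run entirely
naive = no_digits.replace("  ", "")

# Collapsing any whitespace run to a single space keeps the words separated
collapsed = re.sub(r"\s+", " ", no_digits).strip()

print(repr(naive))
print(repr(collapsed))
```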

Or:

import re

df['descriptions'] = [' '.join(re.findall(r'[a-zA-Z]+', s)) for s in df['descriptions']]

print(df)
   id                   descriptions
0   0  kartu debit indomaretcipete r
1   1                    tarikan atm
2   2                    tarikan atm
3   3                      biaya adm
4   4        trsf wsid indah lestari
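The list comprehension in the third option can also be expressed with pandas' own string methods: Series.str.findall returns a list of matches per row, and Series.str.join joins each list with a separator. A minimal sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "descriptions": [
        "kartu debit 20 10 indomaretcipete r",
        "tarikan atm 20 10",
        "tarikan atm 19 10",
        "biaya adm",
        "trsf 18 10 wsid 23881 indah lestari",
    ],
})

# findall: list of alphabetic runs per row; join: glue each list back with single spaces
df["descriptions"] = df["descriptions"].str.findall(r"[a-zA-Z]+").str.join(" ")
print(df)
```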

User

Answer 2 · edited 2024-10-05 13:21:44

You need something like this:

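import re

def replace_numbers(s):
    # \d+ rather than \d*, so the pattern never matches the empty string;
    # then collapse the whitespace left behind by the removed numbers
    return re.sub(r'\s+', ' ', re.sub(r'\d+', '', s)).strip()


df['descriptions'] = df['descriptions'].apply(replace_numbers)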
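Real columns often contain missing values (NaN, a float), and re.sub raises TypeError when given a non-string. A hedged sketch, assuming non-string cells should simply pass through unchanged; the name replace_numbers_safe is hypothetical.

```python
import re

# Compile the two patterns once at module level
_DIGITS = re.compile(r"\d+")
_SPACES = re.compile(r"\s+")


def replace_numbers_safe(s):
    """Remove digit runs from s; return non-string values (e.g. NaN, None) unchanged."""
    if not isinstance(s, str):
        return s
    return _SPACES.sub(" ", _DIGITS.sub("", s)).strip()


print(replace_numbers_safe("tarikan atm 19 10"))
print(replace_numbers_safe(float("nan")))
```

It is used exactly like the function above: df['descriptions'] = df['descriptions'].apply(replace_numbers_safe).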

User

Answer 3 · edited 2024-10-05 13:21:44

IIUC, you need to remove the numbers from the DataFrame. Use:

df_new = df.replace(r'\d+ ', '', regex=True)
print(df_new)

   id                   descriptions
0   0  kartu debit indomaretcipete r
1   1                 tarikan atm 10
2   2                 tarikan atm 10
3   3                      biaya adm
4   4        trsf wsid indah lestari

For a single Series: df['descriptions'] = df['descriptions'].replace(r'\d+ ', '', regex=True)

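Note: I added a space after \d+ in the regex because of your sample; you can leave it out if you prefer. With the space, a number at the very end of a string has no space after it and is kept (hence "tarikan atm 10" above); without it, all digits are removed but extra spaces are left behind.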
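A variant that also removes the trailing number, under the assumption that only standalone numbers should go: \s*\b\d+\b deletes each whole number together with the whitespace in front of it, and a final str.strip() tidies a number at the very start. Digits attached to letters, such as a hypothetical "wsid23881", have no word boundary before them and are left alone.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "descriptions": [
        "kartu debit 20 10 indomaretcipete r",
        "tarikan atm 20 10",
        "tarikan atm 19 10",
        "biaya adm",
        "trsf 18 10 wsid 23881 indah lestari",
    ],
})

# Remove each standalone number plus any whitespace directly before it,
# then strip whatever is left at the ends of the string
df["descriptions"] = (df["descriptions"]
                      .replace(r"\s*\b\d+\b", "", regex=True)
                      .str.strip())
print(df)
```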
