How can I condense text-cleaning steps into a single Python function?


New programmer here, and I'd really appreciate any help this knowledgeable community is willing to offer.

I have a column of 140,000 text strings (company names) in a pandas DataFrame. I want to strip all whitespace in and around each string, remove all punctuation, replace specific substrings, and convert everything to lowercase. Then I want to take the first 10 characters of each string (the 0:10 slice) and store them in a new DataFrame column.

Here is a reproducible example.

<pre><code>import string
import pandas as pd

data = ["West Georgia Co",
        "W.B. Carell Clockmakers",
        "Spine & Orthopedic LLC",
        "LRHS Saint Jose's Grocery",
        "Optitech@NYCityScape"]

df = pd.DataFrame(data, columns = ['co_name'])

def remove_punctuations(text):
    for punctuation in string.punctuation:
        text = text.replace(punctuation, '')
    return text

# applying remove_punctuations function
df['co_name_transform'] = df['co_name'].apply(remove_punctuations)

# this next step replaces 'Saint' with 'st' to standardize,
# and I may want to make other substitutions but this is a common one.
df['co_name_transform'] = df.co_name_transform.str.replace('Saint', 'st')

# replace whitespace
df['co_name_transform'] = df.co_name_transform.str.replace(' ', '')

# make lowercase
df['co_name_transform'] = df.co_name_transform.str.lower()

# select first 0:10 of strings
df['co_name_transform'] = df.co_name_transform.str[0:10]

print(df)
</code></pre>

<pre><code>                     co_name co_name_transform
0            West Georgia Co        westgeorgi
1    W.B. Carell Clockmakers        wbcarellcl
2     Spine & Orthopedic LLC        spineortho
3  LRHS Saint Jose's Grocery        lrhsstjose
4       Optitech@NYCityScape        optitechny
</code></pre>

How can I put all of these steps into a single function, something like this pseudocode?

<pre><code>def clean_text(df[col]):
    for co in co_name:
        do_all_the_steps
    return df[new_col]
</code></pre>

Thanks
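One possible way to fold these steps into a single function is sketched below. It is not the only approach, just a minimal sketch that chains pandas' vectorized `.str` methods in the same order as the question. The name `clean_text` comes from the question; the `replacements` and `length` parameters, and the choice to pass in a Series and return a new Series rather than mutate the DataFrame, are assumptions added for illustration.

```python
import string

import pandas as pd

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(series, replacements=None, length=10):
    """Return a cleaned copy of a Series of strings.

    `replacements` and `length` are hypothetical parameters, not part of
    the original question.
    """
    if replacements is None:
        # Assumed default: the single substitution shown in the question.
        replacements = {"Saint": "st"}

    # 1. Remove punctuation.
    out = series.str.translate(_PUNCT_TABLE)
    # 2. Apply literal (non-regex) substring substitutions.
    for old, new in replacements.items():
        out = out.str.replace(old, new, regex=False)
    # 3. Remove all whitespace, 4. lowercase, 5. keep the first `length` chars.
    out = out.str.replace(r"\s+", "", regex=True)
    return out.str.lower().str[:length]


data = ["West Georgia Co",
        "W.B. Carell Clockmakers",
        "Spine & Orthopedic LLC",
        "LRHS Saint Jose's Grocery",
        "Optitech@NYCityScape"]
df = pd.DataFrame(data, columns=["co_name"])
df["co_name_transform"] = clean_text(df["co_name"])
print(df)
```

Because every step is a vectorized `.str` call, there is no explicit Python loop over the 140,000 rows, and further substitutions can be added by passing a larger `replacements` dict.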
