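<p>New programmer here. I'd be very grateful for any help this knowledgeable community can offer.</p>
<p>I have a column of 140,000 text strings (company names) in a pandas DataFrame. For each string I want to remove all whitespace within and around it, remove all punctuation, replace specific substrings, and convert it to lowercase. Then I want to take the first 10 characters (<code>[0:10]</code>) of each string and store them in a new DataFrame column.</p>
<p>Here is a reproducible example.</p>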
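<p>For a single name, the steps above might be sketched in plain Python as follows. This is a minimal sketch under a few assumptions: <code>clean_name</code> and its <code>width</code> parameter are hypothetical names, <code>str.translate</code> stands in for a character-by-character <code>replace</code> loop, and <code>''.join(text.split())</code> removes every kind of whitespace (tabs, newlines), not only the space character.</p>

```python
import string

# Translation table that deletes every character in string.punctuation
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def clean_name(text, width=10):
    """Clean one company name (hypothetical helper, not a library function)."""
    text = text.translate(_PUNCT_TABLE)  # drop punctuation
    text = text.replace('Saint', 'st')   # case-sensitive, so done before lowercasing
    text = ''.join(text.split())         # drop all whitespace
    return text.lower()[:width]          # lowercase, keep the first `width` chars


print(clean_name("LRHS Saint Jose's Grocery"))  # lrhsstjose
```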
<pre><code>import string
import pandas as pd
data = ["West Georgia Co",
"W.B. Carell Clockmakers",
"Spine & Orthopedic LLC",
"LRHS Saint Jose's Grocery",
"Optitech@NYCityScape"]
df = pd.DataFrame(data, columns = ['co_name'])
def remove_punctuations(text):
    # strip every ASCII punctuation character, one at a time
    for punctuation in string.punctuation:
        text = text.replace(punctuation, '')
    return text
# applying remove_punctuations function
df['co_name_transform'] = df['co_name'].apply(remove_punctuations)
# this next step replaces 'Saint' with 'st' to standardize,
# and I may want to make other substitutions but this is a common one.
df['co_name_transform'] = df.co_name_transform.str.replace('Saint', 'st')
# replace whitespace
df['co_name_transform'] = df.co_name_transform.str.replace(' ', '')
# make lowercase
df['co_name_transform'] = df.co_name_transform.str.lower()
# select first 0:10 of strings
df['co_name_transform'] = df.co_name_transform.str[0:10]
print(df)
</code></pre>
<pre><code>                     co_name co_name_transform
0            West Georgia Co        westgeorgi
1    W.B. Carell Clockmakers        wbcarellcl
2     Spine & Orthopedic LLC        spineortho
3  LRHS Saint Jose's Grocery        lrhsstjose
4       Optitech@NYCityScape        optitechny
</code></pre>
<p>How can I put all of these steps into a single function, something like this?</p>
<pre><code>def clean_text(df, col, new_col):
    for co in df[col]:
        ...  # do all the steps
    return df[new_col]
</code></pre>
<p>Thanks!</p>
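<p>One possible way to bundle the steps is a function that takes a Series and returns a cleaned Series, using pandas' vectorized <code>.str</code> methods. This is a sketch under stated assumptions: the name <code>clean_text</code>, the <code>replacements</code> dict and the <code>width</code> parameter are hypothetical, and the signature differs from the pseudocode above (Series in, Series out) so the DataFrame is not modified inside the function. The steps run in the same order as the original code, so the case-sensitive <code>'Saint'</code> match happens before lowercasing.</p>

```python
import re
import string

import pandas as pd

# Regex character class matching any character in string.punctuation
PUNCT_PATTERN = '[' + re.escape(string.punctuation) + ']'

# Hypothetical default substitutions; add more pairs as needed
DEFAULT_REPLACEMENTS = {'Saint': 'st'}


def clean_text(names, replacements=None, width=10):
    """Return a cleaned copy of a Series of names (hypothetical helper)."""
    if replacements is None:
        replacements = DEFAULT_REPLACEMENTS
    s = names.str.replace(PUNCT_PATTERN, '', regex=True)  # remove punctuation
    for old, new in replacements.items():                 # applied in dict order
        s = s.str.replace(old, new, regex=False)          # literal substitutions
    s = s.str.replace(r'\s+', '', regex=True)             # remove all whitespace
    return s.str.lower().str[:width]                      # lowercase, truncate


data = ["West Georgia Co",
        "W.B. Carell Clockmakers",
        "Spine & Orthopedic LLC",
        "LRHS Saint Jose's Grocery",
        "Optitech@NYCityScape"]
df = pd.DataFrame(data, columns=['co_name'])
df['co_name_transform'] = clean_text(df['co_name'])
print(df)
```

<p>Because each <code>.str</code> method already works element by element, no explicit <code>for</code> loop over the names is needed, and the same function can be reused on any text column, e.g. <code>clean_text(df['other_col'], replacements={'Saint': 'st', 'Company': 'co'})</code>.</p>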