<p>You need to preprocess the words before running the computation. For example:</p>
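<pre><code>import re

import pandas as pd

# Characters to replace with a space, and the whitelist of characters to keep
REPLACE_BY_SPACE_RE = re.compile(r'[/(){}\[\]\|@,;]')
BAD_SYMBOLS_RE = re.compile(r'[^0-9a-z #+_]')

def clean_text(text):
    text = text.lower()                        # lowercase first: the whitelist only keeps a-z
    text = REPLACE_BY_SPACE_RE.sub(' ', text)  # listed punctuation becomes a space
    text = BAD_SYMBOLS_RE.sub('', text)        # drop anything outside the whitelist
    return text

ft = pd.read_json(path_of_file)  # read your file into a pandas DataFrame
# 'text' is a placeholder: use the name of the column that holds your text
ft['text'] = ft['text'].apply(clean_text)
</code></pre>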
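<p>To see what the cleaning does, here is a minimal sketch that runs the same two patterns on a single string, without pandas. The sample sentence and the final whitespace-collapsing step are my own additions for illustration and are not part of the snippet above.</p>

```python
import re

# Same patterns as in the answer, written as raw strings.
REPLACE_BY_SPACE_RE = re.compile(r'[/(){}\[\]\|@,;]')
BAD_SYMBOLS_RE = re.compile(r'[^0-9a-z #+_]')

def clean_text(text):
    text = text.lower()
    text = REPLACE_BY_SPACE_RE.sub(' ', text)
    text = BAD_SYMBOLS_RE.sub('', text)
    return text

# Hypothetical sample input, chosen only to exercise both patterns.
sample = "How to parse JSON (in C#)?"
cleaned = clean_text(sample)

# Optional extra step (an assumption, not in the original answer):
# collapse the runs of spaces left behind by the substitutions.
normalized = ' '.join(cleaned.split())

print(repr(cleaned))     # 'how to parse json  in c# '
print(repr(normalized))  # 'how to parse json in c#'
```

<p>Note that replacing punctuation with a space can leave double or trailing spaces. If your later steps split on whitespace this usually does not matter; otherwise a normalization step like the one sketched here may help.</p>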
<p>Good luck.</p>