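<p>First remove the missing values with <a href="http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html" rel="nofollow noreferrer"><code>DataFrame.dropna</code></a>, passing the column name in <code>subset</code>, and then apply the <code>tokenizer.tokenize</code> method. Your solution does not actually remove the missing values from the DataFrame:</p>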
<pre><code>import numpy as np
import pandas as pd

df = pd.DataFrame({'all_cols':['who is your hero and why',
                               'what do you do to relax',
                               "can't stop to eat", np.nan]})
print (df)
                   all_cols
0  who is your hero and why
1   what do you do to relax
2         can't stop to eat
3                       NaN
</code></pre>
<hr/>
<pre><code># your attempt: this calls dropna on the column (a Series),
# so no rows are removed from df itself
df['all_cols'].dropna(inplace=True)
print (df)
                   all_cols
0  who is your hero and why
1   what do you do to relax
2         can't stop to eat
3                       NaN
</code></pre>
<hr/>
<pre><code># correct: drop the rows of df where all_cols is missing
df.dropna(subset=['all_cols'], inplace=True)
print (df)
                   all_cols
0  who is your hero and why
1   what do you do to relax
2         can't stop to eat
</code></pre>
<hr/>
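<pre><code>from nltk.tokenize import RegexpTokenizer

# raw string so that \w is not treated as a string escape
tokenizer = RegexpTokenizer(r"[\w']+")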
df['all_cols'] = df['all_cols'].apply(tokenizer.tokenize)
print (df)
                          all_cols
0  [who, is, your, hero, and, why]
1   [what, do, you, do, to, relax]
2           [can't, stop, to, eat]
</code></pre>
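<hr/>
<p>To see why the missing value has to be dropped before tokenizing, here is a minimal sketch that uses only the standard library. It assumes that <code>re.findall</code> with the same pattern is a reasonable stand-in for <code>RegexpTokenizer</code> in this simple case, and the helper names <code>is_missing</code> and <code>tokenize_rows</code> are made up for illustration. The point is that <code>np.nan</code> is a float, not a string, so a regex-based tokenizer has nothing to match against:</p>
<pre><code>import math
import re

# Same pattern as in the answer: runs of word characters or apostrophes.
TOKEN_PATTERN = re.compile(r"[\w']+")

def is_missing(value):
    """Return True for None or a float NaN (np.nan is a plain float NaN)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def tokenize_rows(rows):
    """Drop missing entries first, then split each remaining string into tokens."""
    kept = [row for row in rows if not is_missing(row)]
    return [TOKEN_PATTERN.findall(row) for row in kept]

rows = ["who is your hero and why",
        "what do you do to relax",
        "can't stop to eat",
        float("nan")]

tokens = tokenize_rows(rows)
print(tokens)
</code></pre>
<p>Calling <code>TOKEN_PATTERN.findall</code> directly on the NaN entry would raise a <code>TypeError</code>, which is why the filtering step comes first.</p>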