I have a DataFrame:
Filtered_data
['defence possessed russia china','factors driving china modernise']
['force bolster pentagon','strike capabilities pentagon congress detailing china']
['missiles warheads', 'deterrent face continued advances']
......
......
I just want to split each list element into sub-elements (tokenized words):
Filtered_data
[defence, possessed,russia,factors,driving,china,modernise]
[force,bolster,strike,capabilities,pentagon,congress,detailing,china]
[missiles,warheads, deterrent,face,continued,advances]
Here is the code I tried:
for text in df['Filtered_data'].iteritems():
    for i in text.split():
        print(i)
You can use itertools.chain + toolz.unique. Compared with set, the benefit of toolz.unique is that it preserves order.

Alternatively, combine a list comprehension with split and flattening.
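The two suggestions above can be sketched roughly as follows. This is a minimal illustration on plain Python lists, not the answerers' original code: `rows`, `tokenize_unique`, and `tokenize_flat` are hypothetical names, the sample data mirrors the question, and the fallback for `unique` is an assumption added so the sketch still runs when toolz is not installed.

```python
from itertools import chain

try:
    from toolz import unique  # third-party; yields items in first-seen order
except ImportError:
    def unique(seq):
        # Stand-in (an assumption, not toolz itself): order-preserving de-duplication.
        return iter(dict.fromkeys(seq))

# Hypothetical sample mirroring the rows of the Filtered_data column.
rows = [
    ['defence possessed russia china', 'factors driving china modernise'],
    ['force bolster pentagon', 'strike capabilities pentagon congress detailing china'],
    ['missiles warheads', 'deterrent face continued advances'],
]

def tokenize_unique(phrases):
    """Split every phrase on whitespace, chain the pieces, keep first occurrences."""
    return list(unique(chain.from_iterable(map(str.split, phrases))))

def tokenize_flat(phrases):
    """List comprehension: split each phrase and flatten; duplicates are kept."""
    return [word for phrase in phrases for word in phrase.split()]

unique_tokens = [tokenize_unique(r) for r in rows]
flat_tokens = [tokenize_flat(r) for r in rows]
```

On the DataFrame itself, either function could presumably be applied per cell with `df['Filtered_data'].map(tokenize_unique)`, assuming each cell holds a list of strings.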
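Edit: For unique values, the standard approach is to use sets. But if the order of the values matters, use an order-preserving alternative instead.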
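To make the difference concrete, here is a small sketch comparing the two. The variable names are hypothetical, and `dict.fromkeys` is only a stand-in chosen for illustration; it is not necessarily the function the answer had in mind.

```python
# Hypothetical row from the Filtered_data column.
phrases = ['defence possessed russia china', 'factors driving china modernise']

# Flatten first, as in the list-comprehension approach.
words = [word for phrase in phrases for word in phrase.split()]

# A set removes duplicates but makes no promise about order.
as_set = set(words)

# dict.fromkeys keeps the first occurrence of each key in insertion order
# (guaranteed from Python 3.7). Assumption: used here as an order-preserving stand-in.
ordered = list(dict.fromkeys(words))
```

Converting `as_set` back to a list may return the words in any order, whereas `ordered` keeps them as they first appear in the row.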