Count the words in a column and store the top X in a new column

import collections
df["topXwords1"] = collections.Counter(df["topXwords"])
TypeError: unhashable type: 'list'

This fails, but it works in this example:

xxx = ["a", "a", "b"]
counter = collections.Counter(xxx)
counter
Out[43]: Counter({'a': 2, 'b': 1})

2 Answers

User

#1 · edited 2024-09-26 17:53:40

Using pd.Series.value_counts:

In [333]: df["topXwords"] = df.Text.apply(lambda s: pd.Series(s.split()).value_counts().to_dict())

In [334]: df
Out[334]: 
   Doc_ID                Text                  topXwords
0       1            hi hi hi                  {'hi': 3}
1       2  hello hello1 hello  {'hello': 2, 'hello1': 1}
2       3           hey hallo     {'hallo': 1, 'hey': 1}
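The line above stores every word count for each row. If only the top X words are wanted, one possible variant is to trim the counts with head before converting them to a dict. This is a minimal sketch rather than part of the original answer: TOP_X and top_words are hypothetical names, and the sample data is copied from the example in this thread.

```python
import pandas as pd

# Sample data copied from the example in this thread.
df = pd.DataFrame({
    "Doc_ID": [1, 2, 3],
    "Text": ["hi hi hi", "hello hello1 hello", "hey hallo"],
})

TOP_X = 1  # hypothetical setting: how many words to keep per row


def top_words(text, x=TOP_X):
    """Return the x most frequent words in text as a {word: count} dict."""
    # value_counts sorts by count in descending order by default,
    # so head(x) keeps the x most frequent words.
    counts = pd.Series(text.split()).value_counts()
    return counts.head(x).to_dict()


df["topXwords"] = df["Text"].apply(top_words)
print(df)
```

When several words share the same count, which of them survives the cut depends on how value_counts orders ties, so it is safer not to rely on that order.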

User

#2 · edited 2024-09-26 17:53:40

Using apply:

from collections import Counter
import pandas as pd

data = [[1, 'hi hi hi'],
        [2, 'hello hello1 hello'],
        [3, 'hey hallo']]

df = pd.DataFrame(data=data, columns=['Doc_ID', 'Text'])

print(df.Text.str.split().apply(Counter))

Output

0                    {'hi': 3}
1    {'hello': 2, 'hello1': 1}
2       {'hey': 1, 'hallo': 1}
Name: Text, dtype: object

If you only want to keep the top x words, do the following (x=1 in this example):

df['topXwords'] = df.Text.str.split().apply(lambda x: Counter(x).most_common(1))
print(df)

Output

   Doc_ID                Text     topXwords
0       1            hi hi hi     [(hi, 3)]
1       2  hello hello1 hello  [(hello, 2)]
2       3           hey hallo    [(hey, 1)]
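On the TypeError in the question: Counter hashes each element it is given. Passing the whole column hands it one list per row, and lists are not hashable, whereas the xxx example passes a flat list of strings. The sketch below reproduces that with plain Python lists standing in for the column (rows and top_x are hypothetical names), then counts each row separately.

```python
from collections import Counter

# Hypothetical stand-in for a column whose cells are already lists of words.
rows = [
    ["hi", "hi", "hi"],
    ["hello", "hello1", "hello"],
    ["hey", "hallo"],
]

# Counting the outer list fails: every element is a list, and lists
# cannot be used as dictionary keys.
try:
    Counter(rows)
    failed = False
except TypeError as exc:
    failed = True
    print("TypeError:", exc)


def top_x(words, x=2):
    """Return the x most common words in one row as (word, count) pairs."""
    return Counter(words).most_common(x)


# Counting one row at a time works, because each row is a list of strings.
top = [top_x(words) for words in rows]
print(top)
```

With a DataFrame, the same per-row function can be passed to apply on the list column, for example df["topXwords"].apply(top_x), assuming that column holds lists of words.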
