Sort hashtags in a file by frequency and write them to another file

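import collections
with open("/Users/Adnan/Desktop/twitter_data.txt") as data:
    for line in data:
        for part in line.split():
            if "#" in part:
                print(part)
                # Note: Counter(part) counts the characters of this single
                # token, not how often each hashtag appears in the file.
                print(collections.Counter(part).most_common())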

3 answers

User

Answer 1 · edited 2024-09-27 17:30:19

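You can use the collections module: collections.Counter(list_of_hashtags).most_common(n) returns the n most common hashtags in the file, where n is however many you want.

Or, if you don't want a limit, you don't even need to pass the number: most_common() with no argument returns every hashtag, most frequent first.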

A small example:

import collections
# In your case this will likely come from data.readlines(), depending on how your file is structured.
# To get the list of hashtags, you may need to split each line, etc., depending on the structure.
hashtags = ['#1', '#1', '#2', '#2', '#3', '#4', '#4']
print(collections.Counter(hashtags).most_common())

Result:

[('#1', 2), ('#2', 2), ('#4', 2), ('#3', 1)]
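To illustrate the limiting form of most_common mentioned above, here is a small sketch. The hashtag list is made-up sample data, chosen without ties so the order of the output is unambiguous.

```python
import collections

# Hypothetical sample data; in practice this list would be built from the file.
hashtags = ['#python', '#python', '#python', '#data', '#data', '#ml']
counts = collections.Counter(hashtags)

# Passing n keeps only the n most frequent hashtags.
top_two = counts.most_common(2)
print(top_two)  # [('#python', 3), ('#data', 2)]

# With no argument, every hashtag is returned, most frequent first.
everything = counts.most_common()
print(everything)  # [('#python', 3), ('#data', 2), ('#ml', 1)]
```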

User

Answer 2 · edited 2024-09-27 17:30:19

You can use the collections.Counter data type to count the frequency of each hashtag, like this:

from collections import Counter

freq = Counter()
with open("twitter_data.txt") as data:
    for line in data:
        for part in line.split():
            if "#" in part:
                freq[part] += 1
print(freq.most_common())

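Based on the question and the structure of the existing code, twitter_data.txt appears to contain one tweet per line. Running the code above on such a file prints a list of (hashtag, count) pairs, sorted from most to least frequent.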
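The question also asks to write the sorted hashtags to another file, which the answers do not show. A minimal sketch building on the Counter loop above: the function name write_hashtag_frequencies, the output file name and the tab-separated "hashtag<TAB>count" line format are all assumptions, and the demo writes made-up tweets to a temporary directory so it runs anywhere.

```python
import collections
import os
import tempfile


def write_hashtag_frequencies(in_path, out_path):
    """Hypothetical helper: count hashtags in in_path and write them,
    most frequent first, to out_path as 'hashtag<TAB>count' lines."""
    freq = collections.Counter()
    with open(in_path, encoding="utf-8") as data:
        for line in data:
            for part in line.split():
                if "#" in part:  # same test as the answers above
                    freq[part] += 1
    with open(out_path, "w", encoding="utf-8") as out:
        for tag, count in freq.most_common():
            out.write(f"{tag}\t{count}\n")
    return freq


# Demo with made-up tweets, assuming one tweet per line.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "twitter_data.txt")
    out_path = os.path.join(tmp, "hashtags_by_frequency.txt")  # hypothetical name
    with open(in_path, "w", encoding="utf-8") as f:
        f.write("#a hello #b\nmore #b text\n#b and #a\n#c\n")
    write_hashtag_frequencies(in_path, out_path)
    with open(out_path, encoding="utf-8") as f:
        result = f.read()

print(result)
```

Writing from most_common() keeps the file sorted by frequency; any other line format works the same way by changing the out.write call.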

User

Answer 3 · edited 2024-09-27 17:30:19

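collections.Counter accepts any iterable, so instead of creating a counter and updating it inside a loop, you can pass it a generator expression with the nested loops built in: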

data="""
#1 hello #2
this is #2 a #3 test
#2 life is good #1""".split("\n")

import collections

hashtags = collections.Counter(part
                                for line in data
                                    for part in line.split()
                                        if "#" in part)

print(hashtags.most_common())

This gives me the following output:

[('#2', 3), ('#1', 2), ('#3', 1)]
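All three answers test for "#" in part, which also counts tokens such as C# and treats #python, #Python and #python! as different tags. If that matters, a stricter variant of the generator expression above might look like the sketch below. The startswith check, the lower-casing and the stripping of trailing punctuation are assumptions about what should count as a hashtag, not part of the original answers, and the tweets are made up.

```python
import collections
import string

# Made-up tweets for illustration.
data = [
    "#python is great, I love #python!",
    "learning C# and #Python today #code",
    "#code, #python",
]


def normalize(token):
    """Return the hashtag lower-cased without trailing punctuation,
    or None if nothing is left after the '#' (assumed rules)."""
    body = token[1:].rstrip(string.punctuation).lower()
    return "#" + body if body else None


# Only tokens that start with '#' are considered, so 'C#' is skipped.
candidates = (normalize(part)
              for line in data
              for part in line.split()
              if part.startswith("#"))
hashtags = collections.Counter(tag for tag in candidates if tag)

print(hashtags.most_common())  # [('#python', 4), ('#code', 2)]
```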
