Hashtag counter in Python

def analyze(posts):
    hashtag_dict = {}
    for post_string in posts:
        for char in post_string:
            if char == "#":
                hash_index = post_string.find(char)
                counter = 1
                tag = ""
                for tag_char in post_string[hash_index + 1:]:
                    if tag_char.isdigit() or tag_char.isalpha():
                        tag += tag_char
                    elif tag in hashtag_dict:
                        counter += 1
                        hashtag_dict[tag] = counter
                        break
                    else:
                        hashtag_dict[tag] = counter
                        break
    return hashtag_dict

posts = [
    "hi #weekend",
    "good morning #zurich #limmat",
    "spend my #weekend in #zurich",
    "#zurich <3"]
print(analyze(posts))

3 answers

User

Answer 1 · edited 2024-05-12 10:59:09

Basically, your function doesn't work because this line

hash_index = post_string.find(char)

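will always find the index of the first hash mark in the string. You can fix this by passing a start index to str.find, or, better, by not calling str.find at all and keeping track of the index while you iterate over the string (enumerate is handy for that). Better still, drop the indices entirely: if you restructure the parser as a state machine, you don't need them.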
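A minimal sketch of the state-machine idea, assuming (as in the question) that a tag is a '#' followed by letters and digits. The parser is either reading ordinary text or reading a tag, so it never needs an index. Two other problems in the question's code are handled along the way: its inner loop never records a tag that ends the post (such as "hi #weekend"), and because counter is reset to 1 for every '#', a repeated tag can never be counted higher than 2. Here a trailing sentinel space flushes the last tag, and each occurrence adds one to the stored count.

```python
def analyze(posts):
    """Count hashtags with a two-state parser; no str.find or indices needed."""
    counts = {}
    for post in posts:
        in_tag = False  # state: currently reading a hashtag?
        tag = ""
        # The trailing space is a sentinel that ends a tag at the end of a post
        for char in post + " ":
            if in_tag and (char.isalpha() or char.isdigit()):
                tag += char  # still inside the tag: extend it
                continue
            if in_tag:
                # Any other character ends the current tag; skip a bare '#'
                if tag:
                    counts[tag] = counts.get(tag, 0) + 1
                in_tag = False
                tag = ""
            if char == "#":
                in_tag = True  # a '#' starts a new tag, even right after another tag
    return counts


posts = [
    "hi #weekend",
    "good morning #zurich #limmat",
    "spend my #weekend in #zurich",
    "#zurich <3"]
print(analyze(posts))  # {'weekend': 2, 'zurich': 3, 'limmat': 1}
```

Because a '#' is checked after the current tag is closed, adjacent tags such as "#zurich#limmat" are counted separately.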

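That said, a Pythonic implementation would replace the whole function with a regular expression, which makes it shorter, more correct, more readable, and probably more efficient.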
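A minimal sketch of such a regex-based version. The tallying is done with collections.Counter, which is this sketch's own choice rather than something the answer specifies. The pattern assumes tags consist of letters and digits, approximating the question's isalpha()/isdigit() test; [^\W_] means "word character except underscore", since \w on its own would also accept '_'.

```python
import re
from collections import Counter

# Assumption: a tag is '#' followed by letters or digits (no underscore)
TAG_PATTERN = re.compile(r"#([^\W_]+)")


def analyze(posts):
    """Return a Counter mapping each hashtag (without '#') to its frequency."""
    counts = Counter()
    for post in posts:
        # With one capture group, findall returns the group text of every match
        counts.update(TAG_PATTERN.findall(post))
    return counts


posts = [
    "hi #weekend",
    "good morning #zurich #limmat",
    "spend my #weekend in #zurich",
    "#zurich <3"]
print(analyze(posts).most_common())
# [('zurich', 3), ('weekend', 2), ('limmat', 1)]
```

Counter is a dict subclass, so the result can be used wherever the original dictionary was, and most_common() gives the tags sorted by frequency for free.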

User

Answer 2 · edited 2024-05-12 10:59:09

This should work:

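import string

# Characters allowed in a hashtag: ASCII letters and digits
alpha = string.ascii_letters + string.digits

def analyze(posts):
    hashtag_dict = {}

    for post in posts:
        # Check each whitespace-separated word; split() never yields empty strings
        for word in post.split():
            if word[0] == '#':
                current_hashtag = sanitize(word[1:])

                # Ignore a bare '#' or a '#' followed directly by punctuation
                if len(current_hashtag) > 0:
                    if current_hashtag in hashtag_dict:
                        hashtag_dict[current_hashtag] += 1
                    else:
                        hashtag_dict[current_hashtag] = 1

    return hashtag_dict


def sanitize(s):
    # Keep the leading run of allowed characters,
    # e.g. "lindehof4Ever(lol)" -> "lindehof4Ever"
    s2 = ''
    for c in s:
        if c in alpha:
            s2 += c
        else:
            break
    return s2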


posts = [
        "hi #weekend",
        "good morning #zurich #limmat",
        "spend my #weekend in #zurich",
        "#zurich <3",
        "#lindehof4Ever(lol)"
        ]

print(analyze(posts))

User

Answer 3 · edited 2024-05-12 10:59:09

Hmm, this task can be done with a regex, so don't be afraid to use them ;) Here is a quick solution:

#!/usr/bin/python3.4
import re

posts = [
    "hi #weekend",
    "good morning #zurich #limmat",
    "spend my #weekend in #zurich",
    "#zurich <3"]

container = {}
for post in posts:
    # Raw string so that \w reaches the regex engine unchanged
    elements = re.findall(r'#(\w+)', post)
    for element in elements:
        # Start unseen tags at 0, then count this occurrence
        container[element] = container.get(element, 0) + 1
print(container)

Result:

{'zurich': 3, 'limmat': 1, 'weekend': 2}
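The \w+ regex and the sanitize-based answer above do not treat every input the same way. In Python 3, \w in a str pattern also matches the underscore and non-ASCII letters, while sanitize keeps only ASCII letters and digits. The sketch below compares them on two hypothetical sample posts; if you want the regex to follow the ASCII-only rule, an explicit character class such as [A-Za-z0-9] does that.

```python
import re
import string

ALPHA = string.ascii_letters + string.digits


def sanitize(s):
    """ASCII-only rule: keep the leading run of ASCII letters and digits."""
    out = ""
    for ch in s:
        if ch not in ALPHA:
            break
        out += ch
    return out


samples = ["#good_morning", "#zürich"]  # hypothetical sample posts

for post in samples:
    word_regex = re.findall(r"#(\w+)", post)            # \w+ pattern
    ascii_regex = re.findall(r"#([A-Za-z0-9]+)", post)  # explicit ASCII class
    by_sanitize = sanitize(post[1:])                     # sanitize rule
    print(post, word_regex, ascii_regex, by_sanitize)
# #good_morning ['good_morning'] ['good'] good
# #zürich ['zürich'] ['z'] z
```

Neither behavior is wrong as such; which one you want depends on what your platform considers part of a hashtag.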
