Matching keywords from a list against a line of words in Python

Published 2024-09-27 07:32:48


I need to extract some specific information from lines like the two below.

[40.748330000000003, -73.878609999999995] 6 2011-08-28 19:52:47 Sometimes I wish my life was a movie; #unreal I hate the fact I feel lonely surrounded by so many ppl


[37.786221300000001, -122.1965002] 6 2011-08-28 19:55:26 I wish I could lay up with the love of my life And watch cartoons all day.

The coordinates and the numbers should be ignored.
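One way to drop the coordinates and numbers is sketched below. It assumes every line follows the layout of the two samples: a bracketed coordinate pair, then a number, a date and a time, then the tweet text. The helper name `tweet_text` is hypothetical.

```python
def tweet_text(line):
    """Return only the message part of a tweet line.

    Assumed layout: "[lat, lon] <number> <date> <time> <text>".
    """
    # Everything after the first ']' is: number, date, time, text.
    _, _, rest = line.partition(']')
    # Split off the three leading fields; the fourth piece is the text.
    fields = rest.split(maxsplit=3)
    return fields[3].strip() if len(fields) > 3 else ''


sample = ('[37.786221300000001, -122.1965002] 6 2011-08-28 19:55:26 '
          'I wish I could lay up with the love of my life And watch cartoons all day.')
print(tweet_text(sample))
```

A line that does not match the assumed layout yields an empty string rather than raising an error.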

The task is to find out how many words in each tweet line appear in this keyword list:

^{pr2}$

I also need the sum of the values of the keywords found in each tweet line (for example, ['love', 10]).

For example, for the sentence

'I hate to feel lonely at times'

the sentiment values hate=1 and lonely=1 sum to 2, and the line contains 7 words.
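The arithmetic for that example sentence can be sketched as follows. The dictionary `keyword_values` only holds the two values mentioned in the example, and the function name `score_sentence` is hypothetical.

```python
import string

# Values taken from the example sentence only (hate=1, lonely=1).
keyword_values = {'hate': 1, 'lonely': 1}


def score_sentence(sentence, values):
    """Return (word count, number of keyword matches, sum of keyword values)."""
    words = sentence.split()
    # Lowercase and strip surrounding punctuation so "Hate," still matches "hate".
    cleaned = [w.strip(string.punctuation).lower() for w in words]
    matches = [w for w in cleaned if w in values]
    return len(words), len(matches), sum(values[w] for w in matches)


print(score_sentence('I hate to feel lonely at times', keyword_values))
# (7, 2, 2)
```

Storing the keywords in a dictionary rather than a list of lists means each word is looked up directly, so no inner loop over the keywords is needed.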

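I tried a list-of-lists approach, and I also tried iterating over every sentence and every keyword, but neither worked. There are many tweets and many keywords, so I need some kind of loop to look up the values.

Thanks in advance for your insight!! :)

My code: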

import sys

# Open the keyword file; each line is expected to look like "word,value".
try:
    KeywordFileName = input('Input keyword file name: ')
    KeywordFile = open(KeywordFileName, 'r')
except FileNotFoundError:
    print('The file you entered does not exist or is not in the directory')
    sys.exit()

# Collect every keyword as a [word, value] pair.
Keywords = []
for KeyLine in KeywordFile:
    KeyLine = KeyLine.rstrip()
    if KeyLine == '':
        continue  # skip blank lines
    KeyPair = KeyLine.split(',')
    KeyPair[1] = int(KeyPair[1])
    Keywords.append(KeyPair)
    print(KeyPair)
KeywordFile.close()

# Open the tweet file.
try:
    TweetFileName = input('Input Tweet file name: ')
    TweetFile = open(TweetFileName, 'r')
except FileNotFoundError:
    print('The file you entered does not exist or is not in the directory')
    sys.exit()

# Collect every non-empty tweet line with the trailing newline removed.
Tweets = []
for TweetLine in TweetFile:
    TweetLine = TweetLine.rstrip()
    if TweetLine != '':
        Tweets.append(TweetLine)
TweetFile.close()

1 answer
User
#1 · Posted 2024-09-27 07:32:48

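You can use a simple regular expression to strip out the numbers, then a tokenizer and a counter to count how many times each word occurs in the sample string.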

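# word_tokenize needs NLTK's tokenizer data to be downloaded once beforehand.
from nltk.tokenize import word_tokenize
import collections
import re

tweet = '[40.748330000000003, -73.878609999999995] 6 2011-08-28 19:52:47 Sometimes I wish my life was a movie; #unreal I hate the fact I feel lonely surrounded by so many ppl'
# Remove integers and decimals (coordinates, count, date and time digits).
num_regex = re.compile(r"[+-]?\d+(?:\.\d+)?")
tweet = num_regex.sub('', tweet)
# Tokenize what is left; leftover punctuation (brackets, colons, ';', '#') is tokenized too.
words = word_tokenize(tweet)
# Count how often each token occurs.
final_list = collections.Counter(words)
print(final_list)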
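To get from word counts to the two numbers the question asks for, the counts can be combined with the keyword values. The sketch below swaps the NLTK tokenizer for a plain regular expression so it runs without extra downloads. The `keyword_values` dictionary is an assumption built from the values quoted in the question ('love' = 10, 'hate' = 1, 'lonely' = 1), not the real keyword file.

```python
import collections
import re

# Assumed keyword values, taken from the examples quoted in the question.
keyword_values = {'love': 10, 'hate': 1, 'lonely': 1}

tweet = ('[40.748330000000003, -73.878609999999995] 6 2011-08-28 19:52:47 '
         'Sometimes I wish my life was a movie; #unreal I hate the fact '
         'I feel lonely surrounded by so many ppl')

# Keep only runs of letters and apostrophes. The leading fields contain no
# letters, so the coordinates, number, date and time drop out on their own.
words = re.findall(r"[a-z']+", tweet.lower())
counts = collections.Counter(words)

# A Counter returns 0 for words it has not seen, so missing keywords are harmless.
matched = sum(counts[word] for word in keyword_values)
total = sum(counts[word] * value for word, value in keyword_values.items())

print(len(words), matched, total)
# 21 2 2
```

Wrapping the last few lines in a loop over the tweet lines gives one (word count, keyword count, value sum) triple per tweet.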
