Counting specific words in a text file with Python

Posted 2024-10-01 09:27:02


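I am trying to write a Python function that counts word frequencies in a text file. I can get the frequency of every word, but what I actually want is the count for only the specific words I put in a list. This is what I have so far, and I am stuck.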

def repeatedWords(fname):
    with open(fname) as f:
        wordcount = {}
        for word in word_list:
            for word in f.read().split():
                if word not in wordcount:
                    wordcount[word] = 1
                else:
                    wordcount[word] += 1
            for k, v in wordcount.items():
                print(k, v)

word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
repeatedWords('file.txt')

Update: it still shows all the words:

word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
repeatedWords('Emma.txt', word_list)


2 Answers

The best way to handle this is the get method of Python dictionaries. It could look like this:

def repeatedWords(fname):
    wordcount = {}
    # Example list of words to leave out of the count
    nonwordlist = ['father', 'Miss', 'been']
    with open(fname) as f:
        # Read the file once and split it on whitespace
        for word in f.read().split():
            if word not in nonwordlist:
                wordcount[word] = wordcount.get(word, 0) + 1
    return wordcount


# Put these outside the function repeatedWords
wordcount = repeatedWords('file.txt')
for k, v in wordcount.items():
    print(k, v)

The print statement should then output each counted word followed by its count, one pair per line.

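The line wordcount[word] = wordcount.get(word, 0) + 1 first looks up word in the dictionary wordcount. If the word is already there, it takes the current value and adds 1 to it. If the word is not there, get returns the default 0, and adding 1 makes this the first occurrence, with a count of 1.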
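To see the get-with-default pattern in isolation, here is a minimal sketch that counts words from an in-memory list instead of a file. The function name count_words and the sample sentence are made up for illustration and are not part of the original answer.

```python
def count_words(words):
    """Count occurrences of each word using dict.get with a default of 0."""
    wordcount = {}
    for word in words:
        # get returns 0 the first time a word is seen, so the first count is 1
        wordcount[word] = wordcount.get(word, 0) + 1
    return wordcount


# Hypothetical sample input, split on whitespace
sample = "her father and her sister".split()
counts = count_words(sample)
print(counts)  # {'her': 2, 'father': 1, 'and': 1, 'sister': 1}
```

The same loop without get needs an explicit "if word not in wordcount" branch, as in the question's code.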

So you only need the frequency of the specific words in your list (Emma, Woodhouse, father, and so on)? If so, this code may help (try running it):

    word_list = ['Emma','Woodhouse','father','Taylor','Miss','been','she','her']
    # I'm using this example text in place of the file you are using
    text = 'This is an example text. It will contain words you are looking for, like Emma, Emma, Emma, Woodhouse, Woodhouse, Father, Father, Taylor,Miss,been,she,her,her,her. I made them repeat to show that the code works.'
    text = text.replace(',',' ') #these statements remove irrelevant punctuation
    text = text.replace('.','')
    text = text.lower() #this makes all the words lowercase, so that capitalization won't affect the frequency measurement

    for repeatedword in word_list:
        counter = 0 #counter starts at 0
        for word in text.split():
            if repeatedword.lower() == word:
                counter = counter + 1 #add 1 every time there is a match in the list
        print(repeatedword,':', counter) #prints the word from 'word_list' and its frequency

The output shows the frequency of only the words in the list you provided. That is what you wanted, right?

The output when run in Python 3 is:

    Emma : 3
    Woodhouse : 2
    father : 2
    Taylor : 1
    Miss : 1
    been : 1
    she : 1
    her : 3
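As an alternative to the nested loops, the same per-word lookup could be sketched with collections.Counter from the standard library. This is a different technique from the answer above, not a rewrite of it: the function name count_listed_words, the regular expression used for tokenizing, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter


def count_listed_words(text, word_list):
    """Return the case-insensitive frequency of each word in word_list."""
    # Lowercase the text and keep runs of letters/apostrophes as tokens,
    # which drops punctuation such as commas and periods
    tokens = re.findall(r"[a-z']+", text.lower())
    totals = Counter(tokens)
    # A Counter returns 0 for missing keys, so absent words report 0
    return {w: totals[w.lower()] for w in word_list}


# Hypothetical sample text standing in for the file contents
text = "Emma, Emma Woodhouse and her father. Her father!"
result = count_listed_words(text, ["Emma", "father", "her", "Taylor"])
for word, count in result.items():
    print(word, ":", count)
```

This tokenizes the text once and looks each listed word up in the tally, instead of rescanning the text for every word in the list.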
