Changing output from binary results to frequencies in Python

Posted 2024-10-01 13:44:04

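I combined the tokens from a number of files (batch 1) to create a master word frequency list, which I am now comparing against a series of other files (batch 2). Initially I produced a binary output: for each word in the master list, it outputs "1" if that word also appears in a given batch 2 file and "0" if it does not, for example [1,0,1,1].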

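Now I would like it to output the frequency of each word that appears instead. That is, if "cat" occurs 9 times in the master word frequency list and also appears in file 1 of batch 2, it should output "9" rather than "1", for example [9,0,21,42].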

# globalFreqSets generates a dictionary like output: ('to', 634), ('be', 604), ('and', 594)

# finalValues generates just the number element of globalFreqSets: [634, 604, 594]
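
For reference, a list shaped like the one these two comments describe could be built with `collections.Counter`. This is only a sketch under assumptions: `batch1_docs` and `master_counts` are hypothetical names standing in for the tokenized batch 1 files and their combined counts, and the post does not show how `globalFreqSets` was actually created.

```python
from collections import Counter

# Hypothetical stand-in for the tokenized batch 1 files (not from the original post).
batch1_docs = [
    ["to", "be", "or", "not", "to", "be"],
    ["to", "err", "is", "human"],
]

# Count every token across all batch 1 files.
master_counts = Counter()
for tokens in batch1_docs:
    master_counts.update(tokens)

# List of (word, count) tuples, most frequent first, like ('to', 634), ('be', 604), ...
globalFreqSets = master_counts.most_common()

# Just the counts, in the same order as globalFreqSets.
finalValues = [count for _, count in globalFreqSets]
```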

output = []
for text in doc_text:
    binarySim = []
    # loops over "globalFreqSets" by index.
    # only the word itself needs to be retrieved (e.g. 'patient'), hence the second index is [0].
    for j in range(len(globalFreqSets)):

        master_wordlist = globalFreqSets[j][0]

        i = 0
        # looping through words in list "text"
        for sub_wordlist in text:
            i += 1
            # adds 1 to "binarySim" when the target word in master_wordlist is present in the text
            if master_wordlist == sub_wordlist:
                binarySim.append(1)
                # breaks when a match is found to avoid multiple entries per word
                break
            # adds 0 to "binarySim" when the last word is reached without a match
            elif i == len(text):
                binarySim.append(0)
    # adding "binarySim" to "output"
    output.append(binarySim)
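
One way to get the desired output is to append the word's master count instead of 1 whenever there is a match. The sketch below assumes `globalFreqSets` is a list of `(word, count)` tuples and `doc_text` is a list of token lists, as the code above suggests. The sample data and the names `freqSim` and `words_in_text` are made up for illustration.

```python
# Sample data, invented for illustration only.
globalFreqSets = [("cat", 9), ("dog", 4), ("fish", 21), ("bird", 42)]
doc_text = [
    ["the", "cat", "saw", "a", "fish", "and", "a", "bird"],
    ["dog", "dog", "cat"],
]

output = []
for text in doc_text:
    # A set gives fast membership tests and replaces the inner loop.
    words_in_text = set(text)
    # Master-list count if the word occurs in this text, otherwise 0.
    freqSim = [count if word in words_in_text else 0
               for word, count in globalFreqSets]
    output.append(freqSim)

print(output)  # [[9, 0, 21, 42], [9, 4, 0, 0]]
```

If the original loop structure should stay as it is, the smallest change would probably be to append `globalFreqSets[j][1]` where the code currently appends `1`, since index `[1]` of each tuple holds the count.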

Sorry if the formatting or wording is off; I'm still fairly new to coding :)
