Searching and outputting with Python

1 answer

User

#1 · Posted 2024-10-03 23:20:58

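I don't think I have fully understood your question. Posting your code and a sample file would help a lot.

This code counts all entries across all files, then identifies the unique entries in each file. Next it counts how many files each entry appears in, and finally it keeps only the entries that appear in at least 90% of the files.

Also, this code could have been shorter, but for readability I created many variables with long, meaningful names.

Please read the comments ;)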

import os
from collections import Counter
from sys import argv

# adjust your cut point
PERCENT_CUT = 0.9

# here we are going to save each file's entries, so we can sum them later
files_dict = {}

# total files seems to be the number you'll need to check against count
total_files = 0

# raw total entries, even duplicates
total_entries = 0

unique_entries = 0

# first argument is script name, so have the second one be the folder to search
search_dir = argv[1]

# list everything under search dir - ideally only your input files
# CHECK HOW TO READ ONLY SPECIFIC FILE types if you have something inside the same folder
files_list = os.listdir(search_dir)

total_files = len(files_list)

print('Files READ:')

# iterate over each file found at given folder
for file_name in files_list:
    print("    "+file_name)

    # os.path.join adds the path separator if search_dir lacks a trailing one
    with open(os.path.join(search_dir, file_name), 'r') as file_object:
        # build a real list (map() is lazy in Python 3 and has no len())
        # of entries with the 'newline' stripped
        file_entries = [line.strip("\r\n") for line in file_object]

    # gotta count'em all
    total_entries += len(file_entries)

    # set doesn't allow duplicate entries
    entries_set = set(file_entries)

    # creates a dict from the set, setting each key's value to 1
    file_entries_dict = dict.fromkeys(entries_set, 1)

    # each file now maps to a Counter in which every unique entry counts once
    files_dict[file_name] = Counter(file_entries_dict)


print("\n\nALL ENTRIES COUNT: "+str(total_entries))

# now we create a Counter that will hold each unique key's count so we can sum all the Counters read from files
entries_dict = Counter()

for file_dict_key, file_dict_value in files_dict.items():
    print(str(file_dict_key)+" - "+str(file_dict_value))
    entries_dict += file_dict_value

print("\nUNIQUE ENTRIES COUNT: "+str(len(entries_dict.keys())))

# print(entries_dict)

# 90% from your question
cut_line = total_files * PERCENT_CUT
print("\nNeeds to appear in at least "+str(cut_line)+" files to be listed below")
# output_dict is the final dict, where we put entries that were present in at least 90% of the files
output_dict = {}
# this is PYTHON 3 - CHECK YOUR VERSION, as older versions might use iteritems() instead of items() in the line below
for entry, count in entries_dict.items():
    if count >= cut_line:
        output_dict[entry] = count

print(output_dict)
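
For comparison, the same idea can be sketched more compactly as a function. This is only a sketch under assumptions: the name `entries_in_most_files`, the `suffix` filter and its `.txt` default are hypothetical and not part of the script above. It assumes one entry per line and keeps an entry when it appears in at least the given fraction of the matching files.

```python
import os
from collections import Counter


def entries_in_most_files(search_dir, percent_cut=0.9, suffix=".txt"):
    """Return {entry: number_of_files} for entries present in at least
    percent_cut of the files in search_dir whose names end with suffix.

    Hypothetical helper: the name and the suffix filter are assumptions.
    """
    # keep only regular files with the wanted extension
    file_names = [
        name for name in sorted(os.listdir(search_dir))
        if name.endswith(suffix)
        and os.path.isfile(os.path.join(search_dir, name))
    ]

    # how many files each entry appears in
    # (a set per file means duplicates inside one file count once)
    files_per_entry = Counter()
    for name in file_names:
        with open(os.path.join(search_dir, name), "r") as handle:
            files_per_entry.update({line.strip("\r\n") for line in handle})

    cut_line = len(file_names) * percent_cut
    return {
        entry: count
        for entry, count in files_per_entry.items()
        if count >= cut_line
    }
```

Calling `entries_in_most_files(argv[1])` would return the same kind of dict as `output_dict` above, while skipping subfolders and files with other extensions.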
