Multi-word search not working correctly (Python)


I am working on a project that requires me to search a file for multiple keywords. For example, if a file contains 100 occurrences of the word "Tomato", 500 of "Bread", and 20 of "Pickle", I want to be able to search the file for "Tomato" and "Bread" and get the number of times each one occurs. I found people on this site with the same problem, but only for other languages.

I have a working program that lets me search by column name and count how many times something appears in that column, but I want to make it more precise. Here is my code:

<pre><code>import os

def start():
    location = raw_input("What is the folder containing the data you like processed located? ")
    #location = "C:/Code/Samples/Dates/2015-06-07/Large-Scale Data Parsing/Data Files"
    if os.path.exists(location) == True: #Tests to see if user entered a valid path
        file_extension = raw_input("What is the file type (.txt for example)? ")
        search_for(location,file_extension)
    else:
        print "I'm sorry, but the file location you have entered does not exist. Please try again."
        start()

def search_for(location,file_extension):
    querylist = []
    n = 5
    while n == 5:
        search_query = raw_input("What would you like to search for in each file? Use'Done' to indicate that you have finished your request. ")
        #list = ["CD90-N5722-15C", "CD90-NB810-4C", "CP90-N2475-8", "CD90-VN530-22B"]
        if search_query == "Done":
            print "Your queries are:",querylist
            print ""
            content = os.listdir(location)
            run(content,file_extension,location,querylist)
            n = 0
        else:
            querylist.append(search_query)
            continue

def run(content,file_extension,location,querylist):
    for item in content:
        if item.endswith(file_extension):
            search(location,item,querylist)
    quit()

def search(location,item,querylist):
    with open(os.path.join(location,item), 'r') as f:
        countlist = []
        for search in querylist: #any search value after the first one is incorrectly reporting "0"
            countsearch = 0
            for line in f:
                if search in line:
                    countsearch = countsearch + 1
            countlist.append(search)
            countlist.append(countsearch) #mechanism to update countsearch is not working for any value after the first
        print item, countlist

start()</code></pre>

With this code, the last part (def search) does not work correctly. Whenever I enter more than one search, every search after the first returns "0", even though a search term occurs as many as 500,000 times in a file.

I would also like to know, since I have to index 5 files of 1,000,000 lines each, whether there is a way to write an additional function or something similar to count how many times "Lettuce" occurs across all files.
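The zeros for every query after the first are consistent with how file objects behave: a file is an iterator, so the inner `for line in f` loop reads it to the end during the first query, and later queries iterate over nothing. The sketch below reproduces this with an in-memory file. It is written for Python 3, `io.StringIO` stands in for the real data, and the sample text and the helper name `count_rescanning` are made up for illustration. It then shows that rewinding with `f.seek(0)` before each pass restores the counts.

```python
import io

# Made-up stand-in for one of the data files (an assumption for illustration).
SAMPLE = "Tomato Bread\nBread\nPickle Tomato\n"

def count_rescanning(f, queries, rewind):
    """Count lines containing each query, scanning f once per query."""
    counts = []
    for query in queries:
        if rewind:
            f.seek(0)  # return to the start; otherwise f stays at end-of-file
        matches = 0
        for line in f:  # this loop consumes the file object
            if query in line:
                matches += 1
        counts.append((query, matches))
    return counts

buggy = count_rescanning(io.StringIO(SAMPLE), ["Tomato", "Bread"], rewind=False)
fixed = count_rescanning(io.StringIO(SAMPLE), ["Tomato", "Bread"], rewind=True)
print(buggy)  # [('Tomato', 2), ('Bread', 0)]
print(fixed)  # [('Tomato', 2), ('Bread', 2)]
```

Rewinding works, but it rereads the whole file once per query, which may be slow for files with a million lines each.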
Because of the size and contents of the files, I cannot post them here. Any help would be greatly appreciated.

Edit

I also have this code. If I use it, I get the correct count for every search, but it would be better to let the user enter as many searches as they want:

<pre><code>import os

def check_start():
    #location = raw_input("What is the folder containing the data you like processed located? ")
    location = "C:/Code/Samples/Dates/2015-06-07/Large-Scale Data Parsing/Data Files"
    content = os.listdir(location)
    for item in content:
        if item.endswith("processed"):
            countcol1 = 0
            countcol2 = 0
            countcol3 = 0
            countcol4 = 0
            #print os.path.join(currentdir,item)
            with open(os.path.join(location,item), 'r') as f:
                for line in f:
                    if "CD90-N5722-15C" in line:
                        countcol1 = countcol1 + 1
                    if "CD90-NB810-4C" in line:
                        countcol2 = countcol2 + 1
                    if "CP90-N2475-8" in line:
                        countcol3 = countcol3 + 1
                    if "CD90-VN530-22B" in line:
                        countcol4 = countcol4 + 1
            print item, "CD90-N5722-15C", countcol1, "CD90-NB810-4C", countcol2, "CP90-N2475-8", countcol3, "CD90-VN530-22B", countcol4</code></pre>
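The hard-coded version gives correct counts because it reads each file only once and tests every term against each line. The same idea can probably be generalized to any number of user-supplied queries by keeping the counts in a dictionary, and the per-file results can be summed to get totals across all files. A minimal sketch in Python 3 follows. The function names `count_in_file` and `count_in_folder` are hypothetical, and like the original code it counts lines that contain a term, not every individual occurrence within a line.

```python
import os
from collections import Counter

def count_in_file(path, queries):
    """Return a Counter mapping each query to the number of lines containing it."""
    counts = Counter({query: 0 for query in queries})
    with open(path, "r") as f:
        for line in f:  # single pass over the file
            for query in queries:  # test every query against the current line
                if query in line:
                    counts[query] += 1
    return counts

def count_in_folder(location, file_extension, queries):
    """Return (per_file, totals) for files in location ending with file_extension."""
    per_file = {}
    totals = Counter({query: 0 for query in queries})
    for item in sorted(os.listdir(location)):
        if item.endswith(file_extension):
            per_file[item] = count_in_file(os.path.join(location, item), queries)
            totals.update(per_file[item])  # Counter.update adds counts together
    return per_file, totals
```

Under these assumptions, a call such as `per_file, totals = count_in_folder(location, file_extension, querylist)` could take the place of `run` and `search`, and `totals["Lettuce"]` would hold the count across all matching files when "Lettuce" is one of the queries.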
