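<p>I'm working on a project that requires me to search a file for multiple keywords. For example, if a file contains 100 occurrences of the word "Tomato", 500 of "Bread", and 20 of "Pickle", I want to be able to search the file for "Tomato" and "Bread" and get the number of times each one occurs. I found people on this site with the same problem, but for other languages.</p>
<p>I have a working program that lets me search for a column name and count how many times something shows up in that column, but I want to make it a bit more precise. Here is my code:</p>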
<pre><code>import os

def start():
    location = raw_input("What is the folder containing the data you like processed located? ")
    #location = "C:/Code/Samples/Dates/2015-06-07/Large-Scale Data Parsing/Data Files"
    if os.path.exists(location) == True: #Tests to see if user entered a valid path
        file_extension = raw_input("What is the file type (.txt for example)? ")
        search_for(location,file_extension)
    else:
        print "I'm sorry, but the file location you have entered does not exist. Please try again."
        start()

def search_for(location,file_extension):
    querylist = []
    n = 5
    while n == 5:
        search_query = raw_input("What would you like to search for in each file? Use 'Done' to indicate that you have finished your request. ")
        #list = ["CD90-N5722-15C", "CD90-NB810-4C", "CP90-N2475-8", "CD90-VN530-22B"]
        if search_query == "Done":
            print "Your queries are:",querylist
            print ""
            content = os.listdir(location)
            run(content,file_extension,location,querylist)
            n = 0
        else:
            querylist.append(search_query)
            continue

def run(content,file_extension,location,querylist):
    for item in content:
        if item.endswith(file_extension):
            search(location,item,querylist)
    quit()

def search(location,item,querylist):
    with open(os.path.join(location,item), 'r') as f:
        countlist = []
        for search in querylist: #any search value after the first one is incorrectly reporting "0"
            countsearch = 0
            for line in f:
                if search in line:
                    countsearch = countsearch + 1
            countlist.append(search)
            countlist.append(countsearch) #mechanism to update countsearch is not working for any value after the first
        print item, countlist

start()
</code></pre>
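<p>With this code, the last part (<code>def search</code>) does not work correctly. Whenever I enter more than one search, every search after the first one returns "0", even though the search term occurs up to 500,000 times in a file.</p>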
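<p>A likely cause, sketched here for illustration: a file object is an iterator, so the first <code>for line in f</code> loop consumes it to the end, and every later pass over the same <code>f</code> sees no lines. The sketch below reproduces that with an in-memory file (<code>io.StringIO</code>) holding made-up contents; the helper names <code>count_buggy</code> and <code>count_with_rewind</code> are hypothetical and not part of the program above. Rewinding with <code>f.seek(0)</code> before each pass is one minimal fix, at the cost of rereading the file once per query.</p>

```python
import io

def count_buggy(f, queries):
    """Mirror the nested loops from the question: only the first query sees any lines."""
    counts = []
    for query in queries:
        n = 0
        for line in f:  # after the first query, f is already at end-of-file
            if query in line:
                n += 1
        counts.append((query, n))
    return counts

def count_with_rewind(f, queries):
    """Same loops, but rewind the file before each pass (hypothetical helper)."""
    counts = []
    for query in queries:
        f.seek(0)  # go back to the start so this pass sees every line again
        n = sum(1 for line in f if query in line)
        counts.append((query, n))
    return counts

# Made-up sample data standing in for one of the real files.
sample = io.StringIO(u"Tomato\nBread\nBread\nPickle\n")
buggy = count_buggy(sample, ["Tomato", "Bread"])
fixed = count_with_rewind(sample, ["Tomato", "Bread"])
print(buggy)
print(fixed)
```

<p>Like the original <code>if search in line</code>, both helpers count lines that contain the term, not individual occurrences; <code>line.count(query)</code> would be the thing to sum if a term can appear several times on one line.</p>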
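<p>I was also wondering, since I have to index 5 files that each have 1,000,000 lines, whether there is a way to write an additional function or something similar that counts how many times "Lettuce" occurs across all of the files.</p>
<p>Due to the size and content of the files, I cannot post them here. Any help would be greatly appreciated.</p>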
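<p>One way this could look, as a rough sketch only: keep a per-file tally and add each one into a running total with <code>collections.Counter</code>. The function names <code>count_in_file</code> and <code>count_in_folder</code>, the temporary folder, and the two tiny demo files are all assumptions made up for the example. Each file is read once, line by line, so a 1,000,000-line file is never loaded into memory in full.</p>

```python
import os
import tempfile
from collections import Counter

def count_in_file(path, queries):
    """Count, in one pass, the lines of one file that contain each query."""
    counts = Counter({q: 0 for q in queries})
    with open(path, "r") as f:
        for line in f:
            for q in queries:
                if q in line:
                    counts[q] += 1
    return counts

def count_in_folder(folder, extension, queries):
    """Return per-file counts plus a grand total across all matching files."""
    per_file = {}
    totals = Counter()
    for name in sorted(os.listdir(folder)):
        if name.endswith(extension):
            per_file[name] = count_in_file(os.path.join(folder, name), queries)
            totals.update(per_file[name])  # Counter.update adds counts together
    return per_file, totals

# Demo data: two small made-up files in a temporary folder.
demo_dir = tempfile.mkdtemp()
demo_files = [("a.txt", "Lettuce\nBread\n"), ("b.txt", "Lettuce and Bread\nLettuce\n")]
for name, text in demo_files:
    with open(os.path.join(demo_dir, name), "w") as f:
        f.write(text)

per_file, totals = count_in_folder(demo_dir, ".txt", ["Lettuce", "Bread"])
print(per_file)
print(totals)
```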
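<p><strong>Edit</strong></p>
<p>I also have this code here. If I use it, I get the correct count for each search, but it would be better to let the user enter as many searches as they want:</p>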
<pre><code>import os

def check_start():
    #location = raw_input("What is the folder containing the data you like processed located? ")
    location = "C:/Code/Samples/Dates/2015-06-07/Large-Scale Data Parsing/Data Files"
    content = os.listdir(location)
    for item in content:
        if item.endswith("processed"):
            countcol1 = 0
            countcol2 = 0
            countcol3 = 0
            countcol4 = 0
            #print os.path.join(currentdir,item)
            with open(os.path.join(location,item), 'r') as f:
                for line in f: #single pass over the file; every term is checked on each line
                    if "CD90-N5722-15C" in line:
                        countcol1 = countcol1 + 1
                    if "CD90-NB810-4C" in line:
                        countcol2 = countcol2 + 1
                    if "CP90-N2475-8" in line:
                        countcol3 = countcol3 + 1
                    if "CD90-VN530-22B" in line:
                        countcol4 = countcol4 + 1
            print item, "CD90-N5722-15C", countcol1, "CD90-NB810-4C", countcol2, "CP90-N2475-8", countcol3, "CD90-VN530-22B", countcol4
</code></pre>
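<p>The hard-coded version gets the right counts because it reads each file once and checks every term on each line. A possible generalization, offered only as a sketch: gather the terms until the user types "Done" and keep the counters in a dictionary instead of <code>countcol1</code> to <code>countcol4</code>. The names <code>collect_queries</code> and <code>count_lines_containing</code> are hypothetical, and the user input and lines below are simulated so the example runs on its own.</p>

```python
def collect_queries(read_line, sentinel="Done"):
    """Call read_line() repeatedly and gather answers until the sentinel appears."""
    # iter(callable, sentinel) stops as soon as the callable returns the sentinel.
    return [q for q in iter(read_line, sentinel) if q]

def count_lines_containing(lines, queries):
    """One pass over the lines, one dictionary entry per query."""
    counts = dict.fromkeys(queries, 0)
    for line in lines:
        for q in queries:
            if q in line:
                counts[q] += 1
    return counts

# Simulated user input; a real program might pass a function that prompts the user.
answers = iter(["CD90-N5722-15C", "CP90-N2475-8", "Done"])
queries = collect_queries(lambda: next(answers))

# Made-up lines standing in for an open file object.
lines = ["CD90-N5722-15C x\n", "CP90-N2475-8 CD90-N5722-15C\n", "other\n"]
counts = count_lines_containing(lines, queries)
print(queries)
print(counts)
```

<p>In the Python 2 program from the question, the callable could be something like <code>lambda: raw_input("Search term, or Done: ")</code>, and <code>lines</code> could be the open file <code>f</code>.</p>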