Searching logs and printing the lines before and after a match

Published 2024-09-28 05:17:38

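I am trying to write a script that searches every text file in a folder for a string and writes a chosen number of lines before and after the line containing that string.

My problem is that when I put variables into the slice, I only get the lines before the match. When I test with plain numbers ([1:6]), it works.

What am I missing?

Any suggestions for improvement are also much appreciated.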


What I am searching for (the data file):

12345

What I want written to the results file:

2 elit. Aenean commodo ligula eget d
3 olor. Aenean massa. 
4 Cum sociis natoque 12345penatibus et m
5 agnis dis parturient montes, nasc
6 etur ridiculus mus. Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat

The text I am searching:

1 Lorem ipsum dolor sit amet, consectetuer adipiscing
2 elit. Aenean commodo ligula eget d
3 olor. Aenean massa. 
4 Cum sociis natoque 12345 penatibus et m
5 agnis dis parturient montes, nasc
6 etur ridiculus mus. Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat
7 massa quis enim. Donec pede just
8 o, fringilla vel, 
9 aliquet nec, vulputate eget, arcu. In enim justo,

Code

import os 

search_folder = r'E:\stash\Logs'
datafile = r'E:\stash\variable.txt' 
resultsFile = r'E:\stash\results.txt'
nbrOfLinesOver = 3
nbrOfLinesUnder = 2


# Finds all the log files in the directory that needs to be searched through

def findFiles(folder):

    log_files = []
    for files in os.listdir(search_folder):
        log_files.append(files)
    return log_files

# Finds the strings I want to search for

def searchFor(datafile):

    stringToFind = open(datafile,'r')
    data = stringToFind.readline()
    data = str(data).split()
    map(str.strip,data)
    stringToFind.close()
    return data

#Searches through the text files to find the strings and outputs the number of lines defined under and over the match

def findLogData(log_Files, searchForData, folderPath, resultsFile):
    resultFile = open(resultsFile, "w")
    lineCounter = 0
    logLines = [] 

    for file in log_Files:
        datalookUp = open(folderPath + "\\" + file,'r', encoding='UTF-8')
        log = datalookUp.readlines()

        for line in log:
            lineCounter += 1 
            logLines.append(str(line))            

            for stringToFind in searchForData:
                if stringToFind in line: 
                    slinceStart = lineCounter - nbrOfLinesOver
                    slinceEnd = lineCounter + nbrOfLinesUnder
                    resultFile.writelines(logLines[slinceStart:slinceEnd])

    resultFile.close()
    datalookUp.close()


FilesToSearch = findFiles(search_folder)
stringsToFind = searchFor(datafile)
findLogData(FilesToSearch,stringsToFind,search_folder,resultsFile)

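Edit:
I also have a problem with the search strings. Right now I have to put everything I want to search for on one line. When the strings are listed one per line in the text file, "\n" ends up in the list as well. That is also the reason for the map call: it is leftover code I forgot to delete, from a suggestion I found on a forum, but I could not get strip to remove the newline characters.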


3 Answers

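Yes, this happens because, at the moment a match is found, logLines only contains the lines up to and including the matching line, and none of the lines after it (those have not been read yet).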

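Also note that slicing does not raise an error when the slice bounds go past the end of the list. It simply returns whatever elements fall inside the range. Example: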

>>> lst = [1,2,3,4]
>>> lst[3:123]
[4]
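
To see how this plays out in the question's loop, here is a minimal sketch. The nine-line log and the hard-coded match on line 4 are made up for illustration; seen plays the role of logLines.

```python
# Hypothetical 9-line log; we pretend the match is on line 4.
log = ["line %d\n" % n for n in range(1, 10)]
before, after = 3, 2

seen = []        # filled one line at a time, like logLines in the question
written = None
for counter, line in enumerate(log, start=1):
    seen.append(line)
    if counter == 4:
        start, end = counter - before, counter + after
        # end is 6, but seen holds only 4 items so far, so the slice is cut short
        written = seen[start:end]

print(written)   # only lines 2, 3 and 4; lines 5 and 6 were never in the list

# A negative start is not clamped to 0: it counts from the end of the list.
# For a match on line 1 of the full log this gives log[-2:3], which is empty.
print(log[1 - before:1 + after])
```

So the silent truncation hides the real problem: the lines after the match are not in the list yet when the slice is taken.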

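You should not keep the whole log file in memory in logLines; store only as many lines as you need. It is also better to use with, since it closes the file for you. Example code: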

def findLogData(log_Files, searchForData, folderPath, resultsFile):
    # Uses the module-level nbrOfLinesOver and nbrOfLinesUnder from the question
    with open(resultsFile, "w") as resultFile:
        for file in log_Files:
            with open(folderPath + "\\" + file, 'r', encoding='UTF-8') as datalookUp:
                logLines = []   # the last nbrOfLinesOver + 1 lines read
                remLines = 0    # lines still to write after the latest match
                for line in datalookUp:
                    if remLines > 0:
                        resultFile.write(line)
                        remLines -= 1
                    logLines.append(line)
                    if len(logLines) > nbrOfLinesOver + 1:
                        logLines.pop(0)

                    # Note: matches closer together than the context window
                    # cause some lines to be written more than once.
                    for stringToFind in searchForData:
                        if stringToFind in line:
                            resultFile.writelines(logLines)
                            remLines = nbrOfLinesUnder
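
The same streaming idea can be sketched with collections.deque, whose maxlen argument drops the oldest line automatically. This is only an illustration, not the code from the answer: iter_context and its parameters are hypothetical names, and unlike the function above it emits each line at most once when matches are close together.

```python
from collections import deque


def iter_context(lines, needles, before=3, after=2):
    """Yield lines containing any needle, plus surrounding context lines."""
    history = deque(maxlen=before)   # recent lines that were not emitted yet
    remaining = 0                    # trailing context lines still owed
    for line in lines:
        if any(needle in line for needle in needles):
            yield from history       # leading context
            history.clear()
            yield line
            remaining = after
        elif remaining > 0:
            yield line               # trailing context
            remaining -= 1
        else:
            history.append(line)


sample = ["1 a\n", "2 b\n", "3 c\n", "4 x 12345 y\n", "5 d\n", "6 e\n", "7 f\n"]
result = list(iter_context(sample, ["12345"], before=2, after=2))
print("".join(result), end="")
```

Because it accepts any iterable of lines, an open file object could be passed in directly, so only a handful of lines are held in memory at a time.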

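logLines seems redundant: just use log. You do not need a separate list to hold the lines of interest.

I modified your function and kept your original lines, commented out with ##. The line with a single # is where I simplified the file path construction for my own setup.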

def findLogData(log_Files, searchForData, folderPath, resultsFile, span = (3,2)):
##    resultFile = open(resultsFile, "w")
##    lineCounter = 0
##    logLines = [] 

##        datalookUp = open(folderPath + "\\" + file,'r', encoding='UTF-8')
##        log = datalookUp.readlines()
    nbrOfLinesOver, nbrOfLinesUnder = span
    with open(resultsFile, 'w') as resultFile:
        for filename in log_Files:
            #filename = folderPath + "\\" + file
            with open(filename,'r', encoding='UTF-8') as f:
                log = f.readlines()

            for lineCounter, line in enumerate(log):
    ##            lineCounter += 1 
    ##            logLines.append(str(line))            

                for stringToFind in searchForData:
                    if stringToFind in line: 
                        slinceStart = max(0, lineCounter - nbrOfLinesOver)
                        slinceEnd = min(len(log), lineCounter + nbrOfLinesUnder + 1)
##                        resultFile.writelines(logLines[slinceStart:slinceEnd])
                        resultFile.writelines(log[slinceStart:slinceEnd])

##    resultFile.close()
##    datalookUp.close()

With the original lines removed:

def findLogData(log_Files, searchForData, folderPath, resultsFile, span = (3,2)):
    nbrOfLinesOver, nbrOfLinesUnder = span
    with open(resultsFile, 'w') as resultFile:
        for filename in log_Files:

            #filename = folderPath + "\\" + file
            with open(filename,'r', encoding='UTF-8') as f:
                log = f.readlines()

            for lineCounter, line in enumerate(log):
                for stringToFind in searchForData:
                    if stringToFind in line: 
                        slinceStart = max(0, lineCounter - nbrOfLinesOver)
                        slinceEnd = min(len(log), lineCounter + nbrOfLinesUnder + 1)
                        resultFile.writelines(log[slinceStart:slinceEnd])
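
The window around a match needs its start clamped with max (so it never drops below 0) and its end clamped with min (so it never runs past the end of the list). A minimal sketch of just that step, using a hypothetical helper named context_slice and a made-up nine-line log:

```python
def context_slice(lines, idx, before, after):
    """Return the lines around index idx, clamped to the list boundaries."""
    start = max(0, idx - before)
    end = min(len(lines), idx + after + 1)   # +1 so the slice includes idx + after
    return lines[start:end]


log = ["line %d\n" % n for n in range(1, 10)]
print(context_slice(log, 3, before=3, after=2))   # lines 1 to 6
print(context_slice(log, 0, before=3, after=2))   # lines 1 to 3
print(context_slice(log, 8, before=3, after=2))   # lines 6 to 9
```

Strictly speaking only the start needs clamping, since a slice end beyond the list is harmless, but clamping both keeps the intent obvious.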

If every file in the search folder is a log file, you do not really need findFiles(): os.listdir() already returns a list.

FilesToSearch = os.listdir(search_folder)

I would probably have FilesToSearch hold the full path of each log file.

FilesToSearch  = []
for fname in os.listdir(search_folder):
    FilesToSearch.append(search_folder + '\\' + fname)

Or

import os.path
FilesToSearch = []
for fname in os.listdir(search_folder):
    FilesToSearch.append(os.path.join(search_folder, fname))
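
If the folder might also contain subfolders or unrelated files, the listing can be filtered. A minimal sketch, assuming the logs share a file extension; the function name list_log_files and the ".log" suffix are assumptions, not something stated in the question:

```python
import os


def list_log_files(folder, suffix=".log"):
    """Return full paths of regular files in folder whose names end with suffix."""
    paths = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Skip directories and anything without the expected extension
        if os.path.isfile(path) and name.endswith(suffix):
            paths.append(path)
    return paths


# Usage (search_folder as defined in the question):
# FilesToSearch = list_log_files(search_folder, suffix=".txt")
```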

It looks like you want to use the first line of datafile. I prefer to open files with a context manager (the with statement), so that the file is always closed:

def searchFor(datafile):
    with open(datafile,'r') as f:
        data = f.readline()
        data = data.split()
        data = [thing.strip() for thing in data]
        return data
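
For the case described in the question's edit, where the data file lists one search string per line, something like the following might work. It is a sketch only; read_search_strings is a hypothetical name. It also shows why the bare map(str.strip, data) call had no effect: map returns a new iterator and leaves the original list untouched.

```python
def read_search_strings(path):
    """Return the non-empty lines of the file, stripped of whitespace and newlines."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


data = ["12345\n", "abc\n"]
map(str.strip, data)                    # result is thrown away; data is unchanged
print(data)                             # the "\n" characters are still there
cleaned = list(map(str.strip, data))    # keep the result instead
print(cleaned)
```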

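Here is my take on the task (simplified a bit to get at the important part: getting x lines of context before/after a match when reading from a file):

test.log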

1 Lorem ipsum dolor sit amet, consectetuer adipiscing
2 elit. Aenean commodo ligula eget d
3 olor. Aenean massa.
4 Cum sociis natoque 12345 penatibus et m
5 agnis dis parturient montes, nasc
6 etur ridiculus mus. Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat
7 massa quis enim. Donec pede just
8 o, fringilla vel,
9 aliquet nec, vulputate eget, arcu. In enim justo,

test.py

def get_lines(file_name):
    lines = None
    # Text mode, so each line is a str and can be compared with search_str
    with open(file_name, 'r') as f:
        lines = f.readlines()
    return lines

def print_matches_in_file(search_str, file_name, num_lines_context=0):
    """
    Will print out matching lines as well as num_lines_context before and
    num_lines_context after
    """
    lines = get_lines(file_name)
    if lines:
        num_lines = len(lines)
        for idx, line in enumerate(lines):
            if search_str in line:
                # Clamp the window to the bounds of the file
                start_line = max([0, idx - num_lines_context])
                end_line = min([num_lines, idx + num_lines_context + 1])
                print(''.join(lines[start_line:end_line]))

print_matches_in_file("12345", "test.log", num_lines_context=2)

When run, this prints:

2 elit. Aenean commodo ligula eget d
3 olor. Aenean massa.
4 Cum sociis natoque 12345 penatibus et m
5 agnis dis parturient montes, nasc
6 etur ridiculus mus. Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat

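This reads the entire log file into memory at once before searching for matching substrings, which may or may not be appropriate depending on the size of the file.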
