How do I compare two HTML files in Python and print only the differences between them?

Posted 2024-10-08 20:20:15


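I have two HTML reports generated by Sonar that show the issues in my code.

Problem statement: I need to compare the two Sonar reports and find the differences, i.e. the newly introduced issues. Basically, I need to find the differences between the HTML files and print only those differences.

What I have tried: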

import difflib

# Read both reports as lists of lines
with open('sonarlint-report.html', 'r') as f1:
    file1 = f1.readlines()
with open('sonarlint-report_latest.html', 'r') as f2:
    file2 = f2.readlines()

# Build a full side-by-side HTML comparison of the two reports
htmlDiffer = difflib.HtmlDiff()
htmldiffs = htmlDiffer.make_file(file1, file2)

with open('comparison.html', 'w') as outfile:
    outfile.write(htmldiffs)

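This gives me a comparison.html, which is nothing more than a full diff of the two HTML files. It does not print only the lines that differ.

Should I try parsing the HTML and then somehow print only the differences? Please suggest.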


2 Answers

If you use difflib.Differ, you can keep only the differing lines by filtering on the two-letter code written at the start of each line. From the docs:

class difflib.Differ

This is a class for comparing sequences of lines of text, and producing human-readable differences or deltas. Differ uses SequenceMatcher both to compare sequences of lines, and to compare sequences of characters within similar (near-matching) lines.

Each line of a Differ delta begins with a two-letter code:

Code    Meaning
'- '    line unique to sequence 1
'+ '    line unique to sequence 2
'  '    line common to both sequences
'? '    line not present in either input sequence

Lines beginning with ‘?’ attempt to guide the eye to intraline differences, and were not present in either input sequence. These lines can be confusing if the sequences contain tab characters.

If you keep only the lines that start with '- ' and '+ ', you are left with just the differences.
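A minimal sketch of that filtering, assuming both reports have already been read into lists of lines. The helper name only_differences and the sample lines are made up for illustration; only difflib.Differ and its compare() method come from the standard library.

```python
import difflib


def only_differences(lines_a, lines_b):
    """Return only the Differ delta lines that are unique to one input."""
    delta = difflib.Differ().compare(lines_a, lines_b)
    # Keep lines prefixed with '- ' or '+ '; this drops common lines ('  ')
    # and the '? ' hint lines that exist in neither input.
    return [line for line in delta if line.startswith(('- ', '+ '))]


# Hypothetical sample data standing in for the two report files
old = ['<li>issue A</li>\n', '<li>issue B</li>\n']
new = ['<li>issue A</li>\n', '<li>issue B</li>\n', '<li>issue C</li>\n']

for line in only_differences(old, new):
    print(line, end='')
```

For the real reports, lines_a and lines_b would be the results of readlines() on the two files. Lines starting with '+ ' are the ones present only in the newer report.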

I would first try walking through both HTML files line by line and checking whether the lines are the same.

with open('file1.html') as file1, open('file2.html') as file2:
    for file1Line, file2Line in zip(file1, file2):
        if file1Line != file2Line:
            print(file1Line.strip('\n'))
            print(file2Line.strip('\n'))

You will still have to deal with newlines and with differences that span multiple lines, but this is probably a good start :)
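One way to cope with inserted or removed lines, which push a plain zip() comparison out of step, is to let difflib align the two files first. The following is only a sketch under that assumption, not part of the original answer: the helper name changed_lines and the sample data are hypothetical, while difflib.SequenceMatcher and get_opcodes() are standard library calls.

```python
import difflib


def changed_lines(lines_a, lines_b):
    """Return (removed, added) lists of lines after aligning both inputs."""
    removed, added = [], []
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        # 'equal' blocks are skipped; every other tag marks a difference.
        if tag in ('replace', 'delete'):
            removed.extend(lines_a[a1:a2])
        if tag in ('replace', 'insert'):
            added.extend(lines_b[b1:b2])
    return removed, added


# Hypothetical sample data: the newer report has one extra line in the middle
old = ['<ul>\n', '<li>issue A</li>\n', '</ul>\n']
new = ['<ul>\n', '<li>issue A</li>\n', '<li>issue B</li>\n', '</ul>\n']

removed, added = changed_lines(old, new)
for line in added:
    print(line.strip('\n'))
```

Here the added list holds the lines that appear only in the second file, which is closest to "newly introduced issues". A zip()-based loop would instead pair '</ul>' with the inserted line and never reach the last line of the longer file.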
