How can I display the differences between two strings in color?

Posted 2024-10-02 16:30:23


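I'm trying to find a Pythonic way to diff strings. I know about difflib, but I haven't been able to find an inline mode that does something similar to what this JS library does (insertions in green, deletions in red):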

one_string =   "beep boop"
other_string = "beep boob blah"

[Image: colored diff]

Is there a way to do this?


3 Answers

Based on @interjay's comment, I came up with this:

import difflib

# Restore the terminal's default colors instead of forcing white text,
# which would be unreadable on a light background.
RESET = "\033[0m"

def red(text):
    return f"\033[38;2;255;0;0m{text}{RESET}"

def green(text):
    return f"\033[38;2;0;255;0m{text}{RESET}"

def get_edits_string(old, new):
    """Merge old and new into one string: deletions in red, insertions in green."""
    parts = []
    matcher = difflib.SequenceMatcher(a=old, b=new)
    # Each opcode says how old[i1:i2] turns into new[j1:j2].
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(old[i1:i2])
        elif tag == "delete":
            parts.append(red(old[i1:i2]))
        elif tag == "insert":
            parts.append(green(new[j1:j2]))
        elif tag == "replace":
            parts.append(red(old[i1:i2]) + green(new[j1:j2]))
    return "".join(parts)

It depends only on difflib and can be tested with:

one_string =   "beep boop"
other_string = "beep boob blah"

print(get_edits_string(one_string, other_string))
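
To see what the function above is working from, it can help to print the raw opcodes for the same pair of strings. This is only an illustrative sketch; the variable names `old`, `new`, and `opcodes` are my own and are not part of the answer.

```python
import difflib

old = "beep boop"
new = "beep boob blah"

# Each opcode is a tuple (tag, i1, i2, j1, j2), meaning that
# old[i1:i2] corresponds to new[j1:j2] under the given tag.
opcodes = difflib.SequenceMatcher(a=old, b=new).get_opcodes()

for tag, i1, i2, j1, j2 in opcodes:
    print(f"{tag:8} old[{i1}:{i2}]={old[i1:i2]!r}  new[{j1}:{j2}]={new[j1:j2]!r}")
```

For this input the common prefix "beep boo" comes back as one "equal" block and the remainder as one "replace" block, which is why the colored output shows a red "p" followed by a green "b blah".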


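Here is a solution based on minimum edit distance. In this example I use this algorithm to compute the distance matrix. I then walk the matrix from back to front to determine which characters were inserted or deleted; because of that traversal order, the result has to be reversed at the end.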

To color the terminal output, I use the colorama module:

#!/usr/bin/env python3

from colorama import Fore, init
from numpy import zeros

init()

# The result is built back to front and reversed at the end,
# so the color codes are stored reversed as well.
inv_WHITE = Fore.WHITE[::-1]
inv_RED = Fore.RED[::-1]
inv_GREEN = Fore.GREEN[::-1]

def edDistDp(y, x):
    """Return a colored diff from y (old) to x (new): deletions red, insertions green."""
    res = inv_WHITE
    # D[i, j] is the edit distance between x[:i] and y[:j]
    D = zeros((len(x)+1, len(y)+1), dtype=int)
    D[0, 1:] = range(1, len(y)+1)
    D[1:, 0] = range(1, len(x)+1)
    for i in range(1, len(x)+1):
        for j in range(1, len(y)+1):
            delt = 1 if x[i-1] != y[j-1] else 0
            D[i, j] = min(D[i-1, j-1]+delt, D[i-1, j]+1, D[i, j-1]+1)

    # walk the matrix from the bottom-right corner back to the origin;
    # continue until both strings are consumed so leading edits are not lost
    i = len(x)
    j = len(y)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i, j] == D[i-1, j-1] + (x[i-1] != y[j-1]):
            # diagonal step: either a match or a substitution
            i, j = i-1, j-1
            if x[i] == y[j]:
                res += x[i] + inv_WHITE
            else:
                res += y[j] + inv_RED
                res += x[i] + inv_GREEN
        elif i > 0 and D[i, j] == D[i-1, j] + 1:
            # x[i-1] has no counterpart in y: insertion
            i -= 1
            res += x[i] + inv_GREEN
        else:
            # y[j-1] has no counterpart in x: deletion
            j -= 1
            res += y[j] + inv_RED
    return res[::-1]

one_string =   "beep boop"
other_string = "beep boob blah"
print ("'%s'-'%s'='%s'" % (one_string, other_string, edDistDp(one_string, other_string)))
print ("'%s'-'%s'='%s'" % (other_string, one_string, edDistDp(other_string, one_string)))

other_string = "hola nacho"
one_string =   "hola naco"
print ("'%s'-'%s'='%s'" % (one_string, other_string, edDistDp(one_string, other_string)))
print ("'%s'-'%s'='%s'" % (other_string, one_string, edDistDp(other_string, one_string)))
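
The distance matrix that this answer relies on can also be sketched without NumPy. The snippet below is a minimal illustration of the standard Levenshtein recurrence only; the function name `edit_distance_matrix` is hypothetical and it does no coloring or backtracking.

```python
def edit_distance_matrix(old, new):
    """Return D, where D[i][j] is the edit distance between new[:i] and old[:j]."""
    rows, cols = len(new) + 1, len(old) + 1
    D = [[0] * cols for _ in range(rows)]
    for j in range(cols):
        D[0][j] = j  # delete the first j characters of old
    for i in range(rows):
        D[i][0] = i  # insert the first i characters of new
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if new[i - 1] == old[j - 1] else 1
            D[i][j] = min(
                D[i - 1][j - 1] + cost,  # match or substitution
                D[i - 1][j] + 1,         # insertion
                D[i][j - 1] + 1,         # deletion
            )
    return D


D = edit_distance_matrix("hola naco", "hola nacho")
print(D[-1][-1])  # the bottom-right cell holds the total distance
```

The bottom-right cell is the total number of single-character edits; the backtracking loop in the answer starts from that cell and follows whichever neighbor produced its value.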

You can use ndiff.

For example:

import difflib

cases=[('afrykanerskojęzyczny', 'afrykanerskojęzycznym'),
       ('afrykanerskojęzyczni', 'nieafrykanerskojęzyczni'),
       ('afrykanerskojęzycznym', 'afrykanerskojęzyczny'),
       ('nieafrykanerskojęzyczni', 'afrykanerskojęzyczni'),
       ('nieafrynerskojęzyczni', 'afrykanerskojzyczni'),
       ('abcdefg','xac')] 

for a,b in cases:     
    print('{} => {}'.format(a,b))  
    for i,s in enumerate(difflib.ndiff(a, b)):
        if s[0]==' ': continue
        elif s[0]=='-':
            print(u'Delete "{}" from position {}'.format(s[-1],i))
        elif s[0]=='+':
            print(u'Add "{}" to position {}'.format(s[-1],i))    
    print()      

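This prints the following (the position is the index within the ndiff output, not within either string):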

afrykanerskojęzyczny => afrykanerskojęzycznym
Add "m" to position 20

afrykanerskojęzyczni => nieafrykanerskojęzyczni
Add "n" to position 0
Add "i" to position 1
Add "e" to position 2

afrykanerskojęzycznym => afrykanerskojęzyczny
Delete "m" from position 20

nieafrykanerskojęzyczni => afrykanerskojęzyczni
Delete "n" from position 0
Delete "i" from position 1
Delete "e" from position 2

nieafrynerskojęzyczni => afrykanerskojzyczni
Delete "n" from position 0
Delete "i" from position 1
Delete "e" from position 2
Add "k" to position 7
Add "a" to position 8
Delete "ę" from position 16

abcdefg => xac
Add "x" to position 0
Delete "b" from position 2
Delete "d" from position 4
Delete "e" from position 5
Delete "f" from position 6
Delete "g" from position 7
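
Since the question asks for inline colored output, the ndiff entries can also be joined back into a single line. The following is a rough sketch under my own assumptions: the function name `colored_ndiff` is hypothetical, and it uses the basic ANSI codes 31 (red), 32 (green), and 0 (reset), which most terminals support.

```python
import difflib

RED = "\033[31m"
GREEN = "\033[32m"
RESET = "\033[0m"


def colored_ndiff(old, new):
    """Join character-level ndiff entries into one line, coloring the changes."""
    parts = []
    # Passing strings makes ndiff treat every character as one "line";
    # each entry looks like "  c", "- c" or "+ c".
    for entry in difflib.ndiff(old, new):
        code, char = entry[0], entry[2:]
        if code == " ":
            parts.append(char)
        elif code == "-":
            parts.append(f"{RED}{char}{RESET}")
        elif code == "+":
            parts.append(f"{GREEN}{char}{RESET}")
        # entries starting with "?" are ndiff hint lines and are skipped
    return "".join(parts)


print(colored_ndiff("beep boop", "beep boob blah"))
```

Each changed character is wrapped in its own color and reset sequence, which is verbose but keeps the function simple; grouping consecutive changes would be a straightforward refinement.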

For more information, see this post:

Python - difference between two strings
