| Text1 | Text2 | Change |
|:------------------------------|:---------------------:|---------------:|
| This is Mango. This is Banana | This is Banana | This is Mango. |
| This is Mango. | This is Mango, Banana | , Banana |
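I want to derive the Change column from Text1 and Text2 as shown above. The table above is Excel data loaded into a DataFrame.
The code below works well on plain text, but it does not work on a DataFrame.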
import difflib
# Define the original text, taken from: https://en.wikipedia.org/wiki/Internet_Information_Services
original = ["About the IIS", "", "IIS 8.5 has several improvements related", "to performance in large-scale scenarios, such", "as those used by commercial hosting providers and Microsoft's", "own cloud offerings."]
# Define the modified text
edited = ["About the IIS", "", "It has several improvements related", "to performance in large-scale scenarios."]
# Initialize the Differ object
d = difflib.Differ()
# Compute the difference between the two texts
diff = d.compare(original, edited)
# Print the result
print('\n'.join(diff))
=> Output:
python comparing-strings-difflib.py
About the IIS
- IIS 8.5 has several improvements related
? ^^^^^^
+ It has several improvements related
? ^
- to performance in large-scale scenarios, such
? ^^^^^^
+ to performance in large-scale scenarios.
?
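`diff` was created to compare strings (especially source code). You should run your own function on each row and compare the two texts in that row.
But using `Differ()` to get the changes is not useful here, because it gives the result as text. You should use `SequenceMatcher` to get the changes as tuples.
To create the `DataFrame`, I had to add the missing row in column `Text2`, and I used `None` to mark the missing row. If you use an empty string `""` instead, you don't need the part `if text2 is None:`.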
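To illustrate the point about tuples, here is a minimal sketch of how `SequenceMatcher.get_opcodes()` reports changes as `(tag, i1, i2, j1, j2)` tuples instead of formatted text. The helper name `opcodes_as_tuples` is made up for this example, and comparing character by character is an assumption about what is wanted.

```python
from difflib import SequenceMatcher


def opcodes_as_tuples(a, b):
    """Hypothetical helper: return (tag, piece_of_a, piece_of_b) for every opcode."""
    matcher = SequenceMatcher(None, a, b)  # None means no custom "junk" function
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


# The two rows from the table in the question
for item in opcodes_as_tuples("This is Mango. This is Banana", "This is Banana"):
    print(item)
for item in opcodes_as_tuples("This is Mango.", "This is Mango, Banana"):
    print(item)
```

Each tag is one of `equal`, `delete`, `insert` or `replace`, so the changed parts can be picked out by filtering on the tag.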
Minimal working code
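The answer's own code is not preserved in this text. The following is only a sketch of the approach described above, using just the standard library. The function name `describe_change`, the rule for what counts as the "change" (deleted text from Text1, otherwise the new text from Text2), and the plain list of dicts standing in for DataFrame rows are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def describe_change(text1, text2):
    """Hypothetical helper: return the text that differs between two cells."""
    # Treat a missing cell (None) like an empty string.
    text1 = text1 or ""
    text2 = text2 or ""
    pieces = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, text1, text2).get_opcodes():
        if tag == "delete":
            pieces.append(text1[i1:i2])  # text present only in Text1
        elif tag in ("insert", "replace"):
            pieces.append(text2[j1:j2])  # text that is new in Text2
    return "".join(pieces).strip()


# Stand-in for the DataFrame rows; the last row has a missing Text2.
rows = [
    {"Text1": "This is Mango. This is Banana", "Text2": "This is Banana"},
    {"Text1": "This is Mango.", "Text2": "This is Mango, Banana"},
    {"Text1": "This is Mango.", "Text2": None},
]
for row in rows:
    row["Change"] = describe_change(row["Text1"], row["Text2"])
    print(row)
```

With pandas, the same function could be applied row by row, for example `df["Change"] = df.apply(lambda row: describe_change(row["Text1"], row["Text2"]), axis=1)`. Cells read from Excel are usually `NaN` rather than `None` when empty, so calling `df.fillna("")` first is probably needed.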
Result:
EDIT:
The same with an empty string instead of `None`
Result (`delete` instead of `remove`):