Application for comparing ordered lists, XML and CSV files
Detailed description of the ListComparator Python project
Detailed Documentation
XML and CSV comparisons
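Two scripts are provided: xml-cmp and csv-cmp. Both compare two files and output the delta as three files, for suppressions ("suppr"), additions and changes.
The output extensions are forced to xml or csv, respectively.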
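This description does not show how csv-cmp reads its input. A minimal sketch, assuming each CSV row is turned into a list of field strings so that a whole file can be compared as a list of lists (as in the lists-of-lists example in the next section). `load_rows` is a hypothetical helper, not part of ListComparator:

```python
import csv
import io


def load_rows(csv_text):
    """Return CSV text as a list of rows, each row being a list of field strings.

    Hypothetical helper: it only illustrates turning a CSV file into the
    list-of-lists shape that a list comparator can work on.
    """
    # newline="" lets the csv module handle line endings itself,
    # as the csv documentation recommends.
    return list(csv.reader(io.StringIO(csv_text, newline="")))


old_rows = load_rows("62145,azerty\n1234,qwerty\n9876,ipsum\n")
```

Quoted fields containing commas are handled by the csv module, so a row such as `1,"a,b"` becomes `['1', 'a,b']`.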
List comparison
ListComparator provides a Comparator object that finds the differences between two lists, provided the elements of the lists appear in the same order.
>>> old = [1, 2, 3, 4, 5, 6]
>>> new = [1, 3, 4, 7, 6]
>>> from listcomparator.comparator import Comparator
Let's create a Comparator object:
>>> comp = Comparator(old,new)
The check method fills in the additions and deletions attributes:
>>> comp.check()
>>> comp.additions
[7]
>>> comp.deletions
[2, 5]
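How check() finds these differences is not described here. As a hedged sketch, a longest-common-subsequence (LCS) walk gives the same result on the examples in this section: elements on a common subsequence are kept, the other elements of old are deletions and the other elements of new are additions. `ordered_diff` is a hypothetical helper, not ListComparator's implementation, and it may break ties differently from the library on ambiguous inputs:

```python
def ordered_diff(old, new):
    """Return (additions, deletions) between two ordered lists.

    Sketch only, not ListComparator's actual algorithm. Elements only need
    to support ==, so lists of lists work as well as plain values.
    """
    n, m = len(old), len(new)
    # lcs[i][j] = length of the longest common subsequence of old[i:] and new[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    additions, deletions = [], []
    i = j = 0
    while i < n and j < m:
        if old[i] == new[j]:
            # Common element: neither added nor deleted.
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            # Skipping old[i] loses nothing from the LCS, so it was deleted.
            deletions.append(old[i])
            i += 1
        else:
            additions.append(new[j])
            j += 1
    # Whatever is left over on either side is a pure deletion or addition.
    deletions.extend(old[i:])
    additions.extend(new[j:])
    return additions, deletions
```

For example, `ordered_diff([1, 2, 3, 4, 5, 6], [1, 3, 4, 7, 6])` returns `([7], [2, 5])`, matching the doctest above. The quadratic table is fine for illustration but would be costly on very large files.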
We can also use lists of lists:
>>> old_list = [['62145', 'azerty'], ['1234', 'qwerty'], ['9876', 'ipsum']]
>>> new_list = [['62145', 'azerty'], ['1234', 'qwertw'], ['4865', 'lorem']]
>>> comp = Comparator(old_list, new_list)
>>> comp.check()
>>> comp.additions
[['1234', 'qwertw'], ['4865', 'lorem']]
>>> comp.deletions
[['1234', 'qwerty'], ['9876', 'ipsum']]
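If the elements are hashable, the standard library's `difflib.SequenceMatcher` gives a comparable result without ListComparator. Lists of lists must first be converted to tuples, because SequenceMatcher indexes the second sequence in a dict. A sketch; `diff_hashable` is a hypothetical helper and difflib is a different technique from whatever ListComparator uses internally:

```python
from difflib import SequenceMatcher


def diff_hashable(old, new):
    """Return (additions, deletions) built from difflib opcodes.

    Elements must be hashable: SequenceMatcher indexes `new` in a dict.
    """
    additions, deletions = [], []
    # autojunk=False disables the "popular element" heuristic for long inputs.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deletions.extend(old[i1:i2])
        if tag in ("insert", "replace"):
            additions.extend(new[j1:j2])
    return additions, deletions


old_rows = [['62145', 'azerty'], ['1234', 'qwerty'], ['9876', 'ipsum']]
new_rows = [['62145', 'azerty'], ['1234', 'qwertw'], ['4865', 'lorem']]
# Lists are unhashable, so each row is converted to a tuple first.
added, deleted = diff_hashable([tuple(r) for r in old_rows],
                               [tuple(r) for r in new_rows])
```

Here `added` holds the tuples `('1234', 'qwertw')` and `('4865', 'lorem')`, and `deleted` holds `('1234', 'qwerty')` and `('9876', 'ipsum')`.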
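This raises a problem: a modification, in our case "qwerty" becoming "qwertw", appears in both outputs, comp.additions and comp.deletions, whereas you may consider it a change. The Comparator can handle this if you provide a function that tells it how to recognize such a case. In our example, two items are the same record if the first elements of the lists are equal, a kind of ID:
>>> def my_key(x):
...     return x[0]
...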
The getChanges method then provides a new attribute, changes:
>>> comp.getChanges(my_key)
>>> comp.changes
[['1234', 'qwertw']]
Of course, the additions and deletions are left unchanged:
>>> comp.additions
[['1234', 'qwertw'], ['4865', 'lorem']]
>>> comp.deletions
[['1234', 'qwerty'], ['9876', 'ipsum']]
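A minimal sketch of what the changes computation may amount to, assuming the key function returns a hashable value: an addition counts as a change when some deletion shares its key, and the new version of the record is reported. `get_changes` is a hypothetical name, not the library's getChanges method:

```python
def get_changes(additions, deletions, key):
    """Return the additions whose key also appears among the deletions.

    Sketch only; assumes key(item) is hashable so it can go in a set.
    """
    deleted_keys = set(key(d) for d in deletions)
    return [a for a in additions if key(a) in deleted_keys]


def my_key(x):
    # Same idea as the doctest: the first field acts as an ID.
    return x[0]


changes = get_changes([['1234', 'qwertw'], ['4865', 'lorem']],
                      [['1234', 'qwerty'], ['9876', 'ipsum']],
                      my_key)
```

With the data from this section, `changes` is `[['1234', 'qwertw']]`, the same value as `comp.changes` above.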
You may want to consider only "pure" additions and deletions; getChanges accepts the keyword argument purge to do so:
>>> comp.getChanges(my_key, purge=True)
>>> comp.changes
[['1234', 'qwertw']]
>>> comp.additions
[['4865', 'lorem']]
>>> comp.deletions
[['9876', 'ipsum']]
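The effect of purge=True can be sketched as removing every item whose key appears on both sides, assuming hashable keys. `purge_changes` is a hypothetical helper that only mirrors the behaviour shown in the doctest:

```python
def purge_changes(additions, deletions, key):
    """Return (pure_additions, pure_deletions), dropping keyed pairs.

    Sketch only; an item whose key appears in both lists is treated as
    a change and removed from both.
    """
    added_keys = set(key(a) for a in additions)
    deleted_keys = set(key(d) for d in deletions)
    changed_keys = added_keys & deleted_keys
    pure_additions = [a for a in additions if key(a) not in changed_keys]
    pure_deletions = [d for d in deletions if key(d) not in changed_keys]
    return pure_additions, pure_deletions


pure_adds, pure_dels = purge_changes(
    [['1234', 'qwertw'], ['4865', 'lorem']],
    [['1234', 'qwerty'], ['9876', 'ipsum']],
    lambda row: row[0],  # first field as ID, like my_key
)
```

This leaves `[['4865', 'lorem']]` as the only addition and `[['9876', 'ipsum']]` as the only deletion.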
The old and new attributes store the lists being compared. You may want to reset them to free memory; the Comparator provides a purgeOldNew method for this:
>>> comp.old
[['62145', 'azerty'], ['1234', 'qwerty'], ['9876', 'ipsum']]
>>> comp.new
[['62145', 'azerty'], ['1234', 'qwertw'], ['4865', 'lorem']]
>>> comp.purgeOldNew()
>>> comp.old
>>> comp.new
Compare XML files
The Comparator can be used to compare XML files. Let's create two XML documents describing books:
>>> old='''<?xml version="1.0" ?>
... <infos>
... <book><title>White pages 1995</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Paris</title>
... <para>ABEL Antoine 82 23 44 12</para>
... <para>ABEL Pierre 82 67 23 12</para>
... </chapter>
... </book>
... <book><title>Yellow pages 2007</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Bretagne</title>
... <para>Zindep 82 23 44 12</para>
... <para>ZYM 82 67 23 12</para>
... </chapter>
... </book>
... <book><title>Dark pages 2007</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Greves</title>
... <para>SNCF 82 23 44 12</para>
... </chapter>
... </book>
... </infos>
... '''
>>> new='''<?xml version="1.0"?>
... <infos>
... <book><title>White pages 1995</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Paris</title>
... <para>ABIL Antoine 82 23 44 12</para>
... <para>ABEL Pierre 82 67 23 12</para>
... </chapter>
... </book>
... <book><title>Yellow pages 2007</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Bretagne</title>
... <para>Zindep 82 23 44 12</para>
... <para>ZYM 82 67 23 12</para>
... </chapter>
... </book>
... <book><title>Blue pages 2007</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Bretagne</title>
... <para>Mer 82 23 44 12</para>
... <para>Ciel 82 67 23 12</para>
... </chapter>
... </book>
... </infos>
... '''
Parsing the XML requires ElementTree:
>>> from elementtree import ElementTree as ET
For this test, we use cStringIO instead of real files:
>>> import cStringIO
>>> ex_old = cStringIO.StringIO(old)
>>> ex_new = cStringIO.StringIO(new)
We parse the content:
>>> root_old = ET.parse(ex_old).getroot()
>>> root_new = ET.parse(ex_new).getroot()
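The `elementtree` package and `cStringIO` used above are Python 2 era modules. Under Python 3, a roughly equivalent sketch uses the standard `xml.etree.ElementTree` and `io.StringIO`, or `ET.fromstring` to skip the file-like object. The short XML string here is a trimmed stand-in for the example documents, not the full data:

```python
import io
import xml.etree.ElementTree as ET

XML = """<?xml version="1.0"?>
<infos>
<book><title>White pages 1995</title></book>
<book><title>Dark pages 2007</title></book>
</infos>
"""

# Parse from a file-like object, as the doctest does with cStringIO.
root = ET.parse(io.StringIO(XML)).getroot()

# ET.fromstring returns the root element directly from a string.
same_root = ET.fromstring(XML)

# Select the <book> children and read their titles.
books = root.findall("book")
titles = [book.find("title").text for book in books]
```

Both routes give an `infos` root element; `titles` is `['White pages 1995', 'Dark pages 2007']`.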
The "book" tag identifies the objects we want:
>>> objects_old = root_old.findall('book')
>>> objects_new = root_new.findall('book')
Since two Element objects cannot be compared directly, we serialize them to strings:
>>> objects_old = [ET.tostring(o) for o in objects_old]
>>> objects_new = [ET.tostring(o) for o in objects_new]
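A short check of why the serialization step is needed, using Python 3's `xml.etree.ElementTree`: `Element` has no content-based equality, so two identical books compare unequal, while their serialized strings compare by content. The sketch also shows that `tostring` writes an element's tail text. That tail is why each printed book in the doctests below ends with a `<BLANKLINE>`: the string already ends with a newline and `print` adds another.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring("<book><title>Paris</title></book>")
b = ET.fromstring("<book><title>Paris</title></book>")

# Element does not define content-based ==, so this compares identity.
elements_equal = (a == b)

# Serialized forms are plain strings and compare by content.
strings_equal = (ET.tostring(a, encoding="unicode")
                 == ET.tostring(b, encoding="unicode"))

# tostring also writes the element's tail: a <book> followed by a newline
# inside its parent serializes with a trailing "\n".
root = ET.fromstring("<infos>\n<book><title>X</title></book>\n</infos>")
book_str = ET.tostring(root.find("book"), encoding="unicode")
```

`elements_equal` is False, `strings_equal` is True, and `book_str` ends with `"\n"`.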
From here on, the Comparator can be used:
>>> my_comp = Comparator(objects_old, objects_new)
>>> my_comp.check()
>>> for e in my_comp.additions:
...     print e
...
<book><title>White pages 1995</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Paris</title>
<para>ABIL Antoine 82 23 44 12</para>
<para>ABEL Pierre 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>
<book><title>Blue pages 2007</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Bretagne</title>
<para>Mer 82 23 44 12</para>
<para>Ciel 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>
>>> for e in my_comp.deletions:
...     print e
...
<book><title>White pages 1995</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Paris</title>
<para>ABEL Antoine 82 23 44 12</para>
<para>ABEL Pierre 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>
<book><title>Dark pages 2007</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Greves</title>
<para>SNCF 82 23 44 12</para>
</chapter>
</book>
<BLANKLINE>
We need to know which tag uniquely identifies an object. Here we choose the "title" tag:
>>> def item_signature(xml_element):
...     title = xml_element.find('title')
...     return title.text
...
We build a custom key function for the Comparator to use:
>>> def my_key(xml_string):
...     file_like = cStringIO.StringIO(xml_string)
...     tree = ET.parse(file_like)
...     return item_signature(tree)
...
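Under Python 3, the same key can be sketched more compactly with `ET.fromstring` and `findtext`, assuming each serialized book is a str (for example produced with `encoding="unicode"`). `title_key` is a hypothetical name; like the doctest, it assumes the title identifies a book uniquely:

```python
import xml.etree.ElementTree as ET


def title_key(xml_string):
    """Return the text of the book's <title> child, used as its identity.

    Assumes titles are unique. findtext returns None when there is no
    <title>, instead of raising AttributeError as .find(...).text would.
    """
    return ET.fromstring(xml_string).findtext("title")


key = title_key("<book><title>Blue pages 2007</title><para>Mer</para></book>\n")
```

The trailing newline left by `tostring` does no harm, because whitespace after the root element is ignored by the parser; `key` is `'Blue pages 2007'`.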
The Comparator's getChanges method can then be used:
>>> my_comp.getChanges(my_key, purge=True)
Which books were purely added?
>>> for e in my_comp.additions:
...     print e
...
<book><title>Blue pages 2007</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Bretagne</title>
<para>Mer 82 23 44 12</para>
<para>Ciel 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>
Which books were completely deleted?
>>> for e in my_comp.deletions:
...     print e
...
<book><title>Dark pages 2007</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Greves</title>
<para>SNCF 82 23 44 12</para>
</chapter>
</book>
<BLANKLINE>
Which books changed, i.e. kept the same title but have different content?
>>> for e in my_comp.changes:
...     print e
...
<book><title>White pages 1995</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Paris</title>
<para>ABIL Antoine 82 23 44 12</para>
<para>ABEL Pierre 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>
We can then write these results back into XML files.
- This code is PEP 8 compliant
- It is fully tested, with 100% coverage
- A buildbot runs the tests on every commit
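Writing the results back into XML is not shown in this description. A minimal sketch under Python 3, assuming each result is a serialized `<book>` string and reusing `infos` as the root tag from the example documents; `rebuild_document` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def rebuild_document(book_strings, root_tag="infos"):
    """Wrap serialized <book> elements in a new root and return the XML text.

    Hypothetical helper; the root tag defaults to the one used in the
    example documents.
    """
    root = ET.Element(root_tag)
    for xml_string in book_strings:
        # Parse each serialized book back into an Element and attach it.
        root.append(ET.fromstring(xml_string))
    return ET.tostring(root, encoding="unicode")


doc = rebuild_document(["<book><title>Blue pages 2007</title></book>\n"])
```

To write an actual file instead of a string, `ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)` can be used on the assembled root, with `path` being whatever output name you choose.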
Contributors
Main developer
- Nicolas Laurance <nlaurance at zindep dot com>
With contributions from
- Yves Mahe <Ymahe at zindep dot com>