Python: different (Excel) file names, same content ch

>>> f1 = 'f1.xlsx'
>>> f2 = 'f2.xlsx'

# Using read():
>>> open(f1).read() == open(f2).read()
False

# Using filecmp.cmp:
>>> filecmp.cmp(f1, f2, shallow=True)
False

# Using izip:
>>> all(line1 == line2 for line1, line2 in izip_longest(f1, f2))
False

# Using hash:
>>> hash1 = hashlib.md5()
>>> hash1.update(f1)
>>> hash1 = hash1.hexdigest()
>>> hash2 = hashlib.md5()
>>> hash2.update(f2)
>>> hash2 = hash2.hexdigest()
>>> hash1 == hash2
False

# Also note, using getsize:
>>> os.path.getsize(f1)
8007
>>> os.path.getsize(f2)
8031
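
A side note on the attempts above: `hash1.update(f1)` is given the file name string rather than the file's bytes, and `izip_longest(f1, f2)` iterates over the characters of the two names, so neither check actually looks inside the files. The following is a minimal sketch of hashing the real contents; `file_md5` and `chunk_size` are hypothetical names introduced here for illustration. Keep in mind that an .xlsx file is a ZIP container, so two workbooks with the same cell values can still differ at the byte level. A matching digest shows the files are identical, but a mismatch does not prove that the cell contents differ.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the MD5 hex digest of a file's bytes (hypothetical helper)."""
    digest = hashlib.md5()
    # Binary mode: an .xlsx file is not text, so read raw bytes.
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With this helper, `file_md5(f1) == file_md5(f2)` compares the two files byte for byte, which is what the hash attempt was presumably aiming for.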

1 answer

User

#1 · Posted 2024-05-04 21:01:52

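I ran into the same problem in the past and ended up doing a "line by line" comparison. For Excel files I use the openpyxl module, which has a nice interface for digging through a file cell by cell. For docx files I use the python-docx module. The following code works for me: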

>>> import openpyxl  # needed for the openpyxl.cell.cell.* type checks below
>>> from openpyxl import load_workbook
>>> from docx import Document

>>> f1 = Document('testDoc.docx')
>>> f2 = Document('testDoc.docx')
>>> wb1 = load_workbook('testBook.xlsx')
>>> wb2 = load_workbook('testBook.xlsx')
>>> s1 = wb1.active  # replaces the deprecated get_active_sheet()
>>> s2 = wb2.active

>>> def comp_xl(s1, s2):
...     # Compare cell values row by row; styles and formatting are ignored.
...     for row1, row2 in zip(s1.rows, s2.rows):
...         for cell_1, cell_2 in zip(row1, row2):
...             if isinstance(cell_1, openpyxl.cell.cell.MergedCell):
...                 continue
...             elif not cell_1.value == cell_2.value:
...                 return False
...     return True
...
>>> comp_xl(s1, s2)
True
>>> # The same check as a single expression, flattening rows into cells:
>>> all(cell_1.value == cell_2.value
...     for row1, row2 in zip(s1.rows, s2.rows)
...     for cell_1, cell_2 in zip(row1, row2)
...     if isinstance(cell_1, openpyxl.cell.cell.Cell))
True

>>> def comp_docx(f1, f2):
...     # Compare paragraph text only; runs, styles, and tables are ignored.
...     p1 = f1.paragraphs
...     p2 = f2.paragraphs
...     if len(p1) != len(p2):  # avoids an IndexError when f2 is shorter
...         return False
...     for i in range(len(p1)):
...         if p1[i].text != p2[i].text:
...             return False
...     return True
...
>>> comp_docx(f1, f2)
True
>>> all(line1.text == line2.text for line1, line2 in zip(f1.paragraphs, f2.paragraphs))
True

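It is very basic and obviously does not take styles or formatting into account; it only tests the text content of the two files. Hope this helps someone.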
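
One caveat with the `zip`-based comparisons above: `zip` stops at the shorter input, so a sheet or document with extra trailing rows or paragraphs would still compare as equal. A minimal sketch of a stricter check using `itertools.zip_longest` follows. The name `grids_equal` is hypothetical, and plain tuples stand in for worksheet rows here; with openpyxl, the rows could presumably come from `ws.iter_rows(values_only=True)`.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel meaning "no row or cell on this side"


def grids_equal(rows1, rows2):
    """Return True if two iterables of rows hold equal values in the same shape."""
    for row1, row2 in zip_longest(rows1, rows2, fillvalue=_MISSING):
        if row1 is _MISSING or row2 is _MISSING:
            return False  # one side has extra rows
        for v1, v2 in zip_longest(row1, row2, fillvalue=_MISSING):
            # A missing cell or a differing value means the grids differ.
            if v1 is _MISSING or v2 is _MISSING or v1 != v2:
                return False
    return True


# Plain tuples stand in for worksheet rows (illustrative data only).
a = [("id", "name"), (1, "x")]
b = [("id", "name"), (1, "x")]
c = [("id", "name"), (1, "x"), (2, "y")]
print(grids_equal(a, b))  # True
print(grids_equal(a, c))  # False: c has an extra row
```

Like the original functions, this only looks at values, not at styles or formatting.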
