Regenerating Open Office XML with Python --- Namesp

Posted 2024-09-25 00:22:54


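I am writing a Python program that reads an Excel spreadsheet, modifies the XML, and writes it back out. For various reasons, I can't easily use the existing Python xlsx-modification packages.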

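So here is my problem. I have code that reads the ZIP file, decodes the XML, and modifies the tree, but when I create the new XML it does not come out in the right form.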

Here is a demo program that shows what I am trying to do:

SPREADSHEET_NAMESPACE = '{http://schemas.openxmlformats.org/spreadsheetml/2006/main}'
CELL  = SPREADSHEET_NAMESPACE + 'c'
VALUE = SPREADSHEET_NAMESPACE + 'v'
FORMULA  = SPREADSHEET_NAMESPACE + "f"

xml = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\r\n<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"><dimension ref="A1:C5" /><sheetViews><sheetView tabSelected="1" showRuler="0" zoomScale="85" workbookViewId="0"><selection activeCell="A5" sqref="A5:C5" /></sheetView></sheetViews><sheetFormatPr baseColWidth="10" defaultRowHeight="16" x14ac:dyDescent="0.2" /><sheetData><row r="1" spans="1:3" x14ac:dyDescent="0.2"><c r="A1" t="s"><v>0</v></c><c r="B1" t="s"><v>1</v></c><c r="C1" t="s"><v>2</v></c></row></sheetData><pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3" /></worksheet>"""

import xml.etree.ElementTree as ET
root = ET.fromstring(xml)
ET.dump(root)  # dump() writes to stdout itself and returns None, so no print() is needed

Here is the input string, properly formatted:

<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" mc:Ignorable="x14ac">
  <dimension ref="A1:C5"/>
  <sheetViews>
    <sheetView tabSelected="1" showRuler="0" zoomScale="85" workbookViewId="0">
      <selection activeCell="A5" sqref="A5:C5"/>
    </sheetView>
  </sheetViews>
  <sheetFormatPr baseColWidth="10" defaultRowHeight="16" x14ac:dyDescent="0.2"/>
  <sheetData>
    <row r="1" spans="1:3" x14ac:dyDescent="0.2">
      <c r="A1" t="s">
        <v>0</v>
      </c>
      <c r="B1" t="s">
        <v>1</v>
      </c>
      <c r="C1" t="s">
        <v>2</v>
      </c>
    </row>
  </sheetData>
  <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/>
</worksheet>

Unfortunately, the output looks like this (formatted):

<?xml version="1.0"?>
<ns0:worksheet xmlns:ns0="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:ns1="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:ns2="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" ns1:Ignorable="x14ac">
  <ns0:dimension ref="A1:C5"/>
  <ns0:sheetViews>
    <ns0:sheetView showRuler="0" tabSelected="1" workbookViewId="0" zoomScale="85">
      <ns0:selection activeCell="A5" sqref="A5:C5"/>
    </ns0:sheetView>
  </ns0:sheetViews>
  <ns0:sheetFormatPr baseColWidth="10" defaultRowHeight="16" ns2:dyDescent="0.2"/>
  <ns0:sheetData>
    <ns0:row r="1" spans="1:3" ns2:dyDescent="0.2">
      <ns0:c r="A1" t="s">
        <ns0:v>0</ns0:v>
      </ns0:c>
      <ns0:c r="B1" t="s">
        <ns0:v>1</ns0:v>
      </ns0:c>
      <ns0:c r="C1" t="s">
        <ns0:v>2</ns0:v>
      </ns0:c>
    </ns0:row>
  </ns0:sheetData>
  <ns0:pageMargins bottom="0.75" footer="0.3" header="0.3" left="0.7" right="0.7" top="0.75"/>
</ns0:worksheet>

I think this is an XML namespace problem, but I'm not sure how to fix the code so that the output looks like the input.


1 answer
User
#1 · posted 2024-09-25 00:22:54

Consider using lxml instead:

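from lxml import etree

# On Python 3, lxml rejects str input that carries an encoding declaration, so pass bytes.
root = etree.fromstring(xml.encode("utf-8"))
# tostring() returns bytes; decode so that print() shows real line breaks.
print(etree.tostring(root, pretty_print=True).decode("utf-8"))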

See: http://lxml.de/tutorial.html#namespaces

In particular:

The ElementTree API avoids namespace prefixes wherever possible and deploys the real namespace (the URI) instead:
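
To make that concrete, here is a small standard-library sketch (`xml.etree.ElementTree` uses the same `{URI}tag` convention as lxml). It reuses the `SPREADSHEET_NAMESPACE`, `CELL`, and `VALUE` constants from the question; `SAMPLE` is a shortened stand-in for the worksheet XML and is an assumption for illustration only.

```python
import xml.etree.ElementTree as ET

# Tag constants as defined in the question: "{namespace URI}" + local name.
SPREADSHEET_NAMESPACE = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
CELL = SPREADSHEET_NAMESPACE + "c"
VALUE = SPREADSHEET_NAMESPACE + "v"

# Shortened, hypothetical stand-in for the worksheet XML in the question.
SAMPLE = (
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<sheetData><row r="1">'
    '<c r="A1" t="s"><v>0</v></c>'
    '<c r="B1" t="s"><v>1</v></c>'
    '</row></sheetData></worksheet>'
)

root = ET.fromstring(SAMPLE)

# The parsed tree keeps the namespace URI in each tag, not the source prefix.
print(root.tag)

# Look up cells and their values through the fully qualified tag names.
cells = {cell.get("r"): cell.find(VALUE).text for cell in root.iter(CELL)}
print(cells)
```

Because the tree holds only URIs, the prefixes seen in the output (`ns0`, `ns1`, ...) are chosen when the tree is serialized, not when it is parsed.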

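
If adding lxml as a dependency is not an option, one standard-library alternative is `ET.register_namespace`, which tells ElementTree which prefix to emit for a given URI. The sketch below is not part of the original answer. The prefix/URI pairs are copied from the question's `<worksheet>` element, while `SAMPLE` and `serialize_with_original_prefixes` are hypothetical names introduced here. Two caveats: the registry is global to the process, and ElementTree only declares namespaces that are actually used in the tree, so the unused `xmlns:r` declaration is not written back.

```python
import xml.etree.ElementTree as ET

# Prefix -> URI pairs copied from the root element in the question.
# The empty prefix stands for the default (unprefixed) namespace.
NAMESPACES = {
    "": "http://schemas.openxmlformats.org/spreadsheetml/2006/main",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
    "mc": "http://schemas.openxmlformats.org/markup-compatibility/2006",
    "x14ac": "http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac",
}

# Shortened, hypothetical stand-in for the worksheet XML in the question.
SAMPLE = (
    '<worksheet'
    ' xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"'
    ' xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"'
    ' xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"'
    ' mc:Ignorable="x14ac"'
    ' xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">'
    '<sheetData><row r="1" spans="1:3" x14ac:dyDescent="0.2">'
    '<c r="A1" t="s"><v>0</v></c>'
    '</row></sheetData></worksheet>'
)


def serialize_with_original_prefixes(xml_text):
    """Parse xml_text and serialize it using the registered prefixes."""
    # Registration is process-wide and affects every later serialization.
    for prefix, uri in NAMESPACES.items():
        ET.register_namespace(prefix, uri)
    root = ET.fromstring(xml_text)
    # encoding="unicode" makes tostring() return str instead of bytes.
    return ET.tostring(root, encoding="unicode")


output = serialize_with_original_prefixes(SAMPLE)
print(output)
```

Keeping the `x14ac` prefix matters beyond appearance: the `mc:Ignorable="x14ac"` attribute value refers to that prefix by name, so it only makes sense if the prefix is still declared in the output.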