How do I parse a .txt file into an .xml file?

In File Name: C:\Users\naqushab\desktop\files\File 1.m1
Out File Name: C:\Users\naqushab\desktop\files\Output\File 1.m2
In File Size: Low: 22636 High: 0
Total Process time: 1.859000
Out File Size: Low: 77619 High: 0

In File Name: C:\Users\naqushab\desktop\files\File 2.m1
Out File Name: C:\Users\naqushab\desktop\files\Output\File 2.m2
In File Size: Low: 20673 High: 0
Total Process time: 3.094000
Out File Size: Low: 94485 High: 0

In File Name: C:\Users\naqushab\desktop\files\File 3.m1
Out File Name: C:\Users\naqushab\desktop\files\Output\File 3.m2
In File Size: Low: 66859 High: 0
Total Process time: 3.516000
Out File Size: Low: 217268 High: 0

import re
import xml.etree.ElementTree as ET

rex = re.compile(r'''(?P<title>In File Name: |Out File Name: |In File Size: Low: |Total Process time: |Out File Size: Low: )
                     (?P<value>.*)
                     ''', re.VERBOSE)

root = ET.Element('root')
root.text = '\n'    # newline before the celldata element

with open('Performance.txt') as f:
    celldata = ET.SubElement(root, 'filedata')
    celldata.text = '\n'    # newline before the collected element
    celldata.tail = '\n\n'  # empty line after the celldata element
    for line in f:
        # Empty line starts new celldata element (hack style, ugly)
        if line.isspace():
            celldata = ET.SubElement(root, 'filedata')
            celldata.text = '\n'
            celldata.tail = '\n\n'

        # If the line contains the wanted data, process it.
        m = rex.search(line)
        if m:
            # Fix some problems with the title as it will be used
            # as the tag name.
            title = m.group('title')
            title = title.replace('&', '')
            title = title.replace(' ', '')

            e = ET.SubElement(celldata, title.lower())
            e.text = m.group('value')
            e.tail = '\n'

# Display for debugging
ET.dump(root)

# Include the root element to the tree and write the tree
# to the file.
tree = ET.ElementTree(root)
tree.write('Performance.xml', encoding='utf-8', xml_declaration=True)

2 answers

User

Answer 1 · edited 2024-10-01 11:30:43

From the documentation:

re.VERBOSE
This flag allows you to write regular expressions that look nicer. Whitespace within the pattern is ignored, except when in a character class or preceded by an unescaped backslash, and, when a line contains a '#' neither in a character class or preceded by an unescaped backslash, all characters from the leftmost such '#' through the end of the line are ignored.

Escape the spaces in your regex, or use the \s character class instead.
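To make the point from the documentation quoted above concrete, here is a minimal sketch (the sample line is taken from the question; the variable names broken, escaped and with_class are hypothetical) showing that under re.VERBOSE unescaped spaces are dropped, plus the two fixes:

```python
import re

line = r"In File Name: C:\Users\naqushab\desktop\files\File 1.m1"

# Under re.VERBOSE the literal spaces are discarded, so this pattern
# effectively becomes "InFileName:" and never matches the line.
broken = re.compile(r"(?P<title>In File Name: )(?P<value>.*)", re.VERBOSE)

# Fix 1: escape each space so VERBOSE keeps it.
escaped = re.compile(r"(?P<title>In\ File\ Name:\ )(?P<value>.*)", re.VERBOSE)

# Fix 2: use \s, which VERBOSE does not strip.
with_class = re.compile(r"(?P<title>In\sFile\sName:\s)(?P<value>.*)", re.VERBOSE)

print(broken.search(line))                     # None
print(escaped.search(line).group("value"))     # C:\Users\naqushab\desktop\files\File 1.m1
print(with_class.search(line).group("value"))  # same path as above
```

Note that \s also matches tabs and newlines, so escaping the spaces is the stricter of the two fixes.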

User

Answer 2 · edited 2024-10-01 11:30:43

A correction to your regex: it should be

m = re.search('(?P<title>(In File Name)|(Out File Name)|(In File Size: *Low)|(Total Process time)|(Out File Size: *Low)):(?P<value>.*)',line)

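instead of the one you gave. This pattern is compiled without re.VERBOSE, so its spaces are matched literally and "In File Name" only matches the text with its spaces intact. The parentheses around each alternative are optional: "|" has the lowest precedence in a regex, so "In File Name|Out File Name" already means either the whole phrase "In File Name" or the whole phrase "Out File Name".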
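Applied to the first record from the question, the corrected pattern can be exercised like this. This is a minimal sketch; the names pattern, sample and fields are hypothetical, and the .strip() call is an addition because the value group begins right after the colon:

```python
import re

# The corrected pattern from this answer; no re.VERBOSE, so spaces are literal.
pattern = re.compile(
    r"(?P<title>(In File Name)|(Out File Name)|(In File Size: *Low)"
    r"|(Total Process time)|(Out File Size: *Low)):(?P<value>.*)"
)

# Lines copied from the question's input (first record only).
sample = [
    r"In File Name: C:\Users\naqushab\desktop\files\File 1.m1",
    r"Out File Name: C:\Users\naqushab\desktop\files\Output\File 1.m2",
    "In File Size: Low: 22636 High: 0",
    "Total Process time: 1.859000",
    "Out File Size: Low: 77619 High: 0",
]

fields = {}
for line in sample:
    m = pattern.search(line)
    if m:
        # The value group starts right after ':' and keeps a leading space.
        fields[m.group("title")] = m.group("value").strip()

for title, value in fields.items():
    print(f"{title!r}: {value!r}")

# '|' binds loosest, so each side of it is a whole phrase.
print(bool(re.fullmatch(r"In File Name|Out File Name", "Out File Name")))  # True
```

The size values still contain "High: 0"; the pattern only isolates the label, so any further trimming has to happen afterwards.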

Suggestion

You can do this without regular expressions. xml.dom.minidom can be used to pretty-print the XML string.

I've added comments to make it easier to follow!

Node.toprettyxml([indent=""[, newl=""[, encoding=""]]])
Return a pretty-printed version of the document. indent specifies the indentation string and defaults to a tabulator; newl specifies the string emitted at the end of each line and defaults to \n.
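A minimal sketch of that pretty-printing step, assuming the tree is first built with xml.etree.ElementTree as in the question (the tag names are borrowed from the output further down; the variable names are hypothetical):

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# A tiny tree built with ElementTree; it carries no formatting whitespace.
root = ET.Element("root")
filedata = ET.SubElement(root, "filedata")
ET.SubElement(filedata, "InFileName").text = "File 1.m1"
ET.SubElement(filedata, "TotalProcesstime").text = "1.859000"

# Serialize compactly, reparse with minidom, then pretty-print
# with a one-space indent per nesting level.
compact = ET.tostring(root, encoding="unicode")
pretty = minidom.parseString(compact).toprettyxml(indent=" ")
print(pretty)
```

Passing encoding="utf-8" as well makes toprettyxml return bytes with an encoding="utf-8" declaration, which can be written to a file opened in binary mode. This removes the need for the manual text/tail newline handling in the question's code.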

Edit

import itertools as it
[line[0] for line in it.groupby(lines)]
You can use groupby from the itertools module to collapse consecutive duplicates in the list lines.
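For example, with a hypothetical list of stripped lines that has several blank lines between two records, the expression above reduces each run of blanks to one:

```python
import itertools as it

# Hypothetical input: stripped lines with a run of blank lines between records.
lines = ["In File Name: a", "Total Process time: 1.0", "", "", "",
         "In File Name: b", "Total Process time: 2.0"]

# groupby yields (key, group) pairs, one per run of equal consecutive items;
# keeping only the key collapses each run to a single element.
deduped = [key for key, _ in it.groupby(lines)]
print(deduped)
# ['In File Name: a', 'Total Process time: 1.0', '', 'In File Name: b', 'Total Process time: 2.0']
```

This collapses any run of identical consecutive lines, not only blank ones; for this input that is harmless because the data lines within a record all differ.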

So:

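A minimal sketch of the regex-free approach described above, written for illustration rather than taken from the answer. It splits the input into records on blank lines, matches each line against fixed label prefixes with str.startswith, and pretty-prints with minidom. The FIELDS mapping, the helper names parse_records, to_pretty_xml and convert, and the choice to keep only the file name and the Low size are assumptions chosen to reproduce the Performance.xml shown below:

```python
import ntpath
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical mapping from the labels in the input to the tag names
# used in Performance.xml; each label is tested as a plain prefix.
FIELDS = {
    "In File Name:": "InFileName",
    "Out File Name:": "OutFileName",
    "In File Size: Low:": "InFileSize",
    "Total Process time:": "TotalProcesstime",
    "Out File Size: Low:": "OutFileSize",
}

def parse_records(text):
    """Yield one dict per record; any run of blank lines ends a record."""
    record = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            if record:
                yield record
                record = {}
            continue
        for label, tag in FIELDS.items():
            if line.startswith(label):
                value = line[len(label):].strip()
                if tag.endswith("FileName"):
                    # ntpath splits Windows paths on any OS: keep "File 1.m1".
                    value = ntpath.basename(value)
                elif tag.endswith("FileSize"):
                    value = value.split()[0]  # keep the Low value, drop "High: 0"
                record[tag] = value
                break
    if record:
        yield record

def to_pretty_xml(text):
    """Build <root><filedata>... with ElementTree, pretty-print via minidom."""
    root = ET.Element("root")
    for record in parse_records(text):
        filedata = ET.SubElement(root, "filedata")
        for tag, value in record.items():
            ET.SubElement(filedata, tag).text = value
    dom = minidom.parseString(ET.tostring(root, encoding="unicode"))
    return dom.toprettyxml(indent=" ", encoding="utf-8")  # returns bytes

def convert(in_path="Performance.txt", out_path="Performance.xml"):
    """Read the text file and write the pretty-printed XML file."""
    with open(in_path) as src:
        xml_bytes = to_pretty_xml(src.read())
    with open(out_path, "wb") as dst:
        dst.write(xml_bytes)

# The question's input, including a double blank line between records 2 and 3.
SAMPLE = r"""
In File Name: C:\Users\naqushab\desktop\files\File 1.m1
Out File Name: C:\Users\naqushab\desktop\files\Output\File 1.m2
In File Size: Low: 22636 High: 0
Total Process time: 1.859000
Out File Size: Low: 77619 High: 0

In File Name: C:\Users\naqushab\desktop\files\File 2.m1
Out File Name: C:\Users\naqushab\desktop\files\Output\File 2.m2
In File Size: Low: 20673 High: 0
Total Process time: 3.094000
Out File Size: Low: 94485 High: 0


In File Name: C:\Users\naqushab\desktop\files\File 3.m1
Out File Name: C:\Users\naqushab\desktop\files\Output\File 3.m2
In File Size: Low: 66859 High: 0
Total Process time: 3.516000
Out File Size: Low: 217268 High: 0
"""

print(to_pretty_xml(SAMPLE).decode("utf-8"))
```

Calling convert() performs the actual file-to-file conversion. Because a record only ends when a non-empty record meets a blank line, repeated blank lines need no separate dedup step in this sketch.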

Output: Performance.xml

<?xml version="1.0" encoding="utf-8"?>
<root>
 <filedata>
  <InFileName>File 1.m1</InFileName>
  <OutFileName>File 1.m2</OutFileName>
  <InFileSize>22636</InFileSize>
  <TotalProcesstime>1.859000</TotalProcesstime>
  <OutFileSize>77619</OutFileSize>
 </filedata>
 <filedata>
  <InFileName>File 2.m1</InFileName>
  <OutFileName>File 2.m2</OutFileName>
  <InFileSize>20673</InFileSize>
  <TotalProcesstime>3.094000</TotalProcesstime>
  <OutFileSize>94485</OutFileSize>
 </filedata>
 <filedata>
  <InFileName>File 3.m1</InFileName>
  <OutFileName>File 3.m2</OutFileName>
  <InFileSize>66859</InFileSize>
  <TotalProcesstime>3.516000</TotalProcesstime>
  <OutFileSize>217268</OutFileSize>
 </filedata>
</root>

Hope this helps!
