<h2>Updated answer</h2>
<p>As the OP mentioned, each text file contains only a single record, so the following solution is suitable:</p>
<pre class="lang-py prettyprint-override"><code>import os
import re
from collections import OrderedDict
from pathlib import Path

import pandas as pd


def oneFileSingleRecordParser(textFilePath):
    fileName = os.path.basename(textFilePath)
    with open(textFilePath, "r") as textFile:
        # The structure is:
        #   Yield:
        #   Timestamp
        #   Angle
        #   ErrorCode 10
        #   ErrorCode 12
        #   ErrorCode 16
        #   ErrorCode 20
        # The error codes can be present or absent
        lines = textFile.readlines()
    dataDict = OrderedDict()
    dataDict["File Name"] = fileName
    for line in lines:
        # regex to extract the key and the value from a "key: value" line
        matchObject = re.match(r"(\w+\s?\d*):\s(.*)", line.strip())
        if matchObject is not None:
            key, value = matchObject.groups()
            dataDict[key] = value
    return dict(dataDict)


def convertAllFilesToDataFrame(textFilePathsRoot, parser=oneFileSingleRecordParser):
    if not os.path.isdir(textFilePathsRoot):
        raise Exception("Please pass in a valid path to the root of the text files")
    # Only *.txt files directly under the root directory are picked up
    textFilePaths = [str(path) for path in Path(textFilePathsRoot).glob("*.txt")]
    dataDicts = []
    for textFilePath in textFilePaths:
        dataDicts.append(parser(textFilePath))
    dataFrame = pd.DataFrame(dataDicts)
    return dataFrame
</code></pre>
<p><code>convertAllFilesToDataFrame("path/to/your/text/file/directory")</code> should still produce the following output (in my case, I only had two files containing exactly the same record):</p>
<p><a href="https://i.stack.imgur.com/cvqU6.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/cvqU6.png" alt="enter image description here"/></a></p>
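<p>To make the parsing step concrete, here is a minimal standalone sketch of what the regular expression in <code>oneFileSingleRecordParser</code> does with individual lines. The helper name <code>parse_lines</code> is hypothetical, and the sample lines and their values are invented for illustration, not taken from the OP's files:</p>
<pre class="lang-py prettyprint-override"><code>import re

# Same pattern as in the parser above: "key: value", where the key may end in a number.
LINE_PATTERN = re.compile(r"(\w+\s?\d*):\s(.*)")


def parse_lines(lines):
    """Collect every "key: value" line into a dict; other lines are skipped."""
    record = {}
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match is not None:
            key, value = match.groups()
            record[key] = value
    return record


# Hypothetical file contents; all values are made up.
with_error_code = [
    "Yield: 0.95\n",
    "Timestamp: 2020-01-01 10:00:00\n",
    "Angle: 45\n",
    "ErrorCode 10: 2\n",
]
without_error_code = [
    "Yield: 0.90\n",
    "Timestamp: 2020-01-01 10:05:00\n",
    "Angle: 50\n",
    "some unrelated line\n",
]

first = parse_lines(with_error_code)
second = parse_lines(without_error_code)
print(first)   # includes the key "ErrorCode 10"
print(second)  # has no error-code key; the unrelated line is ignored
</code></pre>
<p>Passing a list of such dictionaries to <code>pd.DataFrame</code> gives one column per distinct key, and a record that lacks a key gets <code>NaN</code> in that column, so optional error codes need no special handling.</p>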
<h2>Original answer</h2>
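<p>Depending on the structure of the text files, this problem can be approached in two ways:</p>
<ul>
<li>Each text file contains exactly five lines (a single record)</li>
<li>A single text file may contain a multiple of five lines (multiple records)</li>
</ul>
<p>Here is my approach to both cases:</p>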
<pre class="lang-py prettyprint-override"><code>import re
from collections import OrderedDict
from os import sep, getcwd
from pathlib import Path

import pandas as pd


def oneFileSingleRecordParser(textFilePath):
    fileName = textFilePath.rsplit(sep, 1)[-1]
    with open(textFilePath, "r") as textFile:
        # The structure is:
        #   Yield:
        #   Timestamp
        #   Angle
        #   ErrorCode 10
        #   ErrorCode 12
        lines = textFile.readlines()
    if len(lines) != 5:
        raise Exception("The file at {} doesn't have a proper single record.".format(textFilePath))
    dataDict = OrderedDict()
    dataDict["File Name"] = fileName
    for line in lines:
        # regex to extract the key and value name
        matchObject = re.match(r"(\w+\s?\d*):\s(.*)", line.strip())
        if matchObject is not None:
            key, value = matchObject.groups()
            dataDict[key] = value
    return dict(dataDict)


def oneFileMultiRecordParser(textFilePath):
    fileName = textFilePath.rsplit(sep, 1)[-1]
    with open(textFilePath, "r") as textFile:
        # The structure is:
        #   Yield_1:
        #   Timestamp_1:
        #   Angle_1:
        #   ErrorCode 10_1:
        #   ErrorCode 12_1:
        #   Yield_2:
        #   Timestamp_2:
        #   Angle_2:
        #   ErrorCode 10_2:
        #   ErrorCode 12_2:
        #   ...
        lines = textFile.readlines()
    if len(lines) % 5 != 0:
        raise Exception("The file at {} doesn't have a uniform structure.".format(textFilePath))
    records = []
    dataDict = OrderedDict()
    dataDict["File Name"] = fileName
    for index, line in enumerate(lines):
        # regex to extract the key and value name
        matchObject = re.match(r"(\w+\s?\d*):\s(.*)", line.strip())
        if matchObject is not None:
            key, value = matchObject.groups()
            dataDict[key] = value
        else:
            raise Exception("Line={}, content=\"{}\" has some formatting issues, regex failed".format(index + 1, line))
        # every fifth line closes the current record
        if (index + 1) % 5 == 0:
            records.append(dataDict)
            dataDict = OrderedDict()  # reset for next iteration
            dataDict["File Name"] = fileName
    return records


def convertAllFilesToDataFrame(
    parser=oneFileSingleRecordParser,
    validParserNames=("oneFileSingleRecordParser", "oneFileMultiRecordParser",)
):
    if parser.__name__ not in validParserNames:
        raise Exception("Proper parser was not used")
    # The text files are looked up in the current working directory
    pathToFiles = getcwd()
    textFilePaths = [str(path) for path in Path(pathToFiles).glob("*.txt")]
    dataDicts = []
    for textFilePath in textFilePaths:
        if parser.__name__ == validParserNames[0]:
            # single-record parser returns one dict
            dataDicts.append(parser(textFilePath))
        elif parser.__name__ == validParserNames[1]:
            # multi-record parser returns a list of dicts
            dataDicts.extend(parser(textFilePath))
    dataFrame = pd.DataFrame(dataDicts)
    return dataFrame
</code></pre>
<p><code>convertAllFilesToDataFrame(parser = oneFileMultiRecordParser)</code> will produce:
<a href="https://i.stack.imgur.com/9FqBU.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/9FqBU.png" alt="enter image description here"/></a></p>
<p><code>convertAllFilesToDataFrame(parser = oneFileSingleRecordParser)</code> will produce:
<a href="https://i.stack.imgur.com/cvqU6.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/cvqU6.png" alt="enter image description here"/></a></p>
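<p>The code is not entirely DRY (the two parsers repeat each other in places), but removing the duplication would take more time.</p>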
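<p>One possible way to remove the duplication, sketched under the assumption that every record is exactly five lines long: treat a single-record file as a multi-record file with just one group, so a single parser covers both cases and the converter no longer has to branch on the parser's name. The names <code>parse_records</code> and <code>files_to_records</code> are hypothetical, and the sample file written to a temporary directory is invented for the demo:</p>
<pre class="lang-py prettyprint-override"><code>import re
import tempfile
from pathlib import Path

LINE_PATTERN = re.compile(r"(\w+\s?\d*):\s(.*)")
LINES_PER_RECORD = 5  # assumption: Yield, Timestamp, Angle and two error codes


def parse_records(text_file_path):
    """Return one dict per group of LINES_PER_RECORD lines in the file."""
    path = Path(text_file_path)
    lines = path.read_text().splitlines()
    if len(lines) % LINES_PER_RECORD != 0:
        raise ValueError("{} doesn't have a uniform structure.".format(path))
    records = []
    for start in range(0, len(lines), LINES_PER_RECORD):
        record = {"File Name": path.name}
        for line in lines[start:start + LINES_PER_RECORD]:
            match = LINE_PATTERN.match(line.strip())
            if match is None:
                raise ValueError("Malformed line in {}: {!r}".format(path, line))
            key, value = match.groups()
            record[key] = value
        records.append(record)
    return records


def files_to_records(root):
    """Parse every *.txt file directly under root into one flat list of dicts."""
    records = []
    for path in sorted(Path(root).glob("*.txt")):
        records.extend(parse_records(path))
    return records


# Demo with an invented two-record file in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    sample = "".join(
        "Yield: {0}\nTimestamp: t{0}\nAngle: {0}\nErrorCode 10: 0\nErrorCode 12: 0\n".format(i)
        for i in (1, 2)
    )
    Path(root, "sample.txt").write_text(sample)
    all_records = files_to_records(root)

print(len(all_records))  # one dict per five-line group
</code></pre>
<p>The resulting list can be passed to <code>pd.DataFrame(all_records)</code> in the same way as <code>dataDicts</code> in the code above.</p>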